Tribune DataViz

Matters of interest, from the data reporters and developers across Tribune Publishing

We’re hiring a senior Python developer!


Do you consider yourself a fan of Python? Are you passionate about reporting in Chicago? Have you ever had long discussions about documentation? Then you might be right for TribGraphics. Check out the job posting here and apply soon!

We want our newsroom to look more like the city that we serve. If you’re a person from a group that’s underrepresented in engineering or journalism, we really want to see your application!


Written by nausheenhusain

March 30, 2017 at 11:38 am

Posted in Jobs, Python, Uncategorized

We’re hiring developers/senior developers!


Come join our team! You will work closely with other developers, reporters, photographers, designers and editors to create great online journalism and build tools that help the newsroom. This could include wrangling and visualizing data, imagining new ways to tell a story online, creating scrapers and tapping into 165 years of archived stories.

Senior developer:
We are looking for an experienced engineer who wants to build websites around the news, provide guidance to other developers and uphold standards. The developer in this role should be someone with experience writing high-performance code, determining technological requirements necessary to implement new projects and maintaining infrastructure. You don’t need to be a designer, but you should have good visual sensibilities.

Your qualifications:
In-depth knowledge of the following tools and concepts:

  • Amazon Web Services (EC2, S3, RDS, Route 53, Cloud Search)
  • Linux system administration (Ubuntu experience preferred)
  • Load balancers and load testing
  • Databases, including PostgreSQL + PostGIS
  • Data structure servers (we use Redis)
  • Asynchronous messaging/task queues (we use Celery)
  • Virtualization (we use Vagrant + Virtualbox)
  • Unit testing
  • Version control (we use git)
  • Python (including Django and Flask)
  • Javascript
  • Popular Javascript libraries: jQuery, Underscore, Backbone, D3
  • HTML/CSS

In addition, you should be comfortable with:

  • Application deployment and monitoring (many of our applications use Fabric)
  • Open-sourcing your work and working with open-source projects
  • Brainstorming potential solutions with your teammates
  • Working on an agile team and/or scrum team; previous agile/scrum experience is not required, but you need to be adaptable

Developer:
We’re also looking for a developer to build interactive news projects and help newsroom staffers research and present their work. We are looking for someone who is comfortable working with and visualizing data and imagining new ways to present information. The developer in this role should be versatile and eager to work on a variety of projects, from creating a database with a graphical front-end to building tools for newsroom staffers.

Your qualifications:
In-depth knowledge of the following tools and concepts:

  • Version control (we use git)
  • Javascript
  • Popular Javascript libraries: jQuery, Underscore, Backbone, D3
  • Python (including Django and Flask)
  • Mapping tools such as Leaflet and QGIS
  • HTML/CSS

In addition, you should be comfortable with:

  • Working in a team environment
  • Open-sourcing your work and working with open-source projects
  • Databases, including PostgreSQL + PostGIS
  • Amazon Web Services (EC2, S3, RDS, Route 53, Cloud Search)
  • Working on an agile team and/or scrum team; previous agile/scrum experience is not required, but you need to be adaptable

If you’re interested in the senior developer role, please apply here; if you think you’re a better fit for the developer role, apply here.

Written by Kaitlen Exum

October 9, 2014 at 4:21 pm

Posted in Uncategorized


My Terrible Code, Or How To Archive A News Site


Newspapers have long kept archives of their past editions. Go into any major newspaper basement and you’ll probably find microfilm of past papers, film negatives of photographs spanning a century, maybe even copies of every print edition.

But how often do we think about archiving our websites?

With the recent launch of the redesigned chicagotribune.com, a few of the News Apps team’s projects needed to move from our own custom solutions to our main content management system. This meant old links on our projects, which lived on those subdomains, could break and cause lots of SEO problems.

Because the Chicago Tribune values its content, regardless of where it lives — for proof, see our archives project — we needed to find a way to store this content and make it accessible to our readers via the old links.

Now, the previous architecture worked like this: Content items (stories, photo galleries, videos) came in through our CMS, we accessed those stories via its API and then we styled and presented those content items on our different platforms, powered usually by Flask or Django.

For Blue Sky, the Chicago Tribune innovation news vertical that originally lived at bluesky.chicagotribune.com, we had a basic Django app that would generate the templates for the pages, events listings and section fronts. Because old content needed to stay put — the links needed to stay the same — and all new stories would start afresh with the launch of our redesign, this presented a few problems.

First, we couldn’t just continue to keep our Django app running. It cost money and was one more thing we needed to maintain. Not to mention, we were changing some fields in the API (marking the content items as no longer live so they wouldn’t appear on the redesigned chicagotribune.com), so the Django app was no longer a solution.

Second, we needed to do a basic redesign of the older pages to point people to the new site, which would now live at chicagotribune.com/bluesky. So we needed a way to not affect the content of these pages, but make some minor changes to the look and feel and, even more importantly, let readers know how to reach Blue Sky’s new home.

After some discussion — and because we had to do the same thing for a similar project, our Fuel-Efficient Cars Guide vertical — we decided we would modify the app locally, create some script that turns all the pages into HTML and upload them to an S3 bucket that would live at the old subdomain, bluesky.chicagotribune.com.

Next I edited the templating code, making it less complicated and pointing it to our live site. I thought about trying to redesign it to look like our new site, but then I remembered: When you’re archiving your newspapers from the 1980s, you don’t redesign each one to look like it’s 2014, do you? Nope. Same thinking goes when archiving old web content.

What follows below is an ugly Django management command. The important thing is, it works. I definitely could have written it to just use Django’s templating to create the HTML files, but I couldn’t easily figure out from our CMS API what the exact URL was for each content item when it was built in Blue Sky. I didn’t know if something lived at bluesky.chicagotribune.com or bluesky.chicagotribune.com/originals or bluesky.chicagotribune.com/hub.

So I wrote my code to try to check each spot on my local machine. If it found something there, it would create the HTML and save it as a file to the right directory. Like I said, ugly. But it works, which is the most important thing in getting our work done:

from datetime import datetime

from django.core.management.base import BaseCommand
from django.conf import settings
from django.test.client import RequestFactory

from p2psections import p2p
from p2p import P2PNotFound, P2PException

from optparse import make_option

import logging
import urllib2
import requests
import time
log = logging.getLogger(__name__)
APP_SOURCE_CODES = ["bluesky"]
# The offsets allow us to run through as many of each content type as is in the system
# And yeah, this is totally an ugly and gross way to do this. But it works!
OFFSETS = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000]
CONTENT_ITEM_TYPES = ["story", "storylink", "video", "premiumvideo", "photogallery", "storygallery"]
ERROR_SLUGS = []

class Command(BaseCommand):
    option_list = BaseCommand.option_list + (
        make_option(
            '--pretend',
            action='store_true',
            dest='pretend',
            default=False,
            help="Don't upload anything"),
    )

    def handle(self, *args, **options):
        # This allows you to pick a specific content item type from our CMS
        content_type = raw_input("Which content item type do you want to use? Pick one. Options are: story, storylink, video, premiumvideo, photogallery, storygallery ")
        start_time = datetime.now()
        slugs = []
        for offset in OFFSETS:
            results = p2p.search({
                'conditions': {
                    'source': APP_SOURCE_CODES,
                    'content_item_type': content_type,
                    #'content_item_state_code': 'live',
                },
                'filter': p2p.default_filter,
                #'order': {'last_modified_time': 'descending'},
                'limit': 100,
                'offset': offset,
                'content_item_state_code': 'live',
            })
            print offset
            # Here we make a big list of each of the slugs for our CMS, so we
            # can later iterate through them and create the HTML
            for item in results['content_items']:
                slugs.append(item['slug'])
        # Create a template with each slug
        for slug in slugs:
            self.url_finder(slug, content_type)
            time.sleep(1)
        finish_time = datetime.now()
        total_time = finish_time - start_time
        print "All done, took %s seconds to complete." % total_time.seconds
        # I would later run the error slugs through individually to try and
        # catch any I didn't originally get
        print "These had errors: %s" % ERROR_SLUGS

    def url_finder(self, slug, content_item):
        print slug, content_item
        response = requests.get('http://localhost:8000/%s,0,0.%s' % (slug, content_item))
        try:
            if response.status_code == 404:
                print "Failed at /", response.status_code
                # Check if it is at /originals/slug
                response = requests.get('http://localhost:8000/originals/%s,0,0.%s' % (slug, content_item))
                if response.status_code == 404:
                    print "Failed at /originals", response.status_code
                    # Check if it is at /hub/slug
                    response = requests.get('http://localhost:8000/hub/%s,0,0.%s' % (slug, content_item))
                    if response.status_code == 404:
                        # It couldn't be found!
                        print "%s couldn't be found" % slug
                        ERROR_SLUGS.append(slug)
                    else:
                        print "Attempting to save at /hub"
                        try:
                            html = urllib2.urlopen('http://localhost:8000/hub/%s,0,0.%s' % (slug, content_item)).read()
                            print "got html"
                            text_file = open(("archive/hub/%s,0,0.%s" % (slug, content_item)), "w")
                            text_file.write(html)
                            text_file.close()
                            print "Saved %s at /hub" % slug
                        except Exception, e:
                            print e
                            ERROR_SLUGS.append(slug)
                else:
                    print "Attempting to save at /originals"
                    try:
                        html = urllib2.urlopen('http://localhost:8000/originals/%s,0,0.%s' % (slug, content_item)).read()
                        print "got html"
                        text_file = open(("archive/originals/%s,0,0.%s" % (slug, content_item)), "w")
                        text_file.write(html)
                        text_file.close()
                        print "Saved %s at /originals" % slug
                    except Exception, e:
                        print e
                        ERROR_SLUGS.append(slug)
            else:
                print "Attempting to save at /"
                try:
                    html = urllib2.urlopen('http://localhost:8000/%s,0,0.%s' % (slug, content_item)).read()
                    print "got html"
                    text_file = open(("archive/%s,0,0.%s" % (slug, content_item)), "w")
                    text_file.write(html)
                    text_file.close()
                    print "Saved %s at /" % slug
                except Exception, e:
                    print e
                    ERROR_SLUGS.append(slug)

        except urllib2.HTTPError:
            ERROR_SLUGS.append(slug)
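
The hard-coded OFFSETS list above stops at 2,000 items. One alternative is to keep paging until the API hands back a short page. This is only a minimal Python 3 sketch, not the command as it was run: `search` is a hypothetical callable standing in for the CMS client, and `fake_search` is a made-up stand-in so the loop can run without a CMS.

```python
def collect_slugs(search, page_size=100):
    """Page through results until a short (or empty) page signals the end.

    `search` is a hypothetical callable taking (offset, limit) and
    returning a list of dicts that each carry a 'slug' key.
    """
    slugs = []
    offset = 0
    while True:
        items = search(offset, page_size)
        slugs.extend(item["slug"] for item in items)
        if len(items) < page_size:
            # Fewer items than we asked for: nothing left to fetch.
            break
        offset += page_size
    return slugs


# Made-up stand-in for the CMS: 250 fake items served in pages.
FAKE_ITEMS = [{"slug": "story-%d" % n} for n in range(250)]


def fake_search(offset, limit):
    return FAKE_ITEMS[offset:offset + limit]


all_slugs = collect_slugs(fake_search)
```

With this shape, the loop needs no guess about how many items are in the system.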

When you run the command, it asks you which content item type to loop through. When it’s done, it gives you a list of slugs it couldn’t find on your local machine. I used that list to run through it again later, with just those specific slugs, because sometimes errors could occur via the API on my local machine. If I couldn’t find an item after a few more runs, that meant it wouldn’t live on our live site.
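The check-each-spot logic and the retry list can be written more compactly. What follows is a rough Python 3 sketch under stated assumptions, not the code that was actually run: `fetch` is a hypothetical callable returning a status code and the page HTML, and `fake_fetch` is a made-up stand-in for the local Django server.

```python
import os
import tempfile

# Candidate locations, in the order the command checks them:
# the site root, then /originals, then /hub.
PREFIXES = ["", "originals", "hub"]


def archive_slug(slug, content_type, fetch, out_dir="archive"):
    """Try each prefix and save the first page that comes back with a 200.

    `fetch` is a hypothetical callable: fetch(path) -> (status_code, html).
    Returns the URL path that was saved, or None if no prefix worked.
    """
    filename = "%s,0,0.%s" % (slug, content_type)
    for prefix in PREFIXES:
        parts = [prefix, filename] if prefix else [filename]
        url_path = "/" + "/".join(parts)
        status, html = fetch(url_path)
        if status != 200:
            continue
        target = os.path.join(out_dir, *parts)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w", encoding="utf-8") as handle:
            handle.write(html)
        return url_path
    return None


def archive_all(slugs, content_type, fetch, out_dir="archive"):
    """Return the slugs that could not be saved, for a later retry pass."""
    return [s for s in slugs
            if archive_slug(s, content_type, fetch, out_dir) is None]


def fake_fetch(path):
    # Made-up stand-in for the local server: one page lives under /originals.
    pages = {"/originals/some-slug,0,0.story": "<html>hi</html>"}
    if path in pages:
        return 200, pages[path]
    return 404, ""


with tempfile.TemporaryDirectory() as tmp:
    missing = archive_all(["some-slug", "gone-slug"], "story", fake_fetch, out_dir=tmp)
    saved = os.path.exists(os.path.join(tmp, "originals", "some-slug,0,0.story"))
```

Feeding `missing` back into `archive_all` gives the "run it again with just those slugs" pass described above.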

After that, I created an Amazon S3 bucket — bluesky.chicagotribune.com — and started uploading the more than 2,700 files to their respective folders, making sure to set their Content-Type to “text/html,” as they were files that looked like “chi-nordstrom-buys-trunk-club-bsi-news,0,0.htmlstory” and S3 didn’t automatically think to load them as HTML.
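The upload step could also be scripted. The sketch below is one possible Python 3 approach, not how the files were actually uploaded. It assumes `client` behaves like a boto3 S3 client, whose `upload_file` accepts a `ContentType` in `ExtraArgs`. The `RecordingClient` class is a made-up stand-in so the example runs without AWS credentials.

```python
import os
import tempfile


def upload_archive(client, bucket, root="archive"):
    """Upload every file under `root`, forcing Content-Type to text/html.

    `client` is assumed to behave like a boto3 S3 client, i.e. to offer
    upload_file(Filename, Bucket, Key, ExtraArgs=...).
    """
    uploaded = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            local_path = os.path.join(dirpath, name)
            # The S3 key mirrors the path below `root`, using forward slashes.
            key = os.path.relpath(local_path, root).replace(os.sep, "/")
            client.upload_file(
                local_path, bucket, key,
                ExtraArgs={"ContentType": "text/html"},
            )
            uploaded.append(key)
    return sorted(uploaded)


class RecordingClient:
    """Made-up stand-in that records calls instead of talking to S3."""

    def __init__(self):
        self.calls = []

    def upload_file(self, Filename, Bucket, Key, ExtraArgs=None):
        self.calls.append((Bucket, Key, ExtraArgs))


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "hub"))
    for rel in ("a,0,0.story", os.path.join("hub", "b,0,0.story")):
        with open(os.path.join(tmp, rel), "w") as handle:
            handle.write("<html></html>")
    demo_client = RecordingClient()
    demo_keys = upload_archive(demo_client, "bluesky.chicagotribune.com", root=tmp)
```

Setting the content type explicitly matters here because the file names end in the content item type rather than ".html", so nothing about the name tells S3 what to serve.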

Then when our site redesign launched, we had the DNS for bluesky.chicagotribune.com point to the S3 bucket. The index file of that bucket redirects to chicagotribune.com/bluesky. And that’s how you archive 2,700 pieces of content without breaking any links.

Written by Andy Boyle

September 17, 2014 at 9:31 am

Posted in Uncategorized

I suck at this


I’ve wanted to program computers ever since I was 14 or 15, and now that I’m 31, I can honestly say I have always known that I suck at programming computers. I’ve come to terms with this knowledge. It doesn’t bother me like it used to. I know I’m bad at my job, but I also know that it doesn’t matter what I think — it matters what I do. [Ed. note:  Abe is awesome at his job. He just doesn’t believe it.]

I take on tasks, close tickets, and steadily get better at not sucking quite so hard. But since this feeling of inadequacy seems to be common among programmers, let’s take a tour through my insecurities! I’m not sure why you should come with me, though, it probably won’t be that good.

May 15, 2000 — This date is made up, but it takes us to junior year in high school (go Dolphins!) One of my best friends, Scott, is a year ahead of me in programming classes – he’s taking AP Computer Science, and I’m just taking a programming class. He shows me the code he’s working on, mostly hard-core graphics for video games, mostly in C++. I don’t show him the code I’m working on, mostly crappy websites (the Wayback Machine preserved my Geocities Starcraft fanfiction abomination!)

At this point, I’ve been a fan of programming for a few years, but it hasn’t really clicked for me yet. Everything I write is basically copy-pasted from an O’Reilly book. (Sidenote: remember books?)

May 15, 2005 — This date is also made up, but it roughly corresponds to the day of my one truly rock-star college programming moment: the day I took (and passed!) the final exam for CS 440, Topics in Artificial Intelligence.

Getting a B in that class was one of the hardest things I’ve ever done. I barely understood what was going on; I’d leave lectures in a daze after 45 minutes of a professor saying words that I didn’t comprehend. I’d turn in homework assignments that at best failed in interesting ways. I got celebratorily drunk the night of the final, drunker than I’ve ever been before or since (Note to college kids: Tequila and Red Bull tastes delicious but it is such a bad idea, oh my God).

May 15, 2010 — Let’s just keep doing May 15ths every five years. Now we’re in California, and I’ve moved from a non-technical job answering emails from news publishers to a slightly-technical one, writing Python scripts that don’t work very well to interact with a poorly-maintained database that I am largely in charge of.

I’m not on a technical team, but I’m surrounded by extremely accomplished developers on related teams, and every time I sit in a meeting with them I barely say a word, for fear that they’ll realize I don’t know what I’m talking about and should probably be fired. Every time I have to make a change to the database, I spend the day with my stomach in knots, just knowing that I’ll screw it up somehow and everyone will find out. I dread every time I submit my code for review, certain that each time will be the one they discover that I’m a fraud and need to be eased out of the company.

May 15, 2014 — We’re more or less at the present, and for the first time in my life, I feel comfortable and confident showing up every day to write code. I’ve been at the Tribune for a little over a year; for most of that time, every Monday I would wake up nervous, worried that when I got in and checked my email, disaster would be waiting in the form of angry emails alerting me to an embarrassing bug (or worse.)

Every Monday this would happen, and at my previous job as well. But now it’s stopped happening. I’m not quite sure why, but I think it’s because I’ve realized that insecurities are not the same thing as reality. I know I’m not a very good programmer, but as I get better I realize that things that used to seem immeasurably more complex than anything I could understand are not so daunting.

The code I write is still bad, but it’s markedly less bad than it used to be. And that’s all I can do: to try hard, to steadily get better and to accept that I am going to make a lot more mistakes, some of them embarrassing, some of them catastrophic. There’s no other way to improve. There’s no other way to do what we do.

Watching the World Cup, I was reminded that this is the mentality that athletes must have — the cameras are on, the entire planet is watching, and a defender makes a misstep that lets a striker get by him for a goal. His entire home country thinks he’s terrible now. But he can’t let that bother him.

The next time that striker comes down the field, the defender has to challenge him, confident in the knowledge that this time, the striker’s going to look like an idiot. That’s all he can do. That’s all any of us can do.

Written by Abe Epton

August 20, 2014 at 3:08 pm

Posted in Uncategorized

We’re hiring: Team Leader Extraordinaire


We’re looking for a new director—a creative thinker with solid management experience and a passion for digital journalism. In this role, you’ll manage a team of agile developers and partner with editors across Tribune Publishing (our publications include the Chicago Tribune, Baltimore Sun, Orlando Sentinel and many others).

You will work with the team in Chicago and collaborators in newsrooms across the country to identify needs and opportunities, provide technical guidance and recommendations, shepherd special projects and facilitate clear communication between stakeholders.

You will problem-solve. You will prioritize. You will insist on transparency in our processes. You will ruthlessly simplify to provide our audience with useful, compelling content. You will show your work. You will work with brilliant, wonderful developers, reporters, photographers, designers and editors.

Your qualifications:

  • Experience in digital media, data journalism or a related field
  • Experience in digital product development
  • Computer science and/or journalism background preferred (you don’t need to be a developer, but you need to work well with developers)
  • Project management, negotiation and analytical skills
  • A deep understanding of and commitment to open source
  • Passion for the news
  • Curiosity and enthusiasm about exploring new storytelling formats

Written by Kaitlen Exum

June 26, 2014 at 3:52 pm

Posted in Jobs

Getting started has never been easier


Having crossed the print-to-digital divide, I get a lot of questions from others who are trying to make the journey. While the conversations are often unique, they all share a familiar opening query:

Where the hell do I begin?

If you are just getting into web development, there are two absolute truths that set the tone for everything else that will follow:

1) The work you produce needs to live online.

Too many beginners get stuck in Codecademy-like, browser-based exercises that simply can’t live beyond the container they begin in. While those exercises are useful, they teach you nothing about actually getting your work online. To stay the course and grow, you need to get something online as soon as possible.

2) We tend to overcomplicate how we get our work online.

As journalists, we’re producing stories and charts — we’re not curing cancer. These are really just web pages, so let’s find the simplest way to get the job done.

For those paying attention, that basically means that your work needs to be online somewhere and that doing so doesn’t have to be hard. In fact, there is no reason why you can’t get something online today.

Getting your bare-bones web page online

To participate, you’ll need Dropbox on your machine (free), a text editor (I suggest Sublime Text, also free) and this HTML file that I’ve already gotten started for you.

Once your Dropbox account is set up, place the HTML file provided into your Dropbox/Public folder.

Now that the file is in the Public folder, you should be able to right-click on the file and a ‘Copy Public Link’ option will appear. With that link copied, open a new browser window and paste the link into the address bar. You should now see the trib_index.html file that was provided.

Finally, using your text editor, open the trib_index.html file from within Dropbox.

Following these steps will create a live editing environment where you can see the edits you are making on the trib_index.html file appear instantly online via the public link you created from inside of Dropbox.

Immediately, you’ll probably notice that your page is styled. That’s because the trib_index.html page we provided has a CDN link to Bootstrap. This link gives you access to anything in the Bootstrap CSS and JS framework, which means that this is an excellent time to acquaint yourself with the Bootstrap documentation.

So start playing around already!

Change headlines, figure out what a Jumbotron class is, send yourself the link and open it on your phone and see how the page responds to a different device. You have a real, live page on the internet that you can now use as a sandbox to experiment and stretch your skills.

Stop the presses

This whole process is probably setting off alarms for seasoned web developers. You’re likely saying to yourself that this isn’t web development.

But isn’t web development simply getting your project online and seeing how it functions in the wild? When viewed that way, this sounds like the basics of Browser Testing 101, which is web development.

The only thing this isn’t is complicated.

Most beginners need a way to ease into web development. There are more complicated routes, but how many people give up before they ever get to the point where they see their work online?

Once people understand how to get their projects online, they begin to build the resiliency needed to overcome the headaches that will arise as their ambitions lead them into more complicated concepts.

So let’s focus on getting more beginners ‘online’ and letting natural curiosity define the next steps of the collective journey.

Written by Chris Courtney

June 11, 2014 at 5:34 pm

Posted in CSS, First Steps, HTML

Announcing Tarbell 0.9 beta 6


Today we released Tarbell 0.9 Beta 6. (Tarbell is our open-source static site generator based on Google spreadsheets, made with newsrooms in mind. Read more here!) This is our biggest release to date and should be our last release before a stable 1.0 version to come in the next week or two. Here are some notable changes:

  • New naming conventions: The “base template” naming was confusing to users. We have switched to the term “Tarbell blueprints” to better reflect the role and function of this key Tarbell concept. The “_base” directory has been renamed “_blueprint” and the documentation now refers to “Tarbell blueprints” instead of “base templates.” Projects created with previous versions of Tarbell will still work.
  • Expanded documentation: We greatly expanded and improved the Tarbell documentation, including a more in-depth tutorial.
  • New hook system: Developers can now trigger actions during project installation. For example, create a repository and tickets when creating a new project, or refresh the Facebook cache when publishing.
  • Improved command line interface: Better wording, formatting, and line-wrapping.
  • Better credentials: Tarbell now supports publishing from non-interactive environments.
  • Support for project requirements: Tarbell projects and blueprints can now specify 3rd party Python libraries as dependencies.

Get started by installing Tarbell! Already a Tarbell user? Upgrade with:

pip install -U tarbell

Special thanks go to Heather Billings, who did tremendous work on this release.


Written by David Eads

June 6, 2014 at 11:24 am