Tribune DataViz

Matters of interest, from the data reporters and developers across Tribune Publishing

Archive for the ‘Uncategorized’ Category

We’re hiring a senior Python developer!

leave a comment »

Do you consider yourself a fan of Python? Are you passionate about reporting in Chicago? Have you ever had long discussions about documentation? Then you might be right for TribGraphics. Check out the job posting here and apply soon!

We want our newsroom to look more like the city that we serve. If you’re a person from a group that’s underrepresented in engineering or journalism, we really want to see your application!

Written by nausheenhusain

March 30, 2017 at 11:38 am

Posted in Jobs, Python, Uncategorized

We’re hiring developers/senior developers!

with 2 comments

Come join our team! You will work closely with other developers, reporters, photographers, designers and editors to create great online journalism and build tools that help the newsroom. This could include wrangling and visualizing data, imagining new ways to tell a story online, creating scrapers and tapping into 165 years of archived stories.

Senior developer:
We are looking for an experienced engineer who wants to build websites around the news, provide guidance to other developers and uphold standards. The developer in this role should be someone with experience writing high-performance code, determining technological requirements necessary to implement new projects and maintaining infrastructure. You don’t need to be a designer, but you should have good visual sensibilities.

Your qualifications:
In-depth knowledge of the following tools and concepts:

  • Amazon Web Services (EC2, S3, RDS, Route 53, Cloud Search)
  • Linux system administration (Ubuntu experience preferred)
  • Load balancers and load testing
  • Databases, including PostgreSQL + PostGIS
  • Data structure servers (we use Redis)
  • Asynchronous messaging/task queues (we use Celery)
  • Virtualization (we use Vagrant + Virtualbox)
  • Unit testing
  • Version control (we use git)
  • Python (including Django and Flask)
  • Javascript
  • Popular Javascript libraries: jQuery, Underscore, Backbone, D3

In addition, you should be comfortable with:

  • Application deployment and monitoring (many of our applications use Fabric)
  • Open-sourcing your work and working with open-source projects
  • Brainstorming potential solutions with your teammates
  • Working on an agile team and/or scrum team; previous agile/scrum experience is not required, but you need to be adaptable

We’re also looking for a developer to build interactive news projects and help newsroom staffers research and present their work. We are looking for someone who is comfortable working with and visualizing data and imagining new ways to present information. The developer in this role should be versatile and eager to work on a variety of projects, from creating a database with a graphical front-end to building tools for newsroom staffers.

Your qualifications:
In-depth knowledge of the following tools and concepts:

  • Version control (we use git)
  • Javascript
  • Popular Javascript libraries: jQuery, Underscore, Backbone, D3
  • Python (including Django and Flask)
  • Mapping tools such as Leaflet and QGIS

In addition, you should be comfortable with:

  • Working in a team environment
  • Open-sourcing your work and working with open-source projects
  • Databases, including PostgreSQL + PostGIS
  • Amazon Web Services (EC2, S3, RDS, Route 53, Cloud Search)
  • Working on an agile team and/or scrum team; previous agile/scrum experience is not required, but you need to be adaptable

If you’re interested in the senior developer role, please apply here; if you think you’re a better fit for the developer role, apply here.

Written by Kaitlen Exum

October 9, 2014 at 4:21 pm

Posted in Uncategorized

Tagged with

My Terrible Code, Or How To Archive A News Site

leave a comment »

Newspapers have long kept archives of their past editions. Go into any major newspaper basement and you’ll probably find microfilm of past papers, film negatives of photographs spanning a century, maybe even copies of every print edition.

But how often do we think about archiving our websites?

With the recent launch of the redesigned site, a few of the News Apps team’s projects needed to move from our own custom solutions to our main content management system. This meant old links to our projects, which lived on their own subdomains, could break and cause lots of SEO problems.

Because the Chicago Tribune values its content, regardless of where it lives — for proof, see our archives project — we needed to find a way to store this content and make it accessible to our readers via the old links.

Now, the previous architecture worked like this: Content items (stories, photo galleries, videos) came in through our CMS, we accessed those stories via its API and then we styled and presented those content items on our different platforms, powered usually by Flask or Django.

For Blue Sky, the Chicago Tribune innovation news vertical that originally lived on its own subdomain, we had a basic Django app that would generate the templates for the pages, events listings and section fronts. Because old content needed to stay put — the links needed to stay the same — and all new stories would start afresh with the launch of our redesign, this presented a few problems.

First, we couldn’t just keep our Django app running. It cost money and was one more thing we needed to maintain. Not to mention, we were changing some fields in the API — setting the content items to no longer be live so they wouldn’t appear on the redesigned site — so the Django app was no longer a solution.

Second, we needed to do a basic redesign of the older pages to point people to the new site, which would now live on our main domain. So we needed a way to leave the content of these pages alone, but make some minor changes to the look and feel and, even more importantly, let readers know how to reach Blue Sky’s new home.

After some discussion — and because we had to do the same thing for a similar project, our Fuel-Efficient Cars Guide vertical — we decided we would modify the app locally, write a script that turned all the pages into HTML and upload them to an S3 bucket that would live at the old subdomain.

Next I edited the templating code, making it less complicated and pointing it to our live site. I thought about trying to redesign it to look like our new site, but then I remembered: When you’re archiving your newspapers from the 1980s, you don’t redesign each one to look like it’s 2014, do you? Nope. Same thinking goes when archiving old web content.

What follows below is an ugly Django management command. The important thing is, it works. I could have written it to simply use Django’s templating to create the HTML files, but I couldn’t easily figure out from our CMS API what the exact URL was for each content item when it was built in Blue Sky: I didn’t know whether something lived at the site root, under /originals/ or under /hub/.

So I wrote my code to try to check each spot on my local machine. If it found something there, it would create the HTML and save it as a file to the right directory. Like I said, ugly. But it works, which is the most important thing in getting our work done:

from datetime import datetime

from django.core.management.base import BaseCommand
from django.conf import settings
from django.test.client import RequestFactory

from p2psections import p2p
from p2p import P2PNotFound, P2PException

from optparse import make_option

import logging
import urllib2
import requests
import time
log = logging.getLogger(__name__)
APP_SOURCE_CODES = ["bluesky"]
# The offsets allow us to run through as many of each content type as is in the system
# And yeah, this is totally an ugly and gross way to do this. But it works!
OFFSETS = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000]
CONTENT_ITEM_TYPES = ["story", "storylink", "video", "premiumvideo", "photogallery", "storygallery"]
ERROR_SLUGS = []

class Command(BaseCommand):
    option_list = BaseCommand.option_list + (
        # NB: the flag name here is a reconstruction; only the help text survives
        make_option('--dry-run', action='store_true', dest='dry_run', default=False,
            help="Don't upload anything"),
    )

    def handle(self, *args, **options):
        # This allows you to pick a specific content item type from our CMS
        content_type = raw_input("Which content item type do you want to use? Pick one. Options are: story, storylink, video, premiumvideo, photogallery, storygallery ")
        start_time = datetime.now()
        slugs = []
        for offset in OFFSETS:
            # NB: the exact search call was lost; reconstructed from context
            results = p2p.search({
                'conditions': {
                    'source': APP_SOURCE_CODES,
                    'content_item_type': content_type,
                    #'content_item_state_code': 'live',
                },
                'filter': p2p.default_filter,
                #'order': {'last_modified_time': 'descending'},
                'limit': 100,
                'offset': offset,
                'content_item_state_code': 'live',
            })
            print offset
            # Here we make a big list of each of the slugs for our CMS, so we
            # can later iterate through them and create the HTML
            for item in results['content_items']:
                slugs.append(item['slug'])
        # Create a template with each slug
        for slug in slugs:
            self.url_finder(slug, content_type)
        finish_time = datetime.now()
        total_time = finish_time - start_time
        print "All done, took %s seconds to complete." % total_time.seconds
        # I would later run the error slugs through individually to try and
        # catch any I didn't originally get
        print "These had errors: %s" % ERROR_SLUGS

    def url_finder(self, slug, content_item):
        print slug, content_item
        try:
            # Check if it lives at the site root
            response = requests.get('http://localhost:8000/%s,0,0.%s' % (slug, content_item))
            if response.status_code == 404:
                print "Failed at /", response.status_code
                # Check if it is at /originals/slug
                response = requests.get('http://localhost:8000/originals/%s,0,0.%s' % (slug, content_item))
                if response.status_code == 404:
                    print "Failed at /originals", response.status_code
                    # Check if it is at /hub/slug
                    response = requests.get('http://localhost:8000/hub/%s,0,0.%s' % (slug, content_item))
                    if response.status_code == 404:
                        # It couldn't be found anywhere!
                        print "%s couldn't be found" % slug
                        ERROR_SLUGS.append(slug)
                        return
                    print "Attempting to save at /hub"
                    try:
                        html = urllib2.urlopen('http://localhost:8000/hub/%s,0,0.%s' % (slug, content_item)).read()
                        print "got html"
                        text_file = open("archive/hub/%s,0,0.%s" % (slug, content_item), "w")
                        text_file.write(html)
                        text_file.close()
                        print "Saved %s at /hub" % slug
                    except Exception, e:
                        print e
                else:
                    print "Attempting to save at /originals"
                    try:
                        html = urllib2.urlopen('http://localhost:8000/originals/%s,0,0.%s' % (slug, content_item)).read()
                        print "got html"
                        text_file = open("archive/originals/%s,0,0.%s" % (slug, content_item), "w")
                        text_file.write(html)
                        text_file.close()
                        print "Saved %s at /originals" % slug
                    except Exception, e:
                        print e
            else:
                print "Attempting to save at /"
                try:
                    html = urllib2.urlopen('http://localhost:8000/%s,0,0.%s' % (slug, content_item)).read()
                    print "got html"
                    text_file = open("archive/%s,0,0.%s" % (slug, content_item), "w")
                    text_file.write(html)
                    text_file.close()
                    print "Saved %s at /" % slug
                except Exception, e:
                    print e
        except urllib2.HTTPError:
            ERROR_SLUGS.append(slug)

When you run the command, it asks you which content item type to loop through. Then, when it’s done, it gives you a list of slugs it couldn’t find on your local machine. I used that list to run the command again later with just those specific slugs, because sometimes errors occurred via the API on my local machine. If I couldn’t find a slug after a few more runs, that meant it wouldn’t live on our live site.

After that, I created an Amazon S3 bucket at the old subdomain and started uploading the more than 2,700 files to their respective folders, making sure to set their Content-Type to “text/html”: they were files with names like “chi-nordstrom-buys-trunk-club-bsi-news,0,0.htmlstory”, so S3 wouldn’t automatically serve them as HTML.
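The explicit header matters because S3 guesses Content-Type from the file extension, and “,0,0.htmlstory” isn’t one it recognizes. A minimal sketch of the kind of check you could run before uploading — the helper name is ours, not from the original script:

```python
import mimetypes

def content_type_for(filename):
    """Pick the Content-Type header to set on an archived page.

    mimetypes knows nothing about extensions like '.htmlstory', so it
    returns None for them; we fall back to 'text/html' explicitly so S3
    serves the page as HTML instead of offering it as a download.
    """
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or "text/html"

# The archived Blue Sky pages all need the explicit fallback:
page_type = content_type_for("chi-nordstrom-buys-trunk-club-bsi-news,0,0.htmlstory")
```

You would then pass that value as the Content-Type metadata on each upload, whatever S3 client you use.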

Then, when our site redesign launched, we pointed the DNS for the old subdomain at the S3 bucket. The index file of that bucket redirects to Blue Sky’s new home on the main site. And that’s how you archive 2,700 pieces of content without breaking any links.

Written by Andy Boyle

September 17, 2014 at 9:31 am

Posted in Uncategorized

I suck at this

leave a comment »

I’ve wanted to program computers ever since I was 14 or 15, and now that I’m 31, I can honestly say I have always known that I suck at programming computers. I’ve come to terms with this knowledge. It doesn’t bother me like it used to. I know I’m bad at my job, but I also know that it doesn’t matter what I think — it matters what I do. [Ed. note:  Abe is awesome at his job. He just doesn’t believe it.]

I take on tasks, close tickets, and steadily get better at not sucking quite so hard. But since this feeling of inadequacy seems to be common among programmers, let’s take a tour through my insecurities! I’m not sure why you should come with me, though, it probably won’t be that good.

May 15, 2000 — This date is made up, but it takes us to junior year in high school (go Dolphins!). One of my best friends, Scott, is a year ahead of me in programming classes – he’s taking AP Computer Science, and I’m just taking a programming class. He shows me the code he’s working on, mostly hard-core graphics for video games, mostly in C++. I don’t show him the code I’m working on, mostly crappy websites (the Wayback Machine preserved my Geocities Starcraft fanfiction abomination!)

At this point, I’ve been a fan of programming for a few years, but it hasn’t really clicked for me yet. Everything I write is basically copy-pasted from an O’Reilly book. (Sidenote: remember books?)

May 15, 2005 — This date is also made up, but it roughly corresponds to the day of my one truly rock-star college programming moment: the day I took (and passed!) the final exam for CS 440, Topics in Artificial Intelligence.

Getting a B in that class was one of the hardest things I’ve ever done. I barely understood what was going on; I’d leave lectures in a daze after 45 minutes of a professor saying words that I didn’t comprehend. I’d turn in homework assignments that at best failed in interesting ways. I got celebratorily drunk the night of the final, drunker than I’ve ever been before or since (Note to college kids: Tequila and Red Bull tastes delicious but it is such a bad idea, oh my God).

May 15, 2010 — Let’s just keep doing May 15ths every five years. Now we’re in California, and I’ve moved from a non-technical job answering emails from news publishers to a slightly-technical one, writing Python scripts that don’t work very well to interact with a poorly-maintained database that I am largely in charge of.

I’m not on a technical team, but I’m surrounded by extremely accomplished developers on related teams, and every time I sit in a meeting with them I barely say a word, for fear that they’ll realize I don’t know what I’m talking about and should probably be fired. Every time I have to make a change to the database, I spend the day with my stomach in knots, just knowing that I’ll screw it up somehow and everyone will find out. I dread every time I submit my code for review, certain that each time will be the one they discover that I’m a fraud and need to be eased out of the company.

May 15, 2014 — We’re more or less at the present, and for the first time in my life, I feel comfortable and confident showing up every day to write code. I’ve been at the Tribune for a little over a year; for most of that time, every Monday I would wake up nervous, worried that when I got in and checked my email, disaster would be waiting in the form of angry emails alerting me to an embarrassing bug (or worse.)

Every Monday this would happen, and at my previous job as well. But now it’s stopped happening. I’m not quite sure why, but I think it’s because I’ve realized that insecurities are not the same thing as reality. I know I’m not a very good programmer, but as I get better I realize that things that used to seem immeasurably more complex than anything I could understand are not so daunting.

The code I write is still bad, but it’s markedly less bad than it used to be. And that’s all I can do: to try hard, to steadily get better and to accept that I am going to make a lot more mistakes, some of them embarrassing, some of them catastrophic. There’s no other way to improve. There’s no other way to do what we do.

Watching the World Cup, I was reminded that this is the mentality that athletes must have — the cameras are on, the entire planet is watching, and a defender makes a misstep that lets a striker get by him for a goal. His entire home country thinks he’s terrible now. But he can’t let that bother him.

The next time that striker comes down the field, the defender has to challenge him, confident in the knowledge that this time, the striker’s going to look like an idiot. That’s all he can do. That’s all any of us can do.

Written by Abe Epton

August 20, 2014 at 3:08 pm

Posted in Uncategorized

Displaying High School Sports Data With No Databases

with one comment

Last summer Abe Epton and I were asked to solve an interesting problem: Display high school sports scores online and do it without a complicated interface or massive backend architecture.

At the time, our high school sports reporters and editors were taking phone calls for sports scores all night, entering them into the pagination system and then occasionally creating HTML tables, which they then pasted into our content management system. If they wanted to update scores online, they would have to regenerate the tables and paste them over the old ones or change things by hand.

In previous projects, I’ve helped build massive backends and admin systems in Django to keep track of sports scores, team rankings, individual player statistics and the like. They are big systems that require upkeep and present a steep learning curve for those entering the data. We didn’t want to add more complexity to the ever-expanding number of applications our team works on, so we wanted to figure out another way. And we wanted to simplify data entry for people who are already quite busy.

We pitched building everything as a Flask app that fed off a Google spreadsheet, similar to how our Tarbell project works. We would have a spreadsheet with seven tabs for seven days, with different rows for each of the data points, and then we would generate HTML files we could push to Amazon S3.

First we had to figure out how to store the data. Volleyball games have different scoring systems than track meets, which are different still from football or baseball games. So we had to create some fields for specific sports types and also train the staffers how and when to enter in data in certain cells in the Google spreadsheet.


Then we designed the responsive frontend because people like to check sports scores of other area teams while they’re at the game. Users can look at individual high schools, individual sports or just the latest scores. Again, this was based on conversations about what we thought would best suit the needs of our users and what they were looking for.

The app updates every few minutes, checking to see if anything’s changed in the Google spreadsheet. We download the score data, hash it with md5 and compare that hash to the one from the most recent run. If they differ, we know an update of some kind has occurred, so we regenerate the pages and push them to S3. Most of the time, though, they’re identical, which saves CPU cycles and money: we’re not overwriting existing pages or rebuilding scores from an unchanged dataset.
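The hash-and-compare step can be sketched in a few lines of modern Python; the function and variable names here are ours, not the production app’s:

```python
import hashlib

def scores_changed(score_bytes, previous_hash):
    """Compare the md5 of freshly downloaded score data to the last run.

    Returns (changed, current_hash); the caller only rebuilds and
    re-uploads the pages when `changed` is True.
    """
    current_hash = hashlib.md5(score_bytes).hexdigest()
    return current_hash != previous_hash, current_hash

# First run: no previous hash, so everything gets built.
changed, h = scores_changed(b"Maine South 21, Evanston 14", None)
# Second run with identical data: nothing to do.
changed_again, _ = scores_changed(b"Maine South 21, Evanston 14", h)
```

Storing only the last hash keeps the change check constant-time no matter how large the spreadsheet grows.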

Our colleagues on the pagination side have written scripts that download the Google spreadsheet and ingest it into our pagination system, getting updated scores as the night progresses. The application also uploads each day’s scores as JSON and CSV files, which are stored on S3, so our sports staff can go back and use the data — or update it — for any reporting purposes in the future.

Overall, I think it’s a pretty nifty app. Not only does it help present data in a pleasant way for our audience, but it also saves our sports staffers time every night, allowing them to focus on news gathering and not worry as much about presentation.

Written by Andy Boyle

May 27, 2014 at 2:24 pm

Posted in Uncategorized

How to get the interview

leave a comment »

Say you’re interested in an internship with the Chicago Tribune News Applications team (or, really, anywhere). You see our post on the Tribune Media Group jobs site, or right here on this blog. You respond. We don’t call you in for an interview.

“What went wrong?” you wonder.

Well, any number of things. Maybe nothing. Maybe we just didn’t think you were a good fit.

“But I would have been a great fit!” you wail to the heavens. You shake your fists and rend your garments. And then you ponder, “What could I have done differently that would have at least merited an interview?”

Okay, dry your eyes, get a drink of water, and prepare for some advice. I’ve been handling the annual Search for the Summer Intern for three years and I have thoughts.

First, though, you should know one thing: I want you to have the best possible shot at this (or any other) internship.

I’m not looking to arbitrarily disqualify candidates. I hope that we have more qualified applicants than we can handle—from a totally selfish perspective, options are great! Therefore I have provided the following list of the Five Most Important Things to Do on Your Internship Application:


1. FOLLOW THE DIRECTIONS

Excuse the shouting, but this one piece of advice could stand in for the entire list if necessary. If you neglect to follow instructions in the application process, it seems like you either don’t care enough to pay attention to directions or don’t think the directions apply to you. Either option is a bad message to send.

If, for example, the instructions say (perhaps even in three separate places…) to include a cover letter, you should include a cover letter. If you choose to skip that step, I will choose to skip reading your resume. By not even meeting the basic requirements of the application, you’ve wasted your own time and that of a lovely gentleman in HR—and the only reason you’re not wasting my time as well is that I eventually asked said gentleman to not bother forwarding me any applications missing cover letters.

2. Demonstrate understanding of the role, relevance, and interest

No, the job description probably won’t spell out every duty the internship encompasses, but you should at least be able to figure out what the role broadly entails…and only apply if it’s something you want to do.

Sometimes your experience doesn’t seem like an obvious fit for the role you’re hoping to fill. That ostensible mismatch doesn’t automatically disqualify you from consideration, provided you explain why you’re interested in and think you would be a good fit for the internship. If your resume says you’re in culinary school and your cover letter is a generic copy-and-paste job, I won’t assume you’re learning to code on weekends and are an avid hackathon participant. But maybe you are! Make it explicit.

3. Pay attention to details

One typo isn’t going to disqualify you, even with someone as nitpicky as I am. But if your entire application is riddled with typos? Not good. If your cover letter is obviously copied and pasted and I can see where you’ve forgotten to change some of the relevant details? Also not good. If you get the name of the company or team wrong? Pretty bad. And I would always prefer “Dear News Apps Team” or the old standby, “To Whom It May Concern” over entirely the wrong name. If you’re not sure, don’t just guess.

All of the above come across as sloppy and lazy—you didn’t pay attention the first time and you didn’t look over your work before sending. Those are not traits we want in an intern, particularly one whose role will entail committing code to a live site.

4. Don’t squander opportunities

If for any reason one of my teammates or I do contact you (perhaps a colleague recommended you, or I have reason to suspect some form of technology botched your cover letter), make the most of the opportunity. Theoretically, we’re reaching out because we’re interested in you or want to give you a chance to explain or elaborate. If you’re really interested in the internship, this is the time to put in a little extra effort, not just copy and paste your generic cover letter, typos and all.

5. Follow the directions already!

Yes, it’s *that* important.

Written by Kaitlen Exum

May 15, 2014 at 4:35 pm

Posted in Jobs, Uncategorized

Flat Files And Server Denials: Covering Elections At Three News Orgs

with 5 comments


Covering elections is a staple in American journalism. I’ve covered elections as a reporter and I’ve helped display election data in drastically different ways at three news organizations.

So first, a little primer on elections data. Generally speaking, on election night, the data for vote totals is tabulated by county boards of election and then sent to a state-level board. Next, the data is harvested by vendors such as Ipsos and the Associated Press. Until recently, the only nationwide election data vendor for news organizations was the AP. While other data vendors exist, they usually focus on more niche markets, such as campaigns and political parties.

The AP has a physical person in every U.S. county to report back to them what the current vote totals are for different races. It’s incredibly costly, but means you can dive deep into trends in data. The AP has a system that lets you FTP in and download the data in XML or CSV format, which your publication can then display.
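In rough outline, that download-and-parse step looks like this in modern Python. The host, path and credentials are placeholders — the real values come with an AP agreement — and the CSV column names are invented for illustration:

```python
import csv
import io
from ftplib import FTP

def fetch_ap_results(host, user, password, path):
    """Pull one results file off an FTP server (placeholder arguments)."""
    ftp = FTP(host)
    ftp.login(user, password)
    buf = io.BytesIO()
    ftp.retrbinary("RETR " + path, buf.write)
    ftp.quit()
    return buf.getvalue().decode("utf-8")

def parse_results(csv_text):
    """Turn a results CSV into a list of dicts, one per candidate row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Parsing works the same whether the CSV came from FTP or a test fixture:
rows = parse_results("race,candidate,votes\nGovernor,Smith,1200\nGovernor,Jones,900\n")
```

Keeping the fetch and the parse separate makes it easy to rehearse election night against saved test files, which matters more than you’d think.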

The AP doesn’t always get state, county or local-level election data in this same manner. Thankfully, most states (and some counties) have online data portals, RSS feeds or APIs that can be downloaded, scraped or accessed to get the data you’re looking for. In some places, though, a real person has to sit in an election board’s offices and get the election data back to the news organization somehow, typically by calling or emailing.

While displaying data online may get a lot of attention these days, remember that many news organizations still print something every day. So news organizations have also needed to solve the problem of importing AP election data into their print editions, too — generally through decades-old pagination systems.

Now let’s talk about the differences between the three places I’ve wrangled election data for.

The New York Times Regional Media Group’s Election Data

In 2010, I was a newbie developer at the now-renamed The New York Times Regional Media Group. I started a few weeks before the 2010 midterm elections. My new coworkers had already built a system to FTP into the AP, import the data into a MySQL database and then display it on our 14 news websites using iframes hitting tables built in PHP.

I helped by load-testing, or seeing how much traffic the project could take, while we were running importation tests of the AP’s test data runs. By my estimations using Siege, I thought we were in the clear, with 2,500 hits a minute not crippling anything. If election night traffic had indeed been 2,500 hits a minute, we might have been in the clear. We were not.

If memory serves, we had one EC2 medium instance running to import and display the data and a medium MySQL instance running for the database. I didn’t know about caching and thought it was just something that was turned on automatically. It wasn’t.

On election night, we had two newspapers that received election data first, and things ran smoothly with them, as they were in smaller markets. Then the Florida papers started getting heavy traffic. Our EC2 instance became bottlenecked, stuck at 99 percent CPU usage, unable to read the AP data, let alone write updates to the database.


This brought all 14 of the newspaper websites to a crawl because these iframes were getting loaded before almost anything else on the page. In the end, homepage editors took the iframes off the pages, a coworker wrote some SQL to hand-optimize the election tables and, by then, traffic to the sites had subsided to reasonable levels.

It was the scariest night of my professional life. Thankfully, most of the newspapers were happy, as they hadn’t ever even attempted to display live election data on their websites, so this was still an improvement for them. And I learned to set up caching — in later cases, Varnish — when attempting to hit a live database in any way.

The Boston Globe’s Election Data

Next, I was at the Boston Globe during the 2012 general primaries. As then-hopeful Mitt Romney was the former governor of Massachusetts, the Boston Globe was a major source for news and coverage of the GOP primary battle. And the New Hampshire primaries were that paper’s bread and butter.

But the team I worked on had a fun logistical problem: We needed to display the data on two different websites. Each ran in a different content management system, each had different styles and each wanted the data displayed a little differently.

The first problem we had to solve was how to pull in the data. The Boston Globe’s CMS was Methode, which stored everything — stories, photos, etc. — as pieces of content in XML. As the AP already provided data in an XML format, we would just need to import it, change some of the tags to better suit the Methode ingestion system and then I would write the code necessary to display the data.
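Tag renaming of that sort is easy to sketch with the standard library. The tag names in the mapping below are made up for illustration — the real Methode schema isn’t public:

```python
import xml.etree.ElementTree as ET

def rename_tags(xml_text, mapping):
    """Rename elements in an AP-style XML document before CMS ingestion.

    `mapping` maps source tag names to the names the CMS expects;
    anything not in the mapping passes through untouched.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in mapping:
            elem.tag = mapping[elem.tag]
    return ET.tostring(root, encoding="unicode")

# Hypothetical example: the CMS wants <person> where the feed sends <Candidate>.
converted = rename_tags(
    "<Race><Candidate>Smith</Candidate></Race>",
    {"Candidate": "person"},
)
```

A transform like this runs once per download, so even a naive full-tree walk is plenty fast.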

Thankfully, the Boston Globe’s systems staff figured out quickly how to go in and download the XML data and put it into a spot in the CMS that I could access. We had created mockups and styles for displaying the data responsively — still a new concept at the time — and now had to pull in the data, via some incredibly ugly Java I wrote.

We didn’t have time to do something similar in the older CMS, which was at the time, I believe, going on 12 years old and somewhat fragile. So we decided to build separate styles and templates on the newer site that we could iframe into the older one. Not the best way to do things, but it’s how we did it.

And then, as the primaries started happening more and more frequently, I had to make each primary its own chunk of code, violating the DRY principle repeatedly, just trying to get everything deployed to production in time for the producers to be able to slot the items on the various homepages.

Another coworker had an old Python script that created basic HTML tables of county- and town-level election totals and pushed them onto the site, for a more in-depth look. Lots of moving parts, different content management systems, different styles, a lot of work for the small number of people working on it.

The Chicago Tribune Way(s)

Now I’m at the Chicago Tribune. In 2012, my coworkers built a system that pulled AP election data into a Django site with Varnish in front for caching. For local races, they pulled data entered by Chicago Tribune staffers into Google spreadsheets, based on information gleaned from various county board of election sites, and turned those into flat files as well. And then the AP data was pulled into our pagination system for the print product through tables the AP sent, just as it had been done in previous elections.

Fast forward to a month ago. The Chicago Tribune no longer subscribes to the Associated Press, but Reuters has entered the election data game. Instead of having to FTP and download XML files, we hit an API and receive JSON. It’s pretty nifty and much more conducive to building web-facing applications.

We wrote a Python wrapper to hit the Reuters API and reformat the data for our purposes, and then we again built flat pages based on that data, using Django Medusa. And for local elections and referenda that Reuters wasn’t covering, we again had Tribune staffers entering data into Google spreadsheets.
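The reformatting half of such a wrapper might look something like this. The payload shape and field names are invented for illustration — the real Reuters schema is proprietary and different:

```python
import json

def reformat_race(payload):
    """Flatten one race from a hypothetical JSON payload into rows our
    templates can loop over, sorted by vote count with a percentage added."""
    race = json.loads(payload)
    total = sum(c["votes"] for c in race["candidates"]) or 1  # avoid division by zero
    return [
        {"name": c["name"], "votes": c["votes"],
         "pct": round(100.0 * c["votes"] / total, 1)}
        for c in sorted(race["candidates"], key=lambda c: -c["votes"])
    ]

rows = reformat_race(json.dumps({
    "candidates": [
        {"name": "Jones", "votes": 900},
        {"name": "Smith", "votes": 2100},
    ]
}))
```

Because the reformatter is a pure function of the JSON, baking the flat pages (with Django Medusa or anything else) just means rendering its output into templates on a timer.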

We still had to write a custom system that takes the Reuters and Google spreadsheet data and sends it to our pagination system. This required us figuring out how the data needed to look — basically a mix of XML-ish template tags and tables — and then FTPing it to an area where our pagination system could ingest the files, give them proper templating and allow page designers to put them on pages.

So what have I learned?

Elections are big events traffic-wise, and static sites take large traffic pretty well. With the Boston Globe and Chicago Tribune solutions of using basically static sites (XML and sites baked to S3), it meant little freaking out at 9 p.m. when you’re getting thousands of pageviews a second. If you’re having to deal with lots of calls to your database while it’s also reading and writing, you’re going to have a bad time. Static sites are wicked great.

Testing is important, but knowing what to test is more important. At The New York Times Regional Media Group, I thought I knew what I was doing, but I was testing for unrealistically low traffic and didn’t think about what would happen while it was trying to write election data to the database, too. I now know I could have asked folks on the NICAR listserv for help, or tweeted questions or really just asked anyone with a few years of experience, “Hey, will this work?”

Election nights are stressful, so be cheerful and smiley. We at team Trib Apps try to be cheerful and kind whenever working with anyone, but with this many moving parts, it never hurts to just think “smile while saying words” when conversing with other folks. We’re all working hard on these nights, and I’m a big fan of not adding any extra stress on people’s lives. That’s also part of what our technology is supposed to do — make things easier for folks in the newsroom.

Have a point person from the tech side to coordinate with the newsroom. When local election data started coming in, I stood in the area where folks were entering it into Google spreadsheets, just so someone was around to help answer any questions on the spot, while David Eads, who was the lead developer on the elections project, made sure the technical side was running smoothly. We had only one minor hiccup that was quickly fixed and we were able to identify it because we were all near one another, able to communicate more effectively. Even though we work with machines, this job is mostly about communication between humans.

Know that you’re going to be covering an election again and make your code reusable. When we were writing our code for the primary, we knew a general was coming up in November. We also knew that other Tribune newspapers would be itching to show election results so we needed to get the fundamentals right the first time.

We would love to hear about your experiences with election data. Please feel free to add a comment and tell us your story.

Written by Andy Boyle

April 25, 2014 at 12:32 pm