Tribune DataViz

Matters of interest, from the data reporters and developers across Tribune Publishing

Author Archive

Reimagining the Chicago Tribune home page

with 2 comments

In June some of us ventured to the Global Editors Network conference in Paris to participate in a hackathon against news development teams from around the world. We were invited to attend after having won a similar hackathon at the New York Times in April.

The 11 teams each had 2 days to develop an idea around the theme “How would you redesign your news organization’s homepage to increase user engagement?”

Our team – Alex Bordens, David Eads, and myself – decided to take the opportunity to develop an idea that fundamentally differs from a traditional newspaper homepage. We wanted to come up with something that could not immediately replace our homepage, but would be a vision of our homepage turned on its head.

Chicago (Tribune) River

The content that is displayed on newspaper websites is chosen primarily by editors. We apply news judgment in two different ways: in deciding what to cover, and in deciding what to surface on our homepage.

Increasingly we are seeing that readers who use the web as their primary source of news prefer a great volume of content to ‘snack’ on. Conversely we’re also seeing an interest in high-quality magazine-length work. It seems that web-native readers draw a hard line between these two modes of news consumption, and typically look for one kind or the other depending on their mood or time of day.

And so we built this little web app, half of which is dedicated to the traditional presentation of news and professional editorial judgment, and the other to a raw feed of content that’s ranked by the readers, heavily inspired by sites like Reddit and Hacker News.

Screenshot of Chicago Tribune River site

But how that raw feed is ordered was a point of intense discussion for our team. We debated making it driven by editorial judgment, analytics, social media metrics, or direct reader voting.

We landed on giving readers the ability to decide how they want the stories ordered using tabs, just above the list.

Most votes, Most recent, trending options

Some alternate list views
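To make the tab idea concrete, here is a minimal sketch of how one list of stories could be re-sorted per tab. It is not the code behind the app: the `Story` fields, the tab names, the sample stories and the Hacker News-style decay formula in `trending_score` (including the `gravity` value) are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Story:
    title: str
    votes: int            # upvotes minus downvotes
    published: datetime


def trending_score(story, now, gravity=1.8):
    """Votes decayed by age in hours (an assumed, Hacker News-style formula)."""
    age_hours = (now - story.published).total_seconds() / 3600
    return story.votes / (age_hours + 2) ** gravity


def order_stories(stories, tab, now):
    """Return the stories sorted for the selected tab, best first."""
    if tab == "most_votes":
        key = lambda s: s.votes
    elif tab == "most_recent":
        key = lambda s: s.published
    elif tab == "trending":
        key = lambda s: trending_score(s, now)
    else:
        raise ValueError("unknown tab: %s" % tab)
    return sorted(stories, key=key, reverse=True)


# Made-up stories, purely for illustration.
now = datetime(2013, 7, 8, 9, 0)
stories = [
    Story("City budget vote", votes=40, published=now - timedelta(hours=30)),
    Story("Cubs win in extras", votes=12, published=now - timedelta(hours=1)),
    Story("Lakefront festival guide", votes=90, published=now - timedelta(hours=6)),
]

for tab in ("most_votes", "most_recent", "trending"):
    print(tab, [s.title for s in order_stories(stories, tab, now)])
```

Each tab is just a different sort key over the same data, which is what makes it cheap to let readers pick their own ordering.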

But what we were most intrigued by, and what we implemented first, was direct reader voting.

Most of the analytics we look at on a regular basis are traffic-based. This paints a picture of what readers are looking at, but not necessarily what they think. Shares via social media provide insight into whether a piece has struck a chord with readers, but it takes time to read and digest the individual responses to discover the readers’ opinions.

We wanted to create a simple direct feedback mechanism that could tell us what readers like to see on our home page.

What’s next

There are many more discussions to have before we determine if this is an approach that makes sense for the Chicago Tribune, so for now, it’s our pet project. We plan to keep it running and filled with real live news, with the goal of making incremental improvements as time allows.

So take a look, upvote something, downvote something–just let us know what you think of our experiment. We’d love your feedback!

Written by Ryan Mark

July 8, 2013 at 9:31 am

Posted in Apps, Craft

We’re hiring: WordPress, HTML5 developer extraordinaire

with one comment

We are looking for an experienced web developer who can help us build sites for the Chicago Tribune and Chicago Tribune Media Group. Somebody who has a passion for code and getting things done. Somebody who likes having a problem to solve.

On the News Apps team, you will help us research, design, and build online news products. You will be a generalist: sometimes interviewing or helping users, sometimes writing HTML and CSS, sometimes coding in Python, PHP or Javascript, and sometimes working on servers.

You will work with a group of talented, passionate folks who enjoy making websites and software. We have short deadlines so we work iteratively and try to work closely with our users and stakeholders. It can be stressful at times, but it’s worth it. We build good stuff fast and you will become a better programmer. You will always be refining your tools and trying out bleeding edge web technologies. You will make things you will be proud to show mom.

Acronyms and buzzwords:

These are the tools we use. Apply if you can rock them.

  • WordPress & PHP development
  • Git
  • HTML5, CSS3, SASS and Responsive Design
  • Javascript & Coffeescript: Backbone.js, Underscore.js, jQuery
  • Python and Django
  • Amazon Web Services: EC2, S3, RDS
  • Linux/Ubuntu server administration: Apache, Nginx, Varnish

P.S. You don’t have to know them all to apply.

Stuff we’ve done:

You will be working on these sites, and new ones like them.

And you’ll be contributing to our blog and our github.

Gear you’ll get:

  • One shiny, new MacBook Pro (or an iMac, if you’d prefer)
  • One CDM (Cheap Dell Monitor)
  • One comfy Aeron chair
  • …all at a desk somewhere in the Tribune newsroom, where you’ll be surrounded by reporters arguing with the cops, yelling about the ball game, telling crazy stories, and otherwise practicing their trade.
Interested? Send a cover letter and resume to newsapps at

Written by Ryan Mark

January 12, 2012 at 10:12 am

Posted in Jobs, PHP, Python

From spreadsheet to HTML in 15 minutes with python-tablefu, Jinja and Flask

with 2 comments

The best Christmas carol

We often need to take a spreadsheet of info and lay it out in HTML on deadline. Typically we use ProPublica’s TableSetter, which takes a Google spreadsheet and generates an HTML table. TableSetter can be tweaked in a bunch of different ways to customize the generated table, but as with all specialized tools it has its limits. Luckily it’s easy to create a rudimentary TableSetter clone in Python quite quickly.

This week I got a spreadsheet of Christmas carols with YouTube embed codes to go along with a story about a Northern Illinois University professor and carol expert who recently died. The shape of the data in the spreadsheet lent itself more to a top-ten-style list than a table, so TableSetter was not the best tool for the job. The spreadsheet was only 25 rows, but I was not about to build all the HTML by hand.

I exported the spreadsheet to a csv file, created a simple html template and wrote a simple script to mash the two together. I used Chris Amico’s python clone of ProPublica’s TableFu and the great templating library Jinja.

Here is the result.

Here is the code

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import jinja2
import table_fu

TEMPLATE_DIR  = 'templates'
DATA_FILE           = 'data/xmas-carols.csv'
TEMPLATE_FILE = 'xmas-carols.html'
OUTPUT_FILE     = 'build/xmas-carols.html'

# Set up jinja templates. Look for templates in the TEMPLATE_DIR
env = jinja2.Environment(loader=jinja2.FileSystemLoader(TEMPLATE_DIR))

# Open the csv file and load it up with TableFu
table = table_fu.TableFu(open(DATA_FILE, 'U'))

# Get the template and render it to a string. Pass table in as a var called table.
html = env.get_template(TEMPLATE_FILE).render(table=table)

# Write the html string to our OUTPUT_FILE
o = open(OUTPUT_FILE, 'w')
o.write(html)
o.close()

Jinja templates are very similar to Django templates. The biggest difference that I can discern is the much more powerful template syntax. It does stuff that I always thought Django templates should be able to do.

Well this script is kinda boring without the template:

<ul id="xmas-songs">
{% for row in table.rows %}
<li>
        {% if row['Year featured']|trim %}
<div class="featured">{{ row['Year featured'] }}</div>
        {% endif %}
<div class="name">{{ row['Name'] }}</div>
        {% if row['youtube'] %}
<div class="youtube">{{ row['youtube']|safe }}</div>
        {% endif %}
<div class="written"><small>Written:</small>

            <strong>{{ row['Year written'] }}</strong></div>
<div class="origin"><small>Country of origin:</small>

            <strong>{{ row['country of origin'] }}</strong></div>
<div class="composer"><small>Originally by:</small>

            <strong>{{ row['original composer/lyricist'] }}</strong>

            {{ row['Other composers/lyricists'] }}</div>
        {% if row['keywords']|trim %}
<div class="keywords"><small>Memorable lyrics</small>

            {{ row['keywords'] }}</div>
        {% endif %}
        {% if row['famous versions']|trim %}
<div class="famous"><small>Famous renditions</small>

            {{ row['famous versions'] }}</div>
        {% endif %}
        {% if row['noteable']|trim %}
<div class="notable"><small>Did you know?</small>

            {{ row['noteable'] }}</div>
        {% endif %}</li>
{% endfor %}</ul>

That’s it. There are a lot of cool things you can do with TableFu that aren’t illustrated here, and Jinja does a ton of stuff; its documentation is pretty extensive.
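If you don’t have TableFu or Jinja handy, the same spreadsheet-to-list idea can be sketched with only the standard library. This is not what I used for the carols page, just an illustration: the sample rows are made up, only three of the spreadsheet’s columns are used, and `render_carols` is a hypothetical helper.

```python
import csv
import io
from html import escape

# Made-up sample rows standing in for the exported spreadsheet.
SAMPLE_CSV = """Name,Year written,keywords
Silent Night,1818,Sleep in heavenly peace
Jingle Bells,1857," "
"""


def render_carols(csv_file):
    """Build a <ul> with one <li> per spreadsheet row."""
    items = []
    for row in csv.DictReader(csv_file):
        parts = ['<div class="name">%s</div>' % escape(row["Name"])]
        parts.append('<div class="written">%s</div>' % escape(row["Year written"]))
        # Same idea as the template's `|trim` test: a cell holding only
        # whitespace is still a non-empty string, so strip it before testing.
        if row["keywords"].strip():
            parts.append('<div class="keywords">%s</div>' % escape(row["keywords"]))
        items.append("<li>%s</li>" % "".join(parts))
    return '<ul id="xmas-songs">\n%s\n</ul>' % "\n".join(items)


html = render_carols(io.StringIO(SAMPLE_CSV))
print(html)
```

A template engine earns its keep as soon as the markup gets longer than this, which is why the real thing uses Jinja.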

Oh there’s one other thing. If you don’t feel like dropping back to the shell to run the script to update your HTML with your new CSS or HTML changes, you may want to …

Sprinkle a little Flask into the mix

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import jinja2
import table_fu
from flask import Flask

TEMPLATE_DIR = 'templates'
DATA_FILE = 'data/xmas-carols.csv'
TEMPLATE_FILE = 'xmas-carols.html'
OUTPUT_FILE = 'build/xmas-carols.html'

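app = Flask(__name__)

# Rebuild the page on every request to the site root
@app.route('/')
def main():
    env = jinja2.Environment(loader=jinja2.FileSystemLoader(TEMPLATE_DIR))
    table = table_fu.TableFu(open(DATA_FILE, 'U'))

    html = env.get_template(TEMPLATE_FILE).render(table=table)

    # Save a copy of the rendered page to OUTPUT_FILE
    o = open(OUTPUT_FILE, 'w')
    o.write(html)
    o.close()

    return html

if __name__ == "__main__":
    app.run()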

There you go. Just run your script, visit http://localhost:5000/ and see your handiwork. Reload the page when you make changes. And when you’re all done, just grab the output file and drop it where it needs to go.

Written by Ryan Mark

December 7, 2010 at 5:07 pm

Posted in Python, Recipes

WordPress network auto-deployment

with 5 comments

Going from Django development to WordPress development is rough, not only because I generally dislike PHP (the flashbacks), but mainly because tools that we’ve come to rely on in Python simply don’t exist for WordPress.

Repeatable automated deployment

Our Django apps are deployable with the push of a button and each of us has identical code and data for testing. We found nothing like this for PHP or WordPress, and we really looked, honest.

Much of a WordPress blog’s configuration is stored in a table in the database. In a WordPress network, each blog has its own configuration table. For TribLocal, we needed to create and manage almost 90 network blogs. Each is practically identical but still needed to be customizable. There was no way we would configure each blog by hand.

So with the power of fabric, and a pile of php scripts reverse-engineered from the WordPress code, we built a rig to bootstrap WordPress — from the vanilla download to a customized, functional blog with a single command. Kudos to our contractors, Human Made, who helped quite a bit to make this all work.

Hello Network!

This tutorial is written for folks on Mac OS X. Most of the instructions will work on any UNIX variant. If you’re on Windows, I’d love to hear how you got this to work.

You’ll need to have fabric installed.

sudo easy_install fabric

Let’s grab the scripts for this tutorial. You can clone them from github here. Next you need to grab the latest WordPress. Download and unzip the WordPress archive. You want to take all the files from the wordpress folder created by the archive and move them to the folder you just cloned from github with all your fancy new scripts.

git clone wp-deploy
cd wp-deploy
curl -O
tar -zxf latest.tar.gz
mv wordpress/* .
rmdir wordpress

To get this new mess of code to do anything interesting, you will need an Apache, PHP and MySQL stack on your machine. We do all WordPress development locally and deploy stuff to a server when we want to test or show off our work. Since we all work on Mac OS X, we decided to use MAMP for development. We chose MAMP because it’s a tight package that made it trivial for everyone to get running quickly. Feel free to set up your stack however you like, but this tutorial assumes you’re using MAMP.

Out of the box, MAMP needs a few tweaks before it will serve our project. Download the latest version, install it to your Applications folder and run the MAMP application. MAMP will immediately start Apache and MySQL and open its web control panel in your browser.

Find MAMP on your dock and click it to bring up the status window. Click “Preferences…”. Under the “Ports” tab, set the Apache port to 80. WordPress gets cranky if you try to run it on a port other than port 80. Under the “Apache” tab, change the “Document Root” to the project folder with your new scripts and wordpress code.

If you’ve ever turned on web sharing, turn it off now. If you already have MySQL installed on your machine, you’ll probably run into trouble.

You need to add MAMP’s MySQL binaries to your path, so you can use mysql from the command line.

export PATH=$PATH:/Applications/MAMP/Library/bin

Add this command to the end of your ~/.bash_profile so you can always access the MySQL commands.

echo 'export PATH=$PATH:/Applications/MAMP/Library/bin' >> ~/.bash_profile

Add the demo WordPress domain name to your hosts file.

sudo bash -c 'echo "" >> /etc/hosts'

(If you’re on Leopard or earlier, your hosts file is /private/etc/hosts)

At this point you should be able to fire up a WordPress site with the demo settings. Just run the bootstrap fabric command.

fab bootstrap

You should have wordpress running now. Visit You also should have network blogs running at and

Configuration juiciness

These scripts are just a starting point. The ones we use for TribLocal are heavily customized. Dig through them and tweak to your heart’s content. We’ve annotated them a bunch, but ask us more questions if you find something that isn’t clear.

The scripts:

    Where all the deployment settings are put. Use this to set up your production or staging server or to change settings for local development.
  • scripts/na-options.php
    Most important file. Holds all the config information for the WordPress side of the automatic setup.
  • scripts/na-install.php
    First script that runs. Installs the WordPress database, root blog and network.
  • scripts/na-postinstall.php
    Second script that runs. Configures the root blog and network with settings from the na-options.php.
  • scripts/na-createblog.php
    Script that creates a network blog. Pass it an index, and it uses that index to fetch data about the blog it should create from the $sites array in na-options.php.
  • scripts/na-setup-plugins.php
    Enables plugins.

One more thing

Something that always bothered me about WordPress is the way the domain name is stored in the database. You can’t just dump and load the database in a new location without also moving the domain name. We want it to be push-button to deploy an identical copy of the site in multiple places.

So I wrote a few fabric commands to shuttle a WordPress database across servers and domain names.

fab dump_db
fab load_db
fab reload_db

These commands will create or load data/dump.sql.bz2. They’ll correct the domain name in the WordPress database to whatever you have defined in your fabfile. It just pipes the database content through sed to replace the domain names. Super useful, eh?
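The fabric commands shell out to sed, but the substitution itself is simple enough to sketch in Python. This is an illustration, not the code in the fabfile: `swap_domain` and `rewrite_dump` are hypothetical helpers, and the SQL row and domains are made up.

```python
import bz2


def swap_domain(sql_text, old_domain, new_domain):
    """Replace every occurrence of old_domain, like a global sed substitution."""
    return sql_text.replace(old_domain, new_domain)


def rewrite_dump(src_path, dest_path, old_domain, new_domain):
    """Read a bz2-compressed SQL dump, swap the domain, and write a new dump."""
    with bz2.open(src_path, "rt", encoding="utf-8") as src:
        sql_text = src.read()
    with bz2.open(dest_path, "wt", encoding="utf-8") as dest:
        dest.write(swap_domain(sql_text, old_domain, new_domain))


# Hypothetical row and domains, for illustration only.
row = "INSERT INTO wp_options VALUES (1, 'siteurl', 'http://dev.example.com');"
print(swap_domain(row, "dev.example.com", "www.example.com"))
```

One thing to watch with any plain text replacement: values that WordPress stores as serialized PHP record string lengths, so swapping in a domain of a different length can leave those entries inconsistent.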

UPDATED: Fixed the git clone URL.

Written by Ryan Mark

October 5, 2010 at 3:11 pm

Posted in PHP, Recipes

Advanced django project layout

with 24 comments

Default django project layout versus news apps project layout

We’re releasing our project layout for Django, based on Gareth Rushgrove’s lovely django-project-templates. If you’ve found yourself unsatisfied with the default layout, or you’re using our fabfile or ec2 image, you might be interested in using our project layout.

The default Django project layout makes it dead simple to learn the framework and get an application up and running. But it can quickly get cumbersome as your application grows and you have to figure out how to handle deployment. A few projects, most notably Pinax, have their own ways to organize large projects.

Here are the things that we need that the default layout doesn’t provide a solution for:

  • Separate settings, Apache configuration files and WSGI handlers for local development, a staging server and a production server.
  • A separate place for the various primary source data files (CSV, JSON, shape files) we typically have in a project.
  • A place to put Django apps that does not clutter up the root directory of the project.
  • A library directory to keep various reusable helper functions that are not Django applications.
  • A template directory and media directory for the entire project.
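As a rough sketch of the first item, separate settings per environment usually come down to pointing Django’s `DJANGO_SETTINGS_MODULE` variable at a different module for each deployment target. The module paths and helper names below are hypothetical and may not match what the template actually generates.

```python
import os

# Hypothetical module paths; the real template may name these differently.
SETTINGS_BY_TARGET = {
    "local": "example_project.settings",
    "staging": "example_project.configs.staging.settings",
    "production": "example_project.configs.production.settings",
}


def settings_module(target):
    """Return the dotted settings path for a deployment target."""
    try:
        return SETTINGS_BY_TARGET[target]
    except KeyError:
        raise ValueError("unknown deployment target: %r" % target)


def configure(target, environ=os.environ):
    """Point Django at the settings for this target via DJANGO_SETTINGS_MODULE."""
    environ["DJANGO_SETTINGS_MODULE"] = settings_module(target)
    return environ["DJANGO_SETTINGS_MODULE"]


# Use a plain dict here so the example does not touch the real environment.
env = {}
print(configure("staging", env))
```

A WSGI handler for each environment would do the equivalent of `configure(...)` before handing requests to Django.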

Gareth’s project is really well organized and addresses all of these issues. We tweaked his templates to match our use case.

Getting off the ground

  1. Clone my fork of django-project-templates.
    git clone git://
  2. Install the templates. It will install the dependencies: PasteScript, Cheetah and Fabric. You may want to use a new virtualenv.
    python install
  3. Create a new project from the News Apps Paste template.
    paster create --template=newsapps_project example_project
  4. You’ll be asked for staging and production domains, a git repository location and a database password. These settings will be put in the fabfile and will be used for deployment. You’ll also be asked for a secret key which is used internally by Django. It’s okay to press enter and accept the defaults. The template will still get created, you’ll just have to edit the fabfile later if you plan on deploying the project to a staging or production server.

The template contains a lot of personal preference but it’s been very useful for us and a handful of projects. We are all quite satisfied with it. Take it, use it, tell us what you think!

Written by Ryan Mark

March 8, 2010 at 2:30 pm

Posted in Infrastructure, Python

Small teams, loosely joined

with one comment

Chicago homicide tracker screenshot

Last week, we launched a new application for the RedEye – the Chicago homicide tracker. The web site makes it simple and interesting to browse homicide crime data for the city. RedEye reporter Tracy Swartz has been compiling the homicides since Jan. 1, 2009 and writes a weekly analysis. She wanted to give readers a better way to browse and understand the data and we wanted to help but never had enough time to give the project the attention it deserved.

The homicide tracker might look familiar if you’ve ever seen the L.A. Times homicide project. That’s because it’s the same code. The L.A. Times hacker team of Ben Welsh and Ken Schwencke generously let us use their code (caveat: we all get our paychecks from the same place – Tribune Co.). It took four days of refactoring, reorganizing, writing new data loaders and a new skin to make the L.A. Times code work for the RedEye.

This kind of project plays to the strength of the small newsroom dev team. We started with a small-medium application that was built to solve a specific problem, but not to be reusable. We worked with the reporters to figure out what about the L.A. Times app we should keep, what we should scrap and what we should change. We ignored the urge to refactor and left as much of the original code as possible, tweaking only what was necessary. With the help of the author of the original, we were able to quickly make our changes and launch.

Free and open technologies are key to our small teams working quickly. Pulling content and data from RSS and Google spreadsheets allowed us to skip building a content management system for the homicide tracker. Using a sophisticated, modular web framework helps to make us efficient.

The moral of the story is that for news apps, small teams sharing code, insight and ideas – “small pieces, loosely joined” – is quite effective.

Written by Ryan Mark

March 1, 2010 at 4:22 pm

Posted in Apps, Craft

Our GeoDjango Amazon EC2 image for news apps

with 12 comments

UPDATE: For our friends in the EU and other interested parties, here is the recipe for building the AMI from the original Ubuntu community image.

Today we’re happy to make public a version of our Amazon EC2 image. It’s Ubuntu Karmic running Python 2.6, Apache2+WSGI, PostgreSQL+PostGIS, Memcached and Pgpool. The image is built on an excellent Ubuntu community image, so it supports all the same user-data goodies.

Be sure to check our sample GeoDjango application, built to run on this stack!

Launching the image

Start up a new instance of the Amazon EC2 AMI ami-ff17fb96. If you don’t know the answers to any of the questions in the launch wizard, you can simply accept the defaults, but take note:

  • If you make a new key pair, be sure to keep track of where you save the *.pem file you download, because you’ll need it later to connect to the server.
  • If you make a new security group, be sure to configure the firewall to permit HTTP and SSH connections from your IP address. If you’ll be using this image to serve something to the world, allow HTTP connections from (that is, from anywhere on the internet). For best security, limit SSH to as few IP addresses or subnets as you can.

Connecting to the Running Instance

Once the instance is running, connect to it using your key-pair private key and the newsapps user, not root.  There may be a brief period after EC2 tells you the server address where you still get a Connection refused message. Just wait a minute or two.


Initializing the instance

Once logged into the instance, you’ll notice a few things in the newsapps home directory.

~$ ls
logs  sites

The scripts will configure the server for one of three modes: a front-end application server, a back-end database server, or a monolithic server that runs both the application and the database.

This post will just cover setting up the instance as a monolithic server, with a post on a multi-server configuration coming at some point.

Run the kitchen sink script. You will be walked through the setup process for the server.

~$ ./

First you’ll be prompted to create a key pair for the newsapps account. It’s best to provide a password for a key pair, but it caused problems for our automated deployment so we left it empty. Once you get through the password prompts, you will be shown the public key for the newsapps account.

We use the public key for GitHub and Unfuddle so that we can git clone our app directly on the server. You might use this key if you need to securely connect to your repository for secure deployment or for automated ssh.

Generating cert...
Enter passphrase (empty for no passphrase):
Enter same passphrase again:

Here is this server's public key (for git deployment)

You’ll be prompted for your public key. This will be the key from your development machine. Just copy and paste it into the console. It’s usually located under ~/.ssh/ in your home directory. The script will add your public key to the server’s authorized keys so you can ssh and deploy to the server without having to provide your Amazon private key.

Enter your machine's public key (for fabric deployment)
ssh-rsa XXXXXXXXXXXXXXXXXXXXXXXXXXX rmark@hurley.local

We use S3 to serve all of our static media content and recommend you do the same. It’s cheap and frees up your EC2 instance to handle more traffic. The Apache configuration on our instance has keepalive turned off. If you want to use your EC2 instance to serve media, you should set up another web server, like Nginx or Lighttpd, to serve media separately. You can turn keepalive back on, but it’s not recommended.

You’ll be prompted to set up your S3 credentials on the server. As part of our fabric deployment, static media is pushed from the server to S3 using s3cmd. You’ll need to fill this out for the script to finish setting up your server. (You can enter bogus info now if you’d like, and reconfigure it later by running s3cmd --configure.)

The s3 configuration will also prompt you for an encryption password. This can be left blank or be anything you like. It’s not something you’ll need to remember, it’s just something random that helps encrypt your traffic to the server.

Accept defaults for the rest.

Configuring S3 caching...

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password:
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP and can't be used if you're behind a proxy
Use HTTPS protocol [No]:

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't conect to S3 directly
HTTP Proxy server name:

New settings:
  Access Key: asdf
  Secret Key: asdf
  Encryption password:
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n]  Y

Save settings? [y/N] Y

The script will now enable all services and get everything started. If you want to tweak the image, you can start from our snapshot: snap-e9c69d80.

We included Postfix in our server stack, but it’s disabled by default because configuration is kind of complicated and most mail servers do not accept email from EC2 servers for spam reasons. Run sudo dpkg-reconfigure postfix to configure Postfix before running it.

Take it for a spin

We’ve also just released a sample Django application to illustrate how you might put this thing to use, including our project layout, some basic geographic operations, and our Fabric deployment process. Try it out!

Written by Ryan Mark

February 17, 2010 at 3:37 pm

Posted in Infrastructure