Tribune DataViz

Matters of interest, from the data reporters and developers across Tribune Publishing

Announcing Tarbell 0.9 beta 6

Today we released Tarbell 0.9 Beta 6. (Tarbell is our open-source static site generator based on Google spreadsheets, made with newsrooms in mind. Read more here!) This is our biggest release to date and should be our last release before a stable 1.0 version to come in the next week or two. Here are some notable changes:

  • New naming conventions: The “base template” naming was confusing to users. We have switched to the term “Tarbell blueprints” to better reflect the role and function of this key Tarbell concept. The “_base” directory has been renamed “_blueprint” and the documentation now refers to “Tarbell blueprints” instead of “base templates.” Projects created with previous versions of Tarbell will still work.
  • Expanded documentation: We greatly expanded and improved the Tarbell documentation, including a more in-depth tutorial.
  • New hook system: Developers can now trigger actions during project installation, such as creating a repository and tickets when a new project is created, or refreshing the Facebook cache when publishing.
  • Improved command line interface: Better wording, formatting, and line-wrapping.
  • Better credentials: Tarbell now supports publishing from non-interactive environments.
  • Support for project requirements: Tarbell projects and blueprints can now specify third-party Python libraries as dependencies.

Get started by installing Tarbell! Already a Tarbell user? Upgrade with:

pip install -U tarbell

Special thanks goes to Heather Billings, who did tremendous work on this release.



Written by David Eads

June 6, 2014 at 11:24 am

The Nuts and Bolts of Tribune News Apps Event Listings

A few months ago, we built a reusable JavaScript app to provide an interface to a quirky events API. Internal stakeholders needed standalone, searchable event listings, embeddable calendar and upcoming events widgets, and integration with Google calendar, Facebook, and desktop calendar software.

Here’s how we built a client-side app using a turbo-charged Backbone collection powered by LunrJS and a set of independent Backbone views powered by TypeaheadJS and FullCalendar, creating a library that can be deployed in a wide variety of situations at the Tribune. After a solid year of using/loving/hating/fighting Backbone, and uninspired by the other options, we found a style our team can build on.

You likely have a different quirky API to deal with. Hopefully you can learn from our architecture, optimizations, struggles, and future plans.


Three Tribune properties — Blue Sky Innovation, Theater Loop, and Just Kidding — needed event listings pages and homepage widgets. The event information is entered by Tribune employees and is available via a web API provided by a third party vendor.

The vision was to build a simple list of events with a search box at the top and calendar next to the event list. We’ve had good success with using TypeaheadJS to provide simple search. The goal was to provide a single way to filter the data using Typeahead’s support for multiple datasets.

[Screenshot: searching for “opera”]

Of course, there was some bad news. The API was designed for a specific type of user interface that implements faceted search. It can filter specifically on venue, town, neighborhood, and other event properties. You’ve probably seen faceted search on shopping sites like Amazon, eBay, and millions of others.

[Screenshot: a never-ending list of filtering options in the left rail, i.e. search “facets”]

The lack of full text search was a real problem, particularly because event descriptions contain important information like review author and star rating (for lack of a better place to put them).

The solution

To get around this and other limitations of the API, we took a novel approach: instead of using the API to execute searches, we load all the data for a specific time frame (typically the next three months) and index it in the user’s browser with LunrJS, turning the browser into an event search engine.

The downside of this approach is a somewhat heavy initial page load. To minimize the pain, we gzip and aggressively cache results. The upside is that once the data is loaded, the search is very fast since all the data is indexed in memory on the client computer. And we can tune the search to return exactly the results we want for any given search term.
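Stripped of library specifics, the core idea (build an in-memory inverted index once, then answer searches from it) can be sketched in plain JavaScript. This is only a stand-in for what LunrJS does, not its API, and the event fields are hypothetical:

```javascript
// Minimal in-memory full-text index, standing in for LunrJS.
// Event objects and field names here are hypothetical.
function buildIndex(events, fields) {
  const index = new Map(); // token -> Set of event ids
  events.forEach((event, id) => {
    for (const field of fields) {
      const text = (event[field] || "").toLowerCase();
      for (const token of text.split(/\W+/)) {
        if (!token) continue;
        if (!index.has(token)) index.set(token, new Set());
        index.get(token).add(id);
      }
    }
  });
  return index;
}

// Once the index is built, a search is just a fast in-memory lookup.
function search(index, events, query) {
  const ids = index.get(query.toLowerCase()) || new Set();
  return [...ids].map((id) => events[id]);
}

const events = [
  { title: "La Traviata", description: "Lyric Opera, 4 stars" },
  { title: "Improv Night", description: "Comedy at iO" },
];
const idx = buildIndex(events, ["title", "description"]);
console.log(search(idx, events, "opera").length); // 1
```

Because event descriptions are indexed too, a query like “opera” matches even though the API itself has no full-text facet.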

Building the app

Let’s look at the finished components and how to deploy a simple version of the app.

  • Event collection: Loads all event data for a given time range from the API, indexes it using LunrJS (a browser-based full-text search engine), and provides a search() method and event.

  • Event list view: A list of events that renders on every search event.

  • Event calendar view: A calendar of events that renders on every search event.

  • Event search view: A search box (powered by TypeaheadJS) that triggers search events.

The collection is the central component. All the views either listen for collection events or trigger collection events based on some input value, either through a Backbone router or the autocomplete search box.

[Screenshot: search box embedded on the Theater Loop homepage]

The following code snippet is from a widget that provides a simple event search box that can be embedded on an external page.

What’s happening here?

  • Define the collection.

  • Set up a filter view that is bound to the collection.

  • Add an event listener that waits for the data to be loaded, then attach custom behavior to the autocomplete box to navigate to the external URL on selection.

  • Get the data! The filter view listens for the sync event, meaning the data is loaded, and then uses the data to set up a typeahead box.
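The steps above can be sketched without real Backbone or TypeaheadJS code; the collection, filter view, and autocomplete below are hypothetical stand-ins that only mimic the event flow:

```javascript
// Hypothetical stand-in for a Backbone collection: a tiny emitter
// with a fetch() that fires "sync" once data arrives.
function createEventCollection(fetchData) {
  const listeners = {};
  return {
    data: [],
    on(event, fn) { (listeners[event] = listeners[event] || []).push(fn); },
    trigger(event, payload) { (listeners[event] || []).forEach((fn) => fn(payload)); },
    fetch() {
      // The real app hits the events API; here the data is injected.
      this.data = fetchData();
      this.trigger("sync", this.data);
    },
  };
}

// 1. Define the collection.
const collection = createEventCollection(() => [
  { title: "La Traviata", url: "/events/la-traviata" },
]);

// 2-3. Set up a "filter view" that waits for the data to load, then
// wires up its (stand-in) autocomplete to navigate on selection.
const selected = [];
collection.on("sync", (data) => {
  const autocomplete = { onSelect: (event) => selected.push(event.url) };
  autocomplete.onSelect(data[0]); // simulate the user picking a suggestion
});

// 4. Get the data! The sync listener fires and sets up the typeahead.
collection.fetch();
console.log(selected); // ["/events/la-traviata"]
```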

The full blown event app

Showing all the code would likely make your eyes cross; the architecture is what matters anyway:

  • Set up helper functions to scroll around the results list.

  • Define the collection.

  • Define a Backbone router to manage and represent the application state via the URL.

  • Set up a filter view, calendar view, and event list view.

  • Bind search behavior to route change events: When the route changes, call the collection’s search() method.

  • Bind route change behavior to view events: When a day in the calendar is clicked or a search is executed, change the URL and trigger a router event.

  • Get the data!

The event list and calendar are already listening for search events being emitted by the collection to render the appropriate slice of data.

Keeping the components loosely coupled

Each Backbone view is completely self contained and depends only on the collection. That means each deployment of the code must set up its own event listeners. This is annoying because it means some repeated boilerplate code across deployments.

That annoyance is outweighed by the flexibility of the system: Want a calendar on your homepage that links to the event search? We can do that! Want a calendar and an autocomplete box? We can do that too! How about an autocomplete and a list of events? Sure thing, holmes. Need to parse event descriptions as Markdown? We got this!

Digging into the collection code

The data that is provided by the API looks like this:

There’s a kind of compression going on in this data format. Instead of listing every event on every date as a unique object, only distinct events are provided, with a nested list of dates that the event occurs on.

For the purposes of display, this data structure will need to be “unspooled” into a flat list of events. Let’s take a look at the collection source and discuss a couple of key optimizations and notable features of the code:

There are three key functions here: initialize, parse, and search.

initialize is Backbone boilerplate to load in options and set up the collection. It does extra duty here by immediately indexing the data using LunrJS.

The parse method massages the data structure above into something useful for display:

  • Call optional preprocess function on the data

  • For each row in the data:

    • Add row to rawData collection attribute to capture state of the data immediately after any preprocessing.

    • If the index option on the collection is true, index the row with LunrJS.

    • Collect venues, dates, and neighborhoods for use as TypeaheadJS datasets.

    • “Unspool” events by iterating over each repeating date and create a new object for that particular event, date, and time.

  • Remove duplicates from collected venues, dates, and neighborhoods.
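The parse steps above can be illustrated with a hypothetical approximation of the data; the vendor’s actual field names aren’t shown here:

```javascript
// Hypothetical approximation of the nested shape: one object per
// distinct event, with its repeating dates nested inside.
const rawData = [
  {
    title: "Hamlet",
    venue: "Chicago Shakespeare Theater",
    dates: ["2014-02-18", "2014-02-19", "2014-02-20"],
  },
];

// "Unspool" into a flat list: one object per event/date combination,
// which is what the list and calendar views display.
function unspool(rows) {
  const flat = [];
  for (const { dates, ...rest } of rows) {
    for (const date of dates) flat.push({ ...rest, date });
  }
  return flat;
}

console.log(unspool(rawData).length); // 3
```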

The search function populates a collection.filtered property, which is the data we’ll want to display. The trick here was to make the filtered list itself an instance of the event collection. This instance does not need to be indexed by LunrJS (we already indexed the whole dataset) but it does need to be re-parsed.

Perhaps it sounds inefficient to re-parse the data on every search. But this design was born of necessity. Originally, the indexing happened after the raw data was unspooled into a big flat list of events. Unfortunately, a list with 400 events could easily become a 5,000+ element list, especially for theater events which tend to repeat in almost all cases.

It’s no surprise, but indexing 5,000 elements is a heck of a lot slower than indexing a few hundred elements. And pointless, given that the only difference between repeating event instances is the date of the event. The title, description, etc. will always be the same.

The code was refactored. Instead of searching against the flattened list, we search against the raw data, get the subset of the raw data that matches the search, and re-parse that subset. Despite the cost of slicing and re-parsing the raw data for every search, this approach was still on the order of hundreds of times faster and significantly more memory efficient than indexing every element after unspooling repeating events.
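A self-contained sketch of that refactor, with hypothetical fields and a plain substring filter standing in for the LunrJS lookup:

```javascript
// Hypothetical raw rows: distinct events with nested repeat dates.
const rawData = [
  { title: "Hamlet", dates: ["2014-02-18", "2014-02-19"] },
  { title: "Opera Gala", dates: ["2014-02-18"] },
];

// Stand-in for the LunrJS lookup: return only the raw rows that match.
const matches = (rows, term) =>
  rows.filter((row) => row.title.toLowerCase().includes(term.toLowerCase()));

// Re-parse (unspool) just the matching subset, not the whole data set.
function searchAndParse(rows, term) {
  const flat = [];
  for (const { dates, ...rest } of matches(rows, term)) {
    for (const date of dates) flat.push({ ...rest, date });
  }
  return flat;
}

// Searching 2 compact rows instead of 3 flattened ones is the whole
// win; at 400 events versus 5,000+ flattened rows it adds up fast.
console.log(searchAndParse(rawData, "hamlet").length); // 2
```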

Also note the use of triggers at the start and end of parsing and searching. This allows application code to listen for these events and provide visual feedback like fading the results list in and out or showing a waiting spinner.

The proxy

There is one last piece of architecture to note. The events API we are using was not designed for direct access from public clients, nor does it support outputting iCal-formatted event data. So we wrote a tiny (55-line) proxy server using Python Flask. Our Fabric-based deployment rig makes deploying a little app like this behind nginx very easy.

That’s it! Our proxy goes out to the API and gets the data, then sends it back to the client with aggressive caching and gzipping provided by nginx. If iCal-formatted data is needed, the proxy gets the data from the API as JSON, pumps it into a template that uses the iCal format, and sends back the appropriate file.
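The proxy itself is Python, but the JSON-to-iCal step is plain templating. A rough JavaScript sketch of the idea (the event fields are hypothetical, and a production feed would also need UID, DTEND, and RFC 5545 text escaping):

```javascript
// Sketch of rendering JSON event data into an iCal-formatted file.
// Field names are hypothetical; this is not the actual proxy code.
function toICal(events) {
  const lines = ["BEGIN:VCALENDAR", "VERSION:2.0"];
  for (const event of events) {
    lines.push(
      "BEGIN:VEVENT",
      `DTSTART:${event.date.replace(/-/g, "")}`,
      `SUMMARY:${event.title}`,
      `LOCATION:${event.venue}`,
      "END:VEVENT"
    );
  }
  lines.push("END:VCALENDAR");
  return lines.join("\r\n");
}

const ical = toICal([
  { title: "Hamlet", venue: "Navy Pier", date: "2014-02-18" },
]);
console.log(ical.includes("SUMMARY:Hamlet")); // true
```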


We host the event listing library on our team’s shared assets Amazon S3 bucket. Currently, the library is integrated with a Django application and the main Chicago Tribune CMS. The standalone events pages are published with Tarbell. By hosting central, versioned copies of the library, we can deploy on practically any platform that allows a script tag.

Future optimizations

FullCalendar could use some improvement or an upgrade. It does a tricky job quite admirably, but it has a fussy API and some rendering performance issues. The fussy API is livable; a calendar that rendered a little faster and responded to media queries a little better would be a real win.

The project depends on a fork of TypeaheadJS to integrate LunrJS that still needs to be submitted back to the TypeaheadJS maintainers.

Data loading could be broken down into successive calls to cut down on initial page load.

The library could benefit from cutting down on repeated indexing. It may be possible to stash the index in local storage, especially because the underlying data source changes fairly infrequently.

Keep it flexible

By emphasizing a strong data API via our collection and keeping view code loosely coupled even at the expense of repeated app code, we were able to create a client side library that is flexible, stable, and easy to deploy. These are patterns to build on.

Written by David Eads

February 17, 2014 at 10:20 pm

Posted in Uncategorized

Announcing the Tarbell 0.9 preview release

We’ve been hard at work on the next version of our Tarbell project, which makes it easy to build static websites.

Tarbell has been through many iterations over the past year. The first open source release of the library came out in May. This version of Tarbell served us and others well and pointed a way forward.

Five months and many projects later, we are releasing our first preview of the new, entirely overhauled Tarbell. The new release reflects our experience managing dozens of projects and incorporates all the incredibly helpful feedback we’ve received from the community.

The old version of Tarbell has been preserved as v0.8 for posterity. The new version is v0.9.

A few highlights:

  • One line install: Run pip install tarbell on just about any *NIX-based operating system to install.

  • Command line app: The tarbell command line application replaces the hodgepodge of tools previously used by Tarbell.

  • Standalone projects: Instead of maintaining projects and base templates in one big directory, base templates and projects live in their own git repositories.

  • As many base projects as you want: Need templates for long-reads, map projects and data-driven projects? No problem! Tarbell’s new project template system can accommodate as many or as few project templates as you need.

  • Google Drive API for spreadsheet access: The old Google spreadsheets API is going away some day. Tarbell now uses the future-proof Drive API.

  • Faster preview and publishing: Improved performance across the board.

  • Google spreadsheets and Amazon S3 publishing are optional: It is now easy to create a Tarbell project without configuring Google Drive API access or Amazon S3.

To use Tarbell 0.9, head over to the project page to learn more. We expect a final release of 0.9 sometime in the next few weeks as folks find bugs and work with the system while we plan a roadmap for version 1.0.

If you’d like support or to discuss Tarbell, please join our new Google Group.

If you’re going to MozFest (October 25 – 27, 2013), please come to the Tarbell workshop, which will be in the software for journalism conference track. Details to be announced.

Written by David Eads

October 17, 2013 at 4:05 pm

Posted in Apps, Open Source, Python

Meet Tarbell

Note: This article was originally published in Source on May 28, 2013.

Tarbell examples

Newsrooms need sturdy, affordable tools to give stories special treatment on a day-to-day basis. If maintenance requirements are compounded every time an app is released, new apps will tend to be big affairs, created by elite developers. But if you can create apps that are simple and built on static HTML, CSS, and JavaScript, you can afford to fire-and-forget and potentially get a few more folks in the newsroom creating online presentations and telling stories on the web.

A couple of years ago, I would have told you the best option was either a powerful but opinionated (and appropriately boring) CMS like Drupal or a “bespoke” (ahem) app with a database and server infrastructure to run it. That was before I started working at the Chicago Tribune, where Brian Boyer opened my eyes to a different ethos.

Kill the server, Brian preached. Bake your apps to Amazon S3. Save money, stop worrying about arcane Linux incantations to keep the database server afloat. As someone who’s spent seven years helping folks pry apart computers to better understand them through my nonprofit FreeGeek Chicago, this scrappy mantra appeals to me.

One of my first projects for the apps team was Playing with Fire, an award-winning investigation into the role of tobacco and chemical companies in shaping policy mandating ineffective and potentially toxic flame retardants in furniture and clothing. Joe Germuska, Brian, and I built a simple rig based on Flask that reads a Google spreadsheet and provides worksheet values to templates as key -> value pairs.

Google spreadsheets for content management? Ohh-kay Brian, says the person who spent the last four years in the Drupal salt mines where Real Websites are backed by a database, web server, and content management system. I was skeptical.

My skepticism, perhaps a bit of techno-elitism, should have been erased a few years earlier, when I talked with some friends in professional management roles (a lawyer, a product executive for a food company, a call center manager) about essential skills for their workers. They all said Excel.

I finally understand why spreadsheets are so important. Spreadsheets are a miracle of computing. They’re the Swiss army knife of data management, a programming language for the rest of us. Collaboratively edited spreadsheets are a miracle of the Internet. Managing content with Google spreadsheets lowers the barrier to entry for participation by using familiar, flexible tools. Nothing we could build would provide a better user experience or more seamlessly integrate editors and content creators.

In the end, my fears were unfounded: Using spreadsheets worked! Editors liked the process, and the project came out fabulously.

During the following months we forked the original Flames code to produce Pension Games and An Empty Desk Epidemic. At that point I was getting tired of maintaining forks of the same codebase, especially because we felt we could produce these editorial projects on shorter and shorter timelines and get more folks in the newsroom into the mix. So my apps team colleagues and I began work on Tarbell, software meant to distill the lessons of those editorial projects into a single platform.


Tarbell is named in homage to the journalist Ida Tarbell, whose History of the Standard Oil Company is one of the great works of investigative journalism.

Our requirements for Tarbell:

  • Manage dozens of micro-sites with similar branding, analytics, SEO and social media plumbing.
  • Relatively easy to install on OS X and Linux.
  • Develop using simple, modern HTML, CSS, and JavaScript. We believe we can teach our colleagues in the newsroom these skills.
  • Manage content and data with Google spreadsheets: Provide key -> value pairs and arbitrary data to templates (<p>{{ title }}</p>).
  • Easy publish to Amazon S3. Build a common-case publishing workflow that’s easy for end-users to customize to their own needs, like using SFTP instead.
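The key -> value idea is simple enough to show end to end: rows from a values worksheet become a context object, and the template swaps in the values. This is a toy illustration, not Tarbell’s actual implementation:

```javascript
// Toy illustration of the key -> value idea; not Tarbell's real code.
// Rows from a "values" worksheet...
const rows = [
  ["title", "Playing with Fire"],
  ["credit", "By the News Apps Team"],
];

// ...become a context object...
const context = Object.fromEntries(rows);

// ...and a tiny renderer swaps {{ key }} for its value.
const render = (template, ctx) =>
  template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, key) => ctx[key] ?? "");

console.log(render("<p>{{ title }}</p>", context)); // <p>Playing with Fire</p>
```

Editors only ever touch the spreadsheet rows; the template and renderer never change.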

Using Tarbell

You may be asking yourself, “How do I use it?” Here’s a quick overview. For detailed information, see the Tarbell documentation.

First, install Tarbell (you’ll find virtualenv and the virtualenvwrapper helpful as well):

git clone
cd tarbell
mkvirtualenv tarbell
pip install -r requirements.txt

At this point, you’ll need to follow the instructions for creating a client_secrets.json file to allow Tarbell to use your Google account to access Google Drive. If you run into trouble or don’t want Google Drive access, you can configure template variables.

fab newproject

Did the new project process go okay? Now for the fun part!

The fun part

  1. First, fire up a server: python
  2. Visit http://localhost:5000/projectname in your browser to see the new project. You should see a page that looks like the project example page.
  3. Open up your Google doc and add a new key -> value pair in the values worksheet: the key should be “credit” and the value should be “By the News Apps Team”.
  4. Open up myproject/templates/index.html and add the credit line: <p>{{ credit }}</p>.
  5. Open up myproject/static/css/style.css: .credit { font-size: 20px; font-weight: bold; }.
  6. View it in your browser: http://localhost:5000/myproject/index.html or just http://localhost:5000/myproject/.
  7. To deploy, add your S3 credentials to Now you can deploy it: fab deploy. This command will deploy all your Tarbell projects to the production S3 bucket specified in

Tarbell in the World

Tarbell has been used for 18 projects and counting at the Chicago Tribune and is already being used by Southern California Public Radio. Here are some examples of Tarbell projects:

  • Projects pages: Southern California Public Radio is using Tarbell to generate their apps page and have helped Tarbell improve rapidly since our first open source announcement.
  • Election widgets and results: During the last Illinois elections we used Tarbell to manage data and presentation. A small team of reporters entered election result data into the Google spreadsheet. We set up a cron job to republish the Tarbell site every two minutes. Because Tarbell also publishes a JSON representation of the Google spreadsheet, we were able to create a widget that simply polled for new data every couple of minutes. The results page and widget were viewed by more people than voted in the election itself.
  • Long-form storytelling: Tarbell is a great platform for rich, long-form publishing. Hadiya’s Friends and His Saving Grace use custom processors to pull data from the Chicago Tribune CMS and significantly upgrade the online presentation using carefully crafted CSS and JavaScript.
  • Graphics: The Tribune graphics department has used Tarbell (with minimal intervention from the apps team) to develop infographics about the conversion of Chicago’s United Center from ice rink to basketball court (Game Changers), and a deep dive into problems with Boeing’s Dreamliner airplane (Boeing’s Bumpy Ride). These are “just” mobile-friendly translations of print graphics. They are also amongst our most successful and popular projects.
  • Data apps: The apps team is using Tarbell to track school closings and school utilization in Chicago Public Schools using Google Fusion Tables and Google Maps. The school utilization site demonstrates the ease with which other libraries and HTML templates (in this case, Derek Eder’s searchable map template) can be integrated into Tarbell projects.
  • Timelines and galleries: Chicago Gunrunning uses Tarbell to create a series of slides, dramatizing a real case in which guns were moved from a gun show in Indianapolis to Chicago. That project came about after a reporter approached me and I asked her to put her data into a Google spreadsheet just so I could see it. Once I saw the data, I quickly realized we could set up a Tarbell project and create a slideshow out of it. This project became the basis of a new tool to routinely generate these map + text slideshows. We named it TACO after folks in the newsroom kept asking about the “taco tool” used to create Kevin Pang’s Fox Valley Taco Tour.

What’s Next?

We had our first official release the other day, version 1.0 beta-1.

During the next two weeks, the Tribune apps team will be deploying the new open source template to people on the graphics desk. We’re planning on a second beta release in mid-June with expanded documentation and a smoother authentication workflow. We hope to have a release candidate or final release by mid-July with more flexible deployment options and better code testing.

We’re happy to hear your thoughts about using and improving Tarbell, so drop us a line at

Written by David Eads

June 7, 2013 at 12:03 pm

Posted in Uncategorized

Announcing the Chicago Crime API

We are happy to announce the first release of the Chicago Tribune’s Chicago Crime API, an easy, fast, useful and rich way to access more than 12 years of Chicago crime data. We’re excited to see what you’ll find in this data.

The Tribune Apps Team and the Northwestern Knight Lab are sponsoring a series of hack days to work with crime data and the Chicago Crime API. Come learn more about the API and analyzing crime data in Evanston on April 6 or Pilsen on April 13.

The first released API version is 1.0-beta1. All components of this project are still a work in progress. We plan to release several new versions of the API in the next month based on feedback from users on our way to a 1.0 release in late April or early May, 2013.

Diving in

Want to start getting data now? Start by reading the API documentation.

Want to see the data in action? View the demo application or clone it from Github.

Why a crime API?

The City of Chicago hosts this data using Socrata. What makes the Chicago Crime API different?

Easy: The Socrata Open Data API can be fussy and hard to integrate. Our API uses simple query parameters and has thorough documentation.

Fast: We provide cached, summarized data that is quick to access and analyze.

Useful: Our daily summary API endpoint rolls up thousands of rows of crime data into day-by-day counts of all major crime types.

Rich: We provide extended metadata about community areas and crime classifications in the Chicago Crime API. Our API can represent complex data structures and pull in data from sources beyond the City data portal.

Written by David Eads

March 22, 2013 at 9:03 am

Posted in Crime, Uncategorized

Talking about crime: The Chicago Crime site

Welcome to what we hope will be an ongoing series of blog posts by members of the apps team about our work analyzing and visualizing data related to public safety and crime. Crime is an important and popular subject. But interpreting crime data is tricky business, and developing coherent narratives and useful metrics is even harder.

Last fall, Heather Billings, David Eads and Joe Germuska built the first version of a comprehensive crime site for the Chicago Tribune called Chicago Crime. Our goal is to provide the best online tools and reporting on crime and public safety for our readers.

We started by building software to load and visualize data from the City of Chicago Data Portal’s crime dataset, which contains crime report data from 2001 to present.

Data model

The most crucial components of the backend are the scraper and data model. The scraper regularly polls the data portal for new records, geocodes each report to neighborhoods and community areas, then writes them to the database. Once the data is imported, we run tools to generate handy summary data, such as daily counts of crime for major crime categories.

Data analysis

The City of Chicago’s crime data includes low-level misdemeanor crimes like fraud, gambling, and fighting, as well as non-criminal reports such as a missing passport (perhaps so that lost passports are reported to the FBI). We filter out these reports to focus on the crimes that are important to our audience and provide a reliable picture of serious crime.

We summarize the data using three primary techniques: Rolling reports up to top level categories, adjusting crime rate for community area population and comparing the current time period to the same time period last year.
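Those three techniques reduce to a little arithmetic. A sketch with made-up numbers (the real site computes these from the summarized database tables):

```javascript
// Sketch of the three summarization techniques, with made-up numbers.
// 1. Roll reports up to a top-level category.
const reports = [
  { category: "theft" }, { category: "burglary" }, { category: "assault" },
];
const PROPERTY = new Set(["theft", "burglary", "motor vehicle theft", "arson"]);
const propertyCount = reports.filter((r) => PROPERTY.has(r.category)).length;

// 2. Adjust for community-area population (rate per 1,000 residents).
const population = 20000;
const ratePer1000 = (propertyCount / population) * 1000;

// 3. Compare the current period to the same period last year.
const lastYearCount = 4;
const yearOverYearChange = (propertyCount - lastYearCount) / lastYearCount;

console.log(propertyCount, ratePer1000, yearOverYearChange); // 2 0.1 -0.5
```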

Crime category: Violent and property crimes are those commonly referred to as index crimes; that is, crimes reported to the FBI as part of the Uniform Crime Reporting Program. Specifically, we use the list of Illinois Uniform Crime Report codes to match crime reports to the index crime categories. (The only index crimes not included in statistics for this site are those with the primary description ritualism. Fewer than 25 of these crimes have been reported in the data published by the City going back to 2001.)

Index crime types with a primary description of robbery, battery, assault, homicide or criminal sexual assault are included as violent crimes on this site. Index crime types with a primary description of theft, burglary, motor vehicle theft or arson are counted as property crimes. Additionally, index crimes with the primary description offense involving children with secondary descriptions including sexual assault are counted for this site as violent crime: sexual assault.

While using residential population to compare crime rates by geographic area has downsides, we believe it provides a much more realistic picture of crime in Chicago than absolute counts.

The biggest downside of this approach: The daytime population in some neighborhoods fluctuates significantly, which means numbers for places such as the Loop might be inflated because the residential population is significantly less than the population of tourists and workers.

Finally, we look at the change in each top level category compared to the same period last year. Year-over-year comparisons suffer from a lack of long-term historical perspective, but are less prone to bias from changes to the law, enforcement policy and reporting procedures. Without a reliable historical “average,” we decided to compare the current time period only to the past year.

Data challenges

We are fortunate to have all index crime reports published to the city data portal in a consistent format with a workable data API. But all real-world data is messy, especially when it involves multiple agencies, legacy data systems and complex legal requirements. We encountered many challenges:

  • The time to reach the data portal varies from report to report, reports are occasionally deleted and reports are upgraded or downgraded. For example, an assault may be “upgraded” to a homicide if the victim later dies. Late in 2012, Chicago’s 500th homicide was reclassified several times before being finally classified as a homicide.
  • Several potentially useful fields turned out to be less reliable than we had hoped:
    • The “updated date” field, which we hoped would help us query for recently added or updated records, is an artifact of another system and only partially useful.
    • The “arrest” boolean field is only true if an arrest was made when the initial report was created. This field is not updated if an arrest is made later, and we’ve heard it may not be accurate in cases where the reporting officer does not make an arrest but another officer does.
  • In other projects, we found some reports include data entry errors, such as a report about a crime at a Red Line stop that was geocoded to Navy Pier.
  • Inter-agency delays: The Chicago Police Department sometimes lags the Medical Examiner in declaring a homicide, so there’s often a discrepancy between sites like the RedEye Homicide Tracker and the portal data.
  • Records lack useful details. While every homicide gets an individual report, a shooting in which five people were wounded may be a single report.
  • We’ve struggled to connect police reports to a crime’s disposition in the court system if an arrest is made.

To address the fluctuating data set, we run a weekly data audit to sync with the data portal: Reports that have been expunged are moved to a backup database and new reports with old updated dates are harvested.

We recently were able to strike one data challenge off the list when the city started providing a reliable field with community area number for each report.

We’ll address these challenges in depth in a post and expanded page on Chicago Crime about how to interpret the crime data.

Design challenges

By consulting with editors and reporters, we identified a few core ideas to focus on. Looking back, these morphed into key user interface challenges:

  • Making lots of complex information understandable to the many different cognitive styles of our audience: visual, geospatial, numbers- and stats-oriented, and narrative-oriented.
  • Organizing information so that people could dig in if they wanted to, but wouldn’t be overwhelmed if they didn’t.
  • Developing useful, reliable metrics for comparing community areas to each other and to historical trends.
  • Presenting numbers in context.
  • Designing a site that avoids meaning-laden colors and design elements.
  • Building a decently responsive site, with room to grow and experiment with mobile.

We knew we wanted to make use of the 11 years of data we had from the Chicago Police Department, yet we also wanted to create something that would be easy to interpret at first blush. When you break this much data down into something easily digestible, you run the risk of pulling numbers out of context. We struggled with how to synthesize what we call the “big numbers” without making areas seem more or less crime-laden than they actually were.

We also had two equally important elements for the top of the page: the map with locations of crime in the last month, and the “big numbers” breakdown. We wanted both to be immediately visible. They inform one another, and appeal to both people who think visually or are interested in spatial patterns, and people who think alphanumerically and are interested in trends and numbers.


Another main element on the page was the travelling navigation. Because there were so many different breakdowns of the data, a travelling navbar seemed like the best option. In contrast to the map and big numbers section, this seemed fairly straightforward.

We cribbed some code for horizontal sticky navigation from the schools application. That way, the navigation would be out of the way yet easy to access.


To check our assumptions, we did something we rarely have the time or resources to attempt: user testing.

User testing is a pain, especially on a deadline. It’s hard to find people who are willing to take several hours from their workdays to be stared at while poking around at a website. It takes significant team effort to organize the space, the time and the coffee. And it takes your time to conduct the interview and the discussions that follow. In all, it took us somewhere around 25 staff hours to pull off a five-person test.

Happily, this is exactly the sort of interruption that needs to happen in the mad rush to build something. Sometimes you need to take a break from hiking through the woods to remind yourself what the map looks like. Our goal was to make this information easy to understand, and the more we wrestled with the details, the less sure we were that we were really achieving that goal.

Each tester brought radically different perspectives to the table. The diversity of our testers seemed to help compensate for our small sample size. A statistical turn of phrase makes sense to a highly educated user but might not serve someone younger or without much education. Someone looking to buy a house or open a business might want summary numbers; someone invested in her community or her block will want specificity. A car owner probably wants details on property crimes like vehicle theft; a commuter wants to know what’s happening at CTA platforms.

Perhaps more interesting than the users’ varied, contradictory opinions were the opinions they shared — especially as to what did not work. The travelling navigation that we thought was so slick and obvious was overlooked by everyone who looked at the site. It was on the side of the page, but they were looking at the top.

“There’s a lot on this page,” said one tester. He scrolled around. The little navbar followed him down the page, securely in his peripheral vision and out of his way. “I wish I had something to tell me what was down the page. Like some buttons at the top or something.”

Based on this input, we changed our navigation design, which also let us always display the name of the community area being viewed.


While some testers had interests in features beyond the scope of our project, we were able to identify several weak points in our interface that could be fixed without major effort or woven into in-development features. Despite the effort, user testing was a big win, and significantly improved the fit-and-finish of the project.

Software architecture

Chicago Crime is built on a foundation of GeoDjango and PostGIS. The backend software regularly scrapes, massages and summarizes the data on Chicago Data Portal using the Socrata Open Data API.
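As a rough illustration of the scraping step, a query against the portal's SODA endpoint might look like this. The dataset ID and field names here are assumptions for the sake of example, not necessarily what our scraper uses:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint for the city's crime reports dataset.
PORTAL = "https://data.cityofchicago.org/resource/ijzp-q8t2.json"

def build_query(since_date, limit=1000):
    """SoQL parameters selecting reports updated after a given date."""
    return {
        "$where": "updated_on > '%s'" % since_date,
        "$order": "updated_on DESC",
        "$limit": str(limit),
    }

def fetch_recent_reports(since_date):
    """Pull recently updated reports from the Socrata Open Data API."""
    url = PORTAL + "?" + urlencode(build_query(since_date))
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

The real pipeline then massages and summarizes these rows before they reach the Django models.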

The frontend uses Django plus a client stack that includes Backbone, jQuery, Underscore, Leaflet, and Tablechart + jqPlot. We used TileMill to generate a map of Chicago community areas for our all-city ranking.

It all runs on a basic application stack on Amazon EC2 using Apache and mod_wsgi. As always, we ensure performance with aggressive Varnish caching.

The site uses responsive design to provide a decent experience on phones and tablets. The emphasis on responsiveness gives us room to grow and experiment with mobile.

What’s next?

The next phase of the site will focus on engaging with a broader community of crime data nerds. We will soon release an API for the site’s summary data as well as report-level data. We are partnering with the Northwestern University Knight Lab to hold a series of hack-a-thons in March and April 2013 to build interest and awareness around Chicago crime data. We also plan to roll out a new version of the site with better mobile performance, more data sources and deeper analysis. Stay tuned!

Written by David Eads

February 28, 2013 at 2:06 pm

Posted in Crime, Uncategorized

School Report Card update

with one comment

Last week we added crime report data to the Schools Report Card application. The data was obtained from the Illinois State Board of Education via a Freedom of Information Act request by Joel Hood and Diane Rado. As often happens, adding this data was not simply a matter of importing some rows from a spreadsheet. As Hood and Rado reported, only about 40% of Illinois school districts have ever used the Board of Education’s reporting system, and even fewer have consistently used the system to report crime.

Because so much data is missing, we considered not publishing the information at all. The inconsistent reporting makes it impossible to compare schools in different districts. And we were concerned about making schools in compliance look bad in comparison to schools that simply don’t report. Searching for crime reports at school properties on the Chicago data portal demonstrates that an absence of reported crime in the state’s system does not mean an absence of crime.

Ultimately, we decided that the data should be presented with context for understanding the presence or absence of data. These crime reports, when available, have value for parents, students, and policy-makers.

We show the crime report section for each school even when no data is available along with text that explains the crime reporting system. For example:

City of Chicago SD 299 has not reported any crimes at Von Steuben Metro Science High School under the Illinois State Board of Education’s Student Incident Reporting System since it debuted in 2006. Criminal offenses may have occurred on school grounds without the data being reported to the state as required by law.
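A hypothetical helper generating that explanatory text might look like this. In the real app this lives in a Django template; the function and its parameters are illustrative only:

```python
def crime_report_note(district, school, num_reports=0, since_year=2006):
    """Explanatory text shown with each school's crime report section.

    When no reports exist, the note makes clear that missing data
    does not mean an absence of crime.
    """
    if num_reports == 0:
        return (
            "%s has not reported any crimes at %s under the Illinois "
            "State Board of Education's Student Incident Reporting "
            "System since it debuted in %d. Criminal offenses may have "
            "occurred on school grounds without the data being reported "
            "to the state as required by law." % (district, school, since_year)
        )
    return "%s has reported %d crimes at %s since %d." % (
        district, num_reports, school, since_year)
```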

We plan to request fresh school crime reporting data periodically, and hope that schools will better report crime incidents going forward.

Written by David Eads

June 20, 2012 at 4:49 pm

Posted in Apps