Archive for the ‘Crime’ Category
Ever since David Eads, Joe Germuska and I launched the Chicago Tribune crime site more than a year ago, we’ve wanted to revisit it. Things that we thought we’d be able to return to shortly after launch ended up sitting unaddressed for months. New pages got shoehorned into a design that was never meant to accommodate them. Finally, at the end of last year, my colleagues (notably Andy Boyle and Mr. Eads) and I got a chance to take a crack at revising the massive app. I’m pretty happy with how it ended up, and I thought it would be helpful to share some of the things I learned along the way.
I designed the site originally before I’d heard of this concept of “mobile-first design,” and, boy, did it show. On community area pages, the order of information was weird, the map just disappeared, and the performance was terrible.
Mobile-first design has been derided as a method that over-simplifies for desktop, forcing one small bit of information at a time upon a user who has screen real estate to spare. It’s true that a columnar, white-space-driven layout adapts easily to mobile. Sometimes, that’s just the ticket. But for a site like this, dense with numbers, graphs and data, that is not a viable layout. So we dropped the multi-column layout from the community area page in favor of a more focused experience.
The prior site’s multi-column layout resulted in an odd informational hierarchy, with demographic information and community area stories superseding nearly everything else.
For the new site, there was a lot of careful planning about what needed to be placed high on the page at every breakpoint. I laid out columns-within-columns to allow those sections to break cleanly. For instance, each crime trends section is a column that contains three columns: a time-of-day-graph, a type table and a location table. On phones, these collapse nicely, one on top of another, in a logical order.
Responsive graphs are hard. These still need a bit of work (the interaction is a bit tough on phones and tablets), but moving from jqPlot to D3 and Backbone.js helped immensely. David came up with a brilliant solution for the historical chart: display only as many prior years of data as there is screen real estate for. This ensures that the chart is legible even on small screens.
The crime site expanded after launch to include pages about shootings and homicides in the city as a whole. We hadn’t designed the site to be very flexible, anticipating that we would only have community area-level pages. As a result, the shootings and homicides pages ended up looking a bit disjointed. Accessing them from the rest of the site was difficult. They felt orphaned.
The new site was designed to be flexible. Indeed, part of the redesign’s goal was to incorporate a large amount of data about crime in Chicagoland suburbs. It was clear that good navigation could solve the orphaned-page issue and allow for future expansion of the site.
So all I have to do is find a way to logically display more than 350 links? I thought to myself. No problem! Oh, and a search bar? And branding?
Two days later, I’d found a style that allowed me to display all 77 community areas easily. The city-wide pages are nestled in the same dropdown. The suburban pages required a little more work. I did misguidedly try to figure out how to display all 250+ cities in the dropdown (try making THAT mobile-friendly), but ended up displaying just the top 20% most populous towns with a link to the full list. The search box got its own dropdown — not ideal, but when faced with needing to make space for the Trib’s logo, it seemed like the best way to save space.
Our project manager, Kaitlen, originated the term “crime confetti,” because the community area map sported so many colorful dots. The colors on the site have always been a bit disproportionately bright and cheery for their subject matter, so Alex Bordens and I sat down to try to come up with better colors. After about a day of experimentation, we realized that the original site’s colors, funky though they are, solved a hard problem: They worked for colorblind users, didn’t imply a hierarchy and didn’t conflict when in proximity to each other. Coming up with three other colors that worked just as well in each of these situations proved a Really Difficult Problem, so we eventually tweaked them a bit and called it good.
All in all, the site functions so much better on devices now than it did, and it’s cleaner and more user-friendly across the board. Stay tuned as we use our new-found flexible design to add more analysis to the site!
My internship with the Chicago Tribune News Apps team made me realize that 1) coffee is awesome and 2) what I want to do is actually possible. As soon as I discovered that there was such a thing as “news applications,” I knew I had to be a part of it. Not only does the field combine my majors (journalism and computer science) perfectly, but its very existence made it possible for me to spend my summer working in such a unique, exciting environment.
The experience was unlike any other internship (or class or job) I’ve ever had: In fact, it was the most fun I’d ever had while sitting still. The team really wanted to make sure that I was not only having fun, but also learning really valuable skills. There was never a time that I didn’t have something useful to do, and there was certainly never a time that I didn’t want to do the work I’d been given.
One project that I worked on was a special multimedia report on violent crime in the city entitled Chicago Under the Gun. This project gave me insight into both the News Apps process and the editorial process and how the latter applies to web developers as well as journalists. We used Tarbell for the project, a content management system developed for the newsroom by the News Apps team. Tarbell allows for a huge amount of freedom in creating and designing a page while giving journalists and editors enough structure that they don’t have to mess with the code if they don’t want to.
Photo editor Erin Mystkowski wrote the initial HTML and CSS that made up the base of the page and then I took over some of the finer points, building components that Erin wanted to include but was unable to build on her own. Not that I had all the answers, either—I spent a lot of time doing online tutorials and research to figure out the best way to solve problems, and of course receiving guidance from the News Apps team, particularly my unofficial mentor, David Eads. He threw me into the deep end a few times, but only so I could learn valuable problem-solving skills (and was ready to fish me out again when necessary!).
Working on Chicago Under the Gun really drove home the idea that no one codes/writes/designs in a bubble. There must be collaboration and compromise throughout the whole process, which is often what makes the end result such a success. From this project, I also learned what a rewarding experience it is to be involved in something that could actually influence people’s opinions and inform them about their community. This project was probably the single most exciting and influential thing that I’ve ever done.
This week we launched Chicago shooting victims, which tracks where and when people are shot in Chicago, a city that’s seen more than 1,000 people shot in the first six months of 2013. This project coincided with a detailed story that ran in the Chicago Tribune written by the very reporters gathering the data.
First, let me explain what the data represent and where these numbers come from. In Chicago, police record a shooting as a single incident no matter how many people are hit. For instance, if six people are shot during a drive-by shooting, it counts as one shooting, even though it has six victims. Our Chicago Breaking News desk started keeping track of every victim shot in late 2011 because the city doesn’t provide that data and they wanted a complete record of how many people were struck by gunfire.
When a shooting occurs, Tribune reporters track down as much information as possible about the victim (name, age, gender, where the shooting happened, which hospital they went to, etc.) and enter it into a Google spreadsheet. But in many cases, all reporters can discover is that someone, of some age and some gender, was shot at a specific location.
With about a week to go before heading to print, Alex Bordens, Ryan Mark and I set to work turning the spreadsheet into usable data we could visualize. First we geocoded and cleaned the data, then loaded it into a database. I’ll be covering that in this blog post. The next one will focus on displaying the data in a map, charting the data and displaying a recent list of shootings.
Geocoding the data
We have a Bing Maps API Key, and I have a basic Django management command to pull data from a Google doc, attempt to geocode an address and then save it into a PostgreSQL database. (Oh, and a quick FYI, my colleague Ryan Nagle wrote a geopy reverse geocoder for Bing. Here’s the pull request. And here’s an example of a similar Django management command.)
One of the first problems we encountered – which will happen when you’re trying to geocode 4,000+ location points – was data duplication. The data points are sometimes so similar that you can’t just check for unique attributes as a way to check for duplication. Sometimes multiple victims were shot at the same location, on the same day. But the only recorded information in the spreadsheet could be just their gender, typically male.
So we created a new column of unique IDs, with the year first and a number after, such as 2013-789. We then communicated with the Breaking News desk staff so they could include the unique ID in their workflow. The unique ID also allows us to check for any updates to existing entries in the database, which I will discuss later.
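As a rough sketch of that ID scheme, new IDs can be built from the year plus a running number. The helper name `next_victim_id` and the idea of passing in the IDs already in the spreadsheet are assumptions for illustration, not the desk's actual workflow.

```python
def next_victim_id(existing_ids, year):
    """Return the next ID of the form 'YYYY-N' for the given year.

    existing_ids: iterable of IDs already in use, e.g. '2013-789'.
    (Hypothetical helper; the post does not show how IDs were assigned.)
    """
    prefix = f"{year}-"
    # Collect the numeric suffixes already used for this year.
    numbers = [
        int(value[len(prefix):])
        for value in existing_ids
        if value.startswith(prefix) and value[len(prefix):].isdigit()
    ]
    # Start at 1 when the year has no IDs yet.
    return f"{prefix}{max(numbers, default=0) + 1}"
```

Because the ID does not depend on the victim's attributes, two otherwise identical rows (same date, same location, same gender) still stay distinct.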
Next, we discovered some locations just don’t geocode that well, sometimes because addresses weren’t specific enough or because Bing was having trouble. If Bing can’t find a specific location, which happened a lot on any street with “Martin Luther King” in the name, it just returns a latitude and longitude in the northwest part of the Loop. And some locations only described where a highway and street intersect. So I added a boolean field to the database that defaults to “True” but is set to “False” if either of two things happens: Bing geocodes the address to that latitude and longitude in the northwest part of the Loop (thus proving Bing couldn’t find it), or it returns nothing. Otherwise, it stays “True.”
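Here is a minimal sketch of that check. The fallback coordinates below are a placeholder (the post does not give the actual point Bing returned), and `geocode_ok` is a hypothetical name, not the field or function used in the app.

```python
# Placeholder for the "couldn't find it" point in the northwest Loop.
# The real coordinates are not given in the post; these are an assumption.
FALLBACK_POINT = (41.8850, -87.6350)


def geocode_ok(result, fallback=FALLBACK_POINT, tolerance=1e-4):
    """Return False when the geocoder found nothing or returned the fallback point.

    result: (lat, lng) tuple from the geocoder, or None if it returned nothing.
    """
    if result is None:
        return False
    lat, lng = result
    at_fallback = (
        abs(lat - fallback[0]) <= tolerance
        and abs(lng - fallback[1]) <= tolerance
    )
    return not at_fallback
```

Comparing with a small tolerance rather than exact equality is a design choice in this sketch, in case the fallback point comes back with slightly different floating-point digits.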
We were also able to use the boolean field to keep the incorrectly geocoded locations off our map and out of the neighborhood level data. Any locations we couldn’t geocode were left off the map, counted and included in a disclaimer at the bottom of the map. More transparency about the data is always better for us, so it was important to include that.
After geocoding all of the addresses, we were able to identify which ones weren’t found using the boolean field. We could then find the latitude and longitude by hand. Because our goal was to compare neighborhoods, being off by a few yards was close enough.
So now we had latitude and longitude for each record. First I thought about importing them into the database by hand, but that seemed silly, as I knew we might need to reimport the data, and that meant these same locations would get screwed up over and over. Therefore we added latitude and longitude columns at the end of the spreadsheet and entered the coordinates for the 70+ addresses that failed to geocode correctly. We can continue to use this technique for future bad addresses, too.
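The override logic might look something like the sketch below. The column names `latitude` and `longitude` and the function name are assumptions; the idea is simply that hand-entered coordinates in the spreadsheet win over whatever the geocoder returns, so a reimport never clobbers them.

```python
def resolve_coordinates(row, geocoded):
    """Prefer hand-entered coordinates from the spreadsheet over the geocoder's result.

    row: dict for one spreadsheet row, with string cell values;
         'latitude' and 'longitude' are assumed column names.
    geocoded: (lat, lng) tuple from the geocoder, or None.
    """
    lat = (row.get("latitude") or "").strip()
    lng = (row.get("longitude") or "").strip()
    if lat and lng:
        try:
            return float(lat), float(lng)
        except ValueError:
            pass  # Unparseable override: fall back to the geocoder's result.
    return geocoded
```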
Cleaning and importing the data
The data had more than 4,000 rows, entered by humans, who, despite doing their best, occasionally entered typos or wrote ages as “40ish” instead of an integer. Anyone who’s dealt with any large dataset made by humans can attest this is pretty normal. So I had to write many functions to clean every field of data we were importing. We also wrote a function that checks if any data has changed for each record and updates the database accordingly.
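To give a flavor of those cleaning functions, here is a small sketch for the age field and for the change check. Both function names are hypothetical, and the real cleaners covered many more fields and edge cases than this.

```python
import re


def clean_age(raw):
    """Return an integer age from free-text input like '40ish', or None if no number is found.

    Takes the first run of digits, so a range like '20-25' becomes 20.
    """
    if raw is None:
        return None
    match = re.search(r"\d+", str(raw))
    return int(match.group()) if match else None


def changed_fields(stored, incoming):
    """Return the sorted names of fields whose cleaned values differ from what is stored."""
    return sorted(key for key, value in incoming.items() if stored.get(key) != value)
```

If `changed_fields` returns an empty list for a record with a known unique ID, the import can skip the database write for that row.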
We build our projects using virtual environments locally, and then test them on staging and production servers with the same virtual environments. This way we know what software is installed on each server and how everything should interact, which leads to (hopefully) fewer errors in our code.
For this project, I geocoded everything using the Django management command on my local machine. Once the geocoder ran perfectly (sometimes it would hiccup more than our tests allowed and break partway through), I made a fixture, exported the data into json and committed it to our git repository, so we could easily load the data.
We were also able to turn this json file into a csv using csvkit with this simple command:
in2csv -f json path/to/filename.json > path/to/filename.csv
We used the csv to create the static maps that appeared in the newspaper.
Making the map and chart
The map is powered by Leaflet.js, Stamen Design map tiles and OpenStreetMap data. We get and build community area shapes as json from our Crime app. We calculate how many shooting victims each community area has on the backend and send it as json to the page where it calculates the shading of each community area. We also pass the data for the current year and previous year in json, which is added to the pop-up that is generated in a template. The individual shootings are passed with their latitudes and longitudes as an array, which Leaflet then uses to draw a circle at each location.
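The backend counting step can be sketched in a few lines of Python. The field name `community_area` and the function name are assumptions, and the actual app presumably does this with database queries rather than in-memory dicts.

```python
import json
from collections import Counter


def victims_by_area_json(shootings):
    """Count victims per community area and serialize the counts for the page.

    shootings: iterable of dicts with a 'community_area' key (an assumed field name).
    Records without a community area (for example, ones that failed to geocode)
    are skipped.
    """
    counts = Counter(
        s["community_area"] for s in shootings if s.get("community_area")
    )
    return json.dumps(counts, sort_keys=True)
```

The page can then read this object and pick a shade for each community area from its count.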
The chart is built with Rickshaw, using more json we pass to the page from the backend. We initially tried using D3.js, but it was kind of complicated. Rickshaw was used for the graphs on our Homicides page, so we already knew what it could do.
We just used basic Django templating to send the last 30 days’ worth of shootings to the page. We originally had listed homicide information, showing which shootings were fatal, but our numbers don’t always jibe with our Homicides page, as sometimes folks die days or weeks later from shootings and it sometimes isn’t updated in our spreadsheet.
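The 30-day window is simple date arithmetic. In a Django view this would more likely be a queryset filter, but the same logic in plain Python, with an assumed record shape and a hypothetical function name, looks like this:

```python
from datetime import date, timedelta


def recent_shootings(shootings, today, days=30):
    """Return shootings from the last `days` days, newest first.

    shootings: iterable of dicts with a 'date' key holding a datetime.date
    (an assumed shape, not the app's actual model).
    """
    cutoff = today - timedelta(days=days)
    recent = [s for s in shootings if cutoff <= s["date"] <= today]
    return sorted(recent, key=lambda s: s["date"], reverse=True)
```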
Lastly, we styled it. Thankfully, we’ve rolled our own version of Twitter Bootstrap and created our own style guide, which makes it much easier to start projects. Having base templates like this ready to go makes it easy to focus on the engineering and mechanics of any new project, and I would strongly recommend investing time to having base styles that you can use on projects that match the look and feel of your website.
We are happy to announce the first release of the Chicago Tribune’s Chicago Crime API, an easy, fast, useful and rich way to access more than 12 years of Chicago crime data. We’re excited to see what you’ll find in this data.
The Tribune Apps Team and the Northwestern Knight Lab are sponsoring a series of hack days to work with crime data and the Chicago Crime API. Come learn more about the API and analyzing crime data in Evanston on April 6 or Pilsen on April 13.
The first released API version is 1.0-beta1. All components of this project are still a work in progress. We plan to release several new versions of the API in the next month based on feedback from users on our way to a 1.0 release in late April or early May, 2013.
Want to start getting data now? Start by reading the API documentation.
Why a crime API?
The City of Chicago hosts this data using Socrata. What makes the Chicago Crime API different?
Easy: The Socrata Open Data API can be fussy and hard to integrate. Our API uses simple query parameters and has thorough documentation.
Fast: We provide cached, summarized data that is quick to access and analyze.
Useful: Our daily summary API endpoint rolls up thousands of rows of crime data into day-by-day counts of all major crime types.
Rich: We provide extended metadata about community areas and crime classifications in the Chicago Crime API. Our API can represent complex data structures and pull in data from sources beyond the City data portal.
Welcome to what we hope will be an ongoing series of blog posts by members of the apps team about our work analyzing and visualizing data related to public safety and crime. Crime is an important and popular subject. But interpreting crime data is tricky business, and developing coherent narratives and useful metrics is even harder.
Last fall, Heather Billings, David Eads and Joe Germuska built the first version of a comprehensive crime site for the Chicago Tribune called Chicago Crime. Our goal is to provide the best online tools and reporting on crime and public safety for our readers.
We started by building software to load and visualize data from the City of Chicago Data Portal’s crime dataset, which contains crime report data from 2001 to present.
The most crucial components of the backend are the scraper and data model. The scraper regularly polls the data portal for new records, geocodes each report to neighborhoods and community areas, then writes them to the database. Once the data is imported, we run tools to generate handy summary data, such as daily counts of crime for major crime categories.
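The summary step boils down to grouping reports by day and category. A minimal sketch, assuming each report is a dict with `date` and `category` keys (names invented for this example; the real summaries are generated from the database):

```python
from collections import Counter


def daily_counts(reports):
    """Roll individual reports up into counts keyed by (date, category).

    reports: iterable of dicts with 'date' (ISO date string) and 'category' keys.
    Both key names are assumptions for illustration.
    """
    return Counter((r["date"], r["category"]) for r in reports)
```

Precomputing these counts means pages can read a handful of summary rows instead of scanning every report.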
The City of Chicago’s crime data includes low-level misdemeanor crimes like fraud, gambling and fighting, as well as non-criminal reports such as a missing passport (perhaps so that lost passports are reported to the FBI). We filter out these reports to focus on the crimes that are important to our audience and provide a reliable picture of serious crime.
We summarize the data using three primary techniques: Rolling reports up to top level categories, adjusting crime rate for community area population and comparing the current time period to the same time period last year.
Violent and property crimes are those commonly referred to as index crimes; that is, crimes reported to the FBI as part of the Uniform Crime Reporting Program. Specifically, we use the list of Illinois Uniform Crime Report codes to match crime reports to the index crime categories. (The only index crimes not included in statistics for this site are those with the primary description ritualism. Fewer than 25 of these crimes have been reported in the data published by the City going back to 2001.)
Index crime types with a primary description of robbery, battery, assault, homicide or criminal sexual assault are included as violent crimes on this site. Index crime types with a primary description of theft, burglary, motor vehicle theft or arson are counted as property crimes. Additionally, index crimes with the primary description offense involving children with secondary descriptions including sexual assault are counted for this site as violent crime: sexual assault.
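The category rules above can be sketched as a small lookup. Note the assumptions: the site matches on Illinois Uniform Crime Report codes, while this sketch matches on descriptions only; the `is_index` flag stands in for that code lookup; and secondary descriptions in the real data may be abbreviated differently from the plain "sexual assault" substring checked here.

```python
VIOLENT = {"ROBBERY", "BATTERY", "ASSAULT", "HOMICIDE", "CRIMINAL SEXUAL ASSAULT"}
PROPERTY = {"THEFT", "BURGLARY", "MOTOR VEHICLE THEFT", "ARSON"}


def classify(primary, secondary="", is_index=True):
    """Map an index crime's descriptions to 'violent', 'property' or None.

    is_index: whether the report is an index crime (assumed to be decided
    elsewhere, e.g. from its UCR code).
    """
    if not is_index:
        return None
    primary = primary.strip().upper()
    secondary = secondary.strip().upper()
    if primary in VIOLENT:
        return "violent"
    if primary in PROPERTY:
        return "property"
    # Offenses involving children count as violent when they involve sexual assault.
    if primary == "OFFENSE INVOLVING CHILDREN" and "SEXUAL ASSAULT" in secondary:
        return "violent"
    return None
```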
While using residential population to compare crime rates by geographic area has downsides, we believe it provides a much more realistic picture of crime in Chicago than absolute counts.
The biggest downside of this approach: The daytime population in some neighborhoods fluctuates significantly, which means numbers for places such as the Loop might be inflated because the residential population is significantly less than the population of tourists and workers.
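The adjustment itself is simple arithmetic. The sketch below expresses it per 1,000 residents, which is an assumption (the scale used on the site is not stated here), and the numbers in the usage note are made up.

```python
def rate_per_1000(crime_count, population):
    """Return crimes per 1,000 residents, or None when the population is unknown or zero."""
    if not population:
        return None
    return crime_count * 1000 / population
```

For example, with hypothetical numbers, 150 crimes in an area of 30,000 residents and 150 crimes in an area of 5,000 residents are the same absolute count, but `rate_per_1000` gives 5.0 for the first and 30.0 for the second. The same arithmetic shows why a small residential population, as in the Loop, pushes the rate up.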
Finally, we look at the change in each top level category compared to the same period last year. Year-over-year comparisons suffer from a lack of long-term historical perspective, but are less prone to bias from changes to the law, enforcement policy and reporting procedures. Without a reliable historical “average,” we decided to compare the current time period only to the past year.
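A year-over-year comparison can be sketched as a percent change against the same period last year. The function name and the choice to return None when last year's count is zero are assumptions for this example.

```python
def year_over_year_change(current, previous):
    """Return the percent change from the same period last year.

    Returns None when there is no baseline (previous == 0), since a percent
    change from zero is undefined.
    """
    if previous == 0:
        return None
    return (current - previous) * 100 / previous
```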
We are fortunate to have all index crime reports published to the city data portal in a consistent format with a workable data API. But all real-world data is messy, especially when it involves multiple agencies, legacy data systems and complex legal requirements. We encountered many challenges:
- The time to reach the data portal varies from report to report, reports are occasionally deleted and reports are upgraded or downgraded. For example, an assault may be “upgraded” to a homicide if the victim later dies. Late in 2012, Chicago’s 500th homicide was reclassified several times before being finally classified as a homicide.
- Several potentially useful fields turned out to be less reliable than we had hoped:
  - The “updated date” field, which we hoped would help us query for recently added or updated records, is an artifact of another system and only partially useful.
  - The “arrest” boolean field is only true if an arrest was made when the initial report was created. This field is not updated if an arrest is made later, and we’ve heard it may not be accurate in cases where the reporting officer does not make an arrest but another officer does.
- In other projects, we found some reports include data entry errors, such as a report about a crime at a Red Line stop that was geocoded to Navy Pier.
- Inter-agency delays: The Chicago Police Department sometimes lags the Medical Examiner in declaring a homicide, so there’s often a discrepancy between sites like the RedEye Homicide Tracker and the portal data.
- Records lack useful details. While every homicide gets an individual report, a shooting in which five people were wounded may be a single report.
- We’ve struggled to connect police reports to a crime’s disposition in the court system if an arrest is made.
To address the fluctuating data set, we run a weekly data audit to sync with the data portal: Reports that have been expunged are moved to a backup database and new reports with old updated dates are harvested.
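At its core, an audit like this is a comparison of two sets of report IDs. A minimal sketch, assuming both sides can be reduced to collections of IDs (the function name and return shape are invented for this example):

```python
def audit(local_ids, portal_ids):
    """Compare report IDs held locally with those currently on the portal.

    Returns (expunged, missing):
      expunged: IDs we have that the portal no longer lists (move to backup).
      missing:  IDs on the portal that we do not have yet (harvest).
    """
    local_ids, portal_ids = set(local_ids), set(portal_ids)
    expunged = local_ids - portal_ids
    missing = portal_ids - local_ids
    return expunged, missing
```

Comparing IDs directly sidesteps the unreliable “updated date” field, at the cost of fetching the full ID list from the portal on each run.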
We recently were able to strike one data challenge off the list when the city started providing a reliable field with community area number for each report.
We’ll address these challenges in depth in a post and expanded page on Chicago Crime about how to interpret the crime data.
By consulting with editors and reporters, we identified a few core ideas to focus on. Looking back, these morphed into key user interface challenges:
- Making lots of complex information understandable to the many different cognitive styles of our audience: visual, geospatial, numbers- and stats-oriented, narrative-oriented.
- Organizing information so that people could dig in if they wanted to, but wouldn’t be overwhelmed if they didn’t.
- Developing useful, reliable metrics for comparing community areas to each other and to historical trends.
- Presenting numbers in context.
- Designing a site that avoids meaning-laden colors and design elements.
- Building a decently responsive site, with room to grow and experiment with mobile.
We knew we wanted to make use of the 11 years of data we had from the Chicago Police Department, yet we also wanted to create something that would be easy to interpret at first blush. When you break this much data down into something easily digestible, you run the risk of pulling numbers out of context. We struggled with how to synthesize what we call the “big numbers” without making areas seem more or less crime-laden than they actually were.
We also had two equally important elements for the top of the page: the map with locations of crime in the last month, and the “big numbers” breakdown. We wanted both to be immediately visible. They inform one another, and appeal to both people who think visually or are interested in spatial patterns, and people who think alphanumerically and are interested in trends and numbers.
Another main element on the page was the travelling navigation. Because there were so many different breakdowns of the data, a travelling navbar seemed like the best option. In contrast to the map and big numbers section, this seemed fairly straightforward.
We cribbed some code for horizontal sticky navigation from the schools application. That way, the navigation would be out of the way yet easy to access.
To check our assumptions, we did something we rarely have the time or resources to attempt: user testing.
User testing is a pain, especially on a deadline. It’s hard to find people who are willing to take several hours from their workdays to be stared at while poking around at a website. It takes significant team effort to organize the space, the time and the coffee. And it takes your time to conduct the interview and the discussions that follow. In all, it took us somewhere around 25 staff hours to pull off a five-person test.
Happily, this is exactly the sort of interruption that needs to happen in the mad rush to build something. Sometimes you need to take a break from hiking through the woods to remind yourself what the map looks like. Our goal was to make this information easy to understand, and the more we wrestled with the details, the less sure we were that we were really achieving that goal.
Each tester brought radically different perspectives to the table. The diversity of our testers seemed to help compensate for our small sample size. A statistical turn of phrase makes sense to a highly educated user but might not serve someone younger or without much education. Someone looking to buy a house or open a business might want summary numbers; someone invested in her community or her block will want specificity. A car owner probably wants details on property crimes like vehicle theft; a commuter wants to know what’s happening at CTA platforms.
Perhaps more interesting than the users’ varied, contradictory opinions were the opinions they shared — especially as to what did not work. The travelling navigation that we thought was so slick and obvious was overlooked by everyone who looked at the site. It was on the side of the page, but they were looking at the top.
“There’s a lot on this page,” said one tester. He scrolled around. The little navbar followed him down the page, securely in his peripheral vision and out of his way. “I wish I had something to tell me what was down the page. Like some buttons at the top or something.”
Based on this input, we changed our navigation design, which also let us always show the name of the community area being viewed.
While some testers had interests in features beyond the scope of our project, we were able to identify several weak points in our interface that could be fixed without major effort or woven into in-development features. Despite the effort, user testing was a big win, and significantly improved the fit-and-finish of the project.
Chicago Crime is built on a foundation of GeoDjango and PostGIS. The backend software regularly scrapes, massages and summarizes the data on Chicago Data Portal using the Socrata Open Data API.
The frontend uses Django plus a client stack that includes a fork of Bootstrap that replaces default styles to mimic chicagotribune.com, jQuery, Underscore, Leaflet, Tablechart + jqPlot. We used Tilemill to generate a map of Chicago Community Areas for our all-city ranking.
It all runs on a basic application stack on Amazon EC2 using Apache and mod_wsgi. As always, we ensure performance with aggressive Varnish caching.
The site uses responsive design to provide a decent experience on phones and tablets. The emphasis on responsiveness gives us room to grow and experiment with mobile.
The next phase of the site will focus on engaging with a broader community of crime data nerds. We will soon release an API for the site’s summary data as well as report-level data. We are partnering with the Northwestern University Knight Lab to hold a series of hack-a-thons in March and April 2013 to build interest and awareness around Chicago crime data. We also plan to roll out a new version of the site with better mobile performance, more data sources and deeper analysis. Stay tuned!