Tribune DataViz

Matters of interest, from the data reporters and developers across Tribune Publishing

Better web cartography with dot density maps and new tools

with 19 comments

Between Brian, Joe, and me, there hasn’t been a time in the last six months when at least one of us wasn’t working with census data.

Back in February we attacked the less-detailed (redistricting) data for print and the web. In April, May, and June we contributed to a joint effort with an esteemed cadre of news nerds to develop a site intended to make it easier for journalists to report from census data. And to prepare for this recent release, we even spent a week hacking near-complete prototype maps using data the census had already released for Kings County, New York.

We learned hard lessons about the scale and nuance of the census in the last few months, and along the way, further built out our toolkit for making maps. Last week the Census Bureau released detailed (summary file) data for Illinois, and we used our new tools to produce a couple of maps we’re pretty excited about:

These maps demonstrate a map style we haven’t attempted before: dot density mapping. Dot maps let us represent multivariate data more richly than choropleth maps can; for example, they can illustrate variation in race and population density simultaneously. We were inspired in this effort by Bill Rankin’s Radical Cartography project and Dennis McClendon’s race map for the Encyclopedia of Chicago.

Many of the tools needed to create the maps we wanted didn’t exist. Using the fantastic TileMill as our starting point, we began to build a toolkit.

Invar
Invar automates the generation of map tiles and their deployment to S3. It is the first and least glamorous of the tools we created, but crucially, it’s very, very fast.

The first time we ever tried to create our own tileset, it took hours to render and twice as long to deploy. Because invar parallelizes these tasks, we can now produce a map in minutes and deploy it just as fast. In fact, we now deploy our maps to four separate S3 buckets so that we can take advantage of Leaflet’s support for round-robining tile requests to multiple subdomains. Fast!
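
To give a feel for the bookkeeping involved, here is a stdlib-only sketch of how a renderer like invar can enumerate the slippy-map tiles covering a bounding box and split them across worker processes. This is an illustration of the technique, not invar’s actual code; the function names here are made up.

```python
import math

def deg2num(lat_deg, lon_deg, zoom):
    """Convert a latitude/longitude pair to slippy-map tile coordinates."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom
    xtile = int((lon_deg + 180.0) / 360.0 * n)
    ytile = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    return xtile, ytile

def tiles_for_bbox(south, west, north, east, zoom):
    """Yield every (zoom, x, y) tile coordinate covering a bounding box."""
    x_min, y_min = deg2num(north, west, zoom)  # northwest corner
    x_max, y_max = deg2num(south, east, zoom)  # southeast corner
    for x in range(x_min, x_max + 1):
        for y in range(y_min, y_max + 1):
            yield (zoom, x, y)

def chunk(tiles, n_workers):
    """Split the tile list into roughly equal chunks, one per rendering process."""
    tiles = list(tiles)
    size = -(-len(tiles) // n_workers)  # ceiling division
    return [tiles[i:i + size] for i in range(0, len(tiles), size)]
```

Each chunk can then be handed to a separate process (via multiprocessing, for instance), which is where the minutes-instead-of-hours speedup comes from.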

Englewood
Next we needed to distribute dots across geographies. We found one implementation of dot distribution in Python, which we extended into a module for reuse.

Englewood (named after an ailing Chicago neighborhood that the newspaper writes many sad stories about) uses the Python bindings for GDAL to load data from PostGIS or shapefile. It scatters points within each feature and then writes the points out to a table or new shapefile.

A small snippet of Python is required to configure Englewood. The following code renders the dots for our map of children under five from a database. (A demo using shapefiles can be found in the repository.)

#!/usr/bin/env python

from englewood import DotDensityPlotter 

def get_data(feature):
    """
    This function is called for each feature Englewood processes and needs to return a
    dictionary of classes, with a number assigned to each. Englewood will divide this
    number by a "dots_per" value set below and create that many dots for that class
    within the geography.
    """
    return {
        'hispanic': feature.GetFieldAsInteger(feature.GetFieldIndex('hispanic_under5')),
        'black': feature.GetFieldAsInteger(feature.GetFieldIndex('black_under5')),
        'asian': feature.GetFieldAsInteger(feature.GetFieldIndex('asian_under5')),
        'nhwhite': feature.GetFieldAsInteger(feature.GetFieldIndex('nhwhite_under5'))
    }

# Example argument values passed into the DotDensityPlotter
# In this case features are read from a PostGIS table (under_5_by_race_blocks_shapes)...
source = 'PG:dbname=chicagocensus host=localhost'
source_layer = 'under_5_by_race_blocks_shapes'
# ...and written into another PostGIS table (under_five_dots)
dest_driver = 'PostgreSQL'
dest = 'PG:dbname=chicagocensus host=localhost'
dest_layer = 'under_five_dots'
get_data_callback = get_data
dots_per = 1

dots = DotDensityPlotter(source, source_layer, dest_driver, dest, dest_layer, get_data_callback, dots_per)
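
Under the hood, scattering comes down to rejection sampling: pick random points in a feature’s bounding box and keep the ones that land inside the geometry. Here is a stdlib-only sketch of that idea (the real module works with GDAL geometries; the ray-casting test below is a stand-in for illustration, not Englewood’s API):

```python
import random

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (a list of (x, y) vertices)?"""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def scatter(polygon, n, seed=None):
    """Scatter n random points inside a polygon by sampling its bounding box
    and rejecting points that fall outside."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < n:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points
```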


A fast and stable process is useless if you can’t repeat it. We’ve built out a fabric configuration which allows us to make these maps in the quickest and most efficient way possible. Among other things, it allows us to keep some configuration (such as a bounding box) in a per-map YAML file. It parses this file and handles passing the correct arguments to invar for rendering and deployment. Perhaps most exciting, if you’re using the new TileMill 0.4 (available for OSX or Ubuntu), it can completely automate the production of Wax interactivity grids, like the ones we used for the highlighting in our recent maps.
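
As an illustration of the per-map config idea, here is a minimal sketch of parsing a flat key: value file and turning it into a render invocation. The config keys and the exact command layout are invented for the example; the real fabfile and invar’s CLI may differ.

```python
def parse_flat_yaml(text):
    """Parse a flat 'key: value' config (a tiny subset of YAML, enough for a sketch)."""
    config = {}
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        key, _, value = line.partition(':')
        config[key.strip()] = value.strip()
    return config

def render_command(config):
    """Build an invar-style render invocation from the per-map config.
    Argument names and order here are illustrative only."""
    return [
        'ivtile',
        config['stylesheet'],
        config['output_dir'],
        config['bbox'],      # e.g. "41.6 -87.9 42.1 -87.5"
        config['min_zoom'],
        config['max_zoom'],
    ]
```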

[Image via Crayonsman (CC BY-SA 3.0)]

Styling dots

Creating dot density maps presented new challenges with regard to styling. Brian tried numerous approaches to color and size the dots, but ultimately we settled on a few principles that worked pretty well:

  • Use a dark, sparse base-layer (we used a custom-styled Google Maps layer, but would like to move to an Open Street Map base-layer in the future).
  • Make your dots stand out brightly. Try the fluorescent colors from the palette of Crayola crayons.
  • Play with transparency: you may want to take advantage of the effect of overlapping transparent dots.
  • Make dots scale on zoom.
  • Whenever possible, use one dot per individual. It’ll make for a more interesting map.

Here is the style we settled on:

#under-five {
  marker-allow-overlap: true;
  [group="asian"] {marker-fill:#FF496C;}
  [group="black"] {marker-fill:#76FF7A;}
  [group="hispanic"] {marker-fill:#FFCF48;}
  [group="nhwhite"] {marker-fill:#7366BD;}
  [zoom=9] {marker-height:.2;}
  [zoom=10] {marker-height:.3;}
  [zoom=11] {marker-height:.5; marker-opacity:.7;}
  [zoom=12] {marker-height:.8; marker-opacity:.7;}
  [zoom=13] {marker-height:1; marker-opacity:.8;}
  [zoom=14] {marker-height:1.5; marker-opacity:.8;}
}

Wrapping up

Although I’ve linked to a number of projects and code snippets in this post, you may find it useful to see a complete project. This week, with Illinois under our belt, I decided to apply the same methodology to my side-project, Hack Tyler. I produced a map of race in Smith County, Texas (related blog post). Part of Hack Tyler’s modus operandi is developing in a completely transparent manner. As a result, you can see complete examples of both our backend and client-side mapping rigs in the following projects:

We hope that we’ve pushed the envelope a bit with these new maps. Someone said that this was the year cartographers retake the internet. I hope that’s true. It’s about time that online maps were more than just shading boxes.


Written by Christopher Groskopf

August 12, 2011 at 4:02 pm

19 Responses


  1. Instead of writing to 4 separate S3 buckets, assign your S3 domain name 4 CNAME records for the tile request parallelization trick, and save a bit of $.

    Paul Smith (@paulsmith)

    August 12, 2011 at 4:17 pm

    • Paul, to the best of my understanding this is, in fact, not possible. S3 buckets will only respond to requests if the CNAME in question exactly matches the name of the bucket. I first tried to do exactly what you’re suggesting, but some Googling turned up several examples of other folks who had also tried and failed to make it work. Obviously, the “deploy four times” solution is not ideal, but it is a working solution until Amazon decides to revisit this.

      Christopher Groskopf

      August 12, 2011 at 8:42 pm

      • Hey Chris,

        That’s true of S3 buckets, but CloudFront distributions support multiple CNAMEs. We were using this extensively back when we used S3 as tile storage and it worked well – the cost of CloudFront is relatively small even for big traffic loads – that’s relative to the cost of running rendering servers and doing millions of PUT requests. Of course, I’d recommend you guys jump the individual-files ship and use MBTiles to alleviate that pain, but that’s a different point :)

        Tom MacWright (@tmcw)

        August 26, 2011 at 11:04 am

  2. Another killer project from TribApps. The placement and coloring of the dots provides multiple dimensions of data in an interesting way. And I appreciate the contribution you’re making to the news community with a solid writeup (and releasing code!) I’m curious though how the communication/workflow between you digital folks and the writers works. I assume that you were knee deep in the data, noticed trends and suggested someone write about it? Keep up the good work!

    • Kenton, thanks! We don’t have a formal process for how we interact with writers, but in this case it went something like this:

      1. When the SF1 timetable was announced we got together with reporters and discussed what would be in the release and speculated about what stories might come out of it.
      2. A week before the release we revisited our list and came up with a roughly prioritized list of five to six story ideas.
      3. When the data landed we quickly vetted out the top two or three of those ideas and reported back to the writers about where we did and did not see trends (we are continuing this process now with the less urgent leads).
      4. We came to a working consensus on which story to try to tell for day one.
      5. The reporters contacted demographers, got interviews, and wrote the story while we sprinted to get a map ready. We went back and forth as we spotted interesting examples and they raised new questions.
      6. Repeat steps 3-5 for day two.


      Christopher Groskopf

      August 12, 2011 at 8:51 pm

  3. Amazing work

  4. Fantastic post and a great app! It’s always a pleasure to see other developers leverage my blog posts and take it further. However this reference was particularly fun. I have a degree in Print Journalism and started out in the newspaper business. I backed into the geospatial industry 11 years ago and still love it but still fondly remember my newsroom website days. It’s great to see a newspaper team demonstrate compelling ways for newspapers to embrace the latest technology.

    Joel Lawhead

  5. These are really great looking! Very cool stuff.
    I know this isn’t really what the post is about, but I’m most curious about what you used to do the “did you mean…” thing under the search box. I’m developing an app that is pretty similar to this in interface, and have been trying to figure something like that out for a while.

    Brian Lange

    August 17, 2011 at 9:50 am

  6. Wonderful. I am particularly impressed that you did not allocate people to parks or cemeteries. Were there other layers, like industrial zones, that you did not allow population to be allocated to? The City of Chicago released building footprints; it would be amazing to see a map that incorporates that data, particularly on the south and west sides with so many vacant lots.

    Forest Gregg

    August 20, 2011 at 3:30 pm

    • Forest, as you say, there are a number of different geographies we could use to punch out areas where people don’t live. We tested out a number of ideas, including using industrial corridors and other, less granular, shapes. However, we ultimately settled on only punching out blocks which the census designated as being empty. There are a number of reasons for doing this, including:

      * We are being consistent with our data sources.
      * Several of the city’s shapefiles, including the industrial corridors, are old enough that they don’t reflect recent changes in zoning and would thus exclude areas that are genuinely populated.
      * On occasion there is a real, sizable population living in an urban park or other area and we don’t want to completely obscure this.
      * It can’t be perfect: even if we punch out parks, plazas, waterways, and industrial areas, we still put people on the highway. At some point you have to decide it’s “good enough”.
      * We didn’t have all these data sources for the 7-county area, so the resulting map would be of uneven accuracy.
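
In code, the punch-out rule described in the reply above amounts to a simple filter plus the dots_per division from the Englewood example. A hypothetical sketch (the field names are invented for illustration):

```python
def dots_for_block(block, dots_per):
    """Compute how many dots to draw for each class in one census block:
    population divided by dots_per, and nothing at all for blocks the
    census recorded as empty (the punch-out rule)."""
    if block['total_pop'] == 0:
        return {}
    return {group: count // dots_per for group, count in block['by_race'].items()}
```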

      Regarding housing footprints, it’s a fascinating dataset, but very weak in the area of metadata, so I’m not sure exactly what we could do with it. I agree it’s a fun one to theorize about, but thus far we haven’t identified an iron-clad use-case for it.

      Thanks for commenting!

      Christopher Groskopf

      August 21, 2011 at 12:40 pm

      • Well, you got a great result just using that census data. I think it makes a big improvement over the New York Times census explorer, which seems to have not gone down to the block level.

        Combining different map layers in order to validly decompose areal units is indeed a tricky problem. There are techniques that guide the distribution of points within an areal zone but ensure that the total number of points remains the same. Such techniques are known in the geostatistical literature as pycnophylactic, or volume-preserving, dasymetric methods.

        But, ultimately, as you suggest, increasing accuracy is not free, and you have to make a decision about what is good enough. Looks like you guys made a great one here!

        Forest Gregg

        August 21, 2011 at 3:40 pm

  7. […] with reporters to hold the powerful accountable for their actions (and build insane mapping tools and show your […]

  8. […] here in Norway?) things (like the geographic distribution of children under five coded by ethnicity), see this for more info on how this can […]

  9. […] Matt Wynn: Seems like there’s always something new coming along and sweeping me off my feet. The Guardian’s Twitter visualizations provide analysis you’d be hard-pressed to create in story. It’s explanatory, revealing and just plain cool. There has been a boatload of stuff since Google began charging for maps. That change in policy led news devs to get off their duffs and start dabbling in beautiful, custom cartography. The Chicago Tribune has been pushing the envelope there. […]

  10. […] Chicago Tribune blog: Better web cartography with dot density maps and new tools… Chicago Tribune blog: Making maps (five part […]

  11. […] Chicago Tribune blog: Better web cartography with dot density maps and new tools… […]

    Links | RDataVox

    July 6, 2012 at 12:45 pm
