Tribune DataViz

Matters of interest, from the data reporters and developers across Tribune Publishing

Style and Substance: Analyzing a Beach Ball Chart

This morning our friend Scott Klein tweeted about a chart published in the Susan G. Komen Foundation 2009-10 annual report:

The original "beach ball" chart.

For a variety of reasons, pie charts can be a misleading illustration of numbers, and this one did seem like it might suffer from similar problems. After a light prod from my PANDA colleague Chris Groskopf, I set out to see how well the visual representation of the numbers matched the reality. I wanted to determine if each segment of the chart actually represented the proportion of the whole that was promised by the labels.

There are probably many more formally correct ways to do this, but Chris suggested using the Python Imaging Library (PIL) to count the colors and determine their proportions. I hadn’t previously explored PIL, but looking at the documentation, I found the Image.getcolors method, which seemed promising. It returns a list of pairs, where the first value in each pair is the number of pixels of a given color and the second value is the RGBA specification for that color. (As an aside, simply calling getcolors() returns the value None. That is because its maxcolors argument defaults to 256, and the method returns None when an image contains more colors than that limit. Passing the area of the image as the argument, as in im.getcolors(im.size[0]*im.size[1]), sets the limit high enough that data always comes back.)
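The None result can be reproduced without the chart image. The sketch below builds a small synthetic image (the 20x20 size and the color formula are arbitrary choices for illustration) with more than 256 distinct colors, then calls getcolors both ways. It assumes Pillow, the maintained fork of PIL, is installed.

```python
from PIL import Image

# Synthetic 20x20 RGBA image in which every pixel has its own color,
# giving 400 distinct colors (more than the default limit of 256).
im = Image.new("RGBA", (20, 20))
for y in range(20):
    for x in range(20):
        im.putpixel((x, y), (x * 12, y * 12, 0, 255))

# With the default maxcolors=256 the limit is exceeded, so the result is None.
default_result = im.getcolors()

# Using the pixel count as the limit means it can never be exceeded.
width, height = im.size
all_colors = im.getcolors(width * height)

print(default_result)
print(len(all_colors))
```

Each entry in all_colors is a (count, color) pair, and the counts add up to the number of pixels in the image.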

A simplified version of the beach ball.

That data was not immediately usable, however. Using that method on the original image returns a list of over 3600 colors, while when I look at the chart, I see only six. The problem is that digital images use a technique called “anti-aliasing” to make images appear smoother. Shadows and edges of text and shapes use many subtle variations on the most significant colors.
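To see why smoothing multiplies the color count, consider a rough model of one anti-aliased edge: each edge pixel is a mix of the shape color and the background, weighted by how much of the pixel the shape covers. This is only an illustrative sketch. The blend function, the white background, and the 1/16 coverage steps are assumptions, not how any particular renderer works.

```python
def blend(foreground, background, coverage):
    """Mix two RGB colors; coverage is the fraction of the pixel the shape covers."""
    return tuple(
        int(round(f * coverage + b * (1 - coverage)))
        for f, b in zip(foreground, background)
    )

screening = (204, 34, 132)  # RGB of one segment color from the lookup table in this post
white = (255, 255, 255)     # assumed background and border color

# Coverage from 0 (all background) to 1 (all shape) in 1/16 steps.
edge_colors = set(blend(screening, white, step / 16.0) for step in range(17))

print(len(edge_colors))
```

A single edge between two flat colors already yields 17 distinct colors in this model, so thousands of colors across a whole chart with text and a drop shadow is not surprising.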

To simplify the image, I used Acorn, a Mac image editing tool. I cut off the black text labels. I used a magic wand selection tool to select the remaining “white” background (including the drop shadow) and cut that out, leaving a transparent background. I then used the magic wand to select each segment and filled it with the most representative color in the segment. This produced an image with 407 colors (shown right), and 400 of those colors appear in fewer than ten pixels each.
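Separating the handful of dominant colors from the stray ones can be done with a simple count threshold. The sketch below uses made-up (count, color) pairs in the shape getcolors returns, not the real image data. The split_by_count helper and its ten-pixel cutoff are illustrative choices.

```python
def split_by_count(colors, min_pixels=10):
    """Split (count, color) pairs into dominant colors and leftover noise."""
    major = [pair for pair in colors if pair[0] >= min_pixels]
    minor = [pair for pair in colors if pair[0] < min_pixels]
    # Sort the dominant colors so the most common comes first.
    return sorted(major, reverse=True), minor

# Made-up sample data for illustration only.
sample = [
    (3, (230, 140, 190, 255)),
    (5200, (236, 133, 191, 255)),
    (1, (205, 40, 135, 255)),
    (2300, (204, 34, 132, 255)),
    (7, (210, 205, 210, 255)),
]

major, minor = split_by_count(sample)
for count, color in major:
    print("%d %s" % (count, color))
print("%d colors fall below the threshold" % len(minor))
```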

Using the simplified image, I wrote some Python code using the interactive interpreter. The following code roughly reconstructs what I did:

from PIL import Image

# Map each RGBA color PIL reported to its label on the chart
COLOR_LOOKUP = {
  (236, 133, 191, 255): "Education",
  (209, 207, 212, 255): "Research",
  (204, 34, 132, 255): "Screening",
  (226, 156, 188, 255): "Administration",
  (241, 177, 211, 255): "Treatment",
  (162, 163, 167, 255): "Fundraising",
}

im = Image.open("/tmp/simplified.png")
colors = im.getcolors(im.size[0]*im.size[1])
colors = sorted(colors, reverse=True) # getcolors returns an unsorted list, so put the most common colors first
colors = colors[1:7] # I know the first is transparent and I only care about the six most common after that
total = sum(count for count,color in colors)
for count,color in colors:
  print("%s %.1f%%" % (COLOR_LOOKUP[color],float(count)/total * 100))

which produced the following:

Education 36.0%
Research 24.5%
Screening 16.0%
Administration 8.1%
Treatment 8.0%
Fundraising 7.4%

I created the COLOR_LOOKUP dict after the fact, comparing the colors which PIL found to the colors in Acorn. They didn’t match exactly, which is strange, but they’re close enough that it’s pretty clear how to match up the PIL colors to the labels.
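Matching by eye works for six colors, but the same matching can be sketched in code by picking the closest reference color. In the example below, the pil_colors values are the ones from this post, while editor_colors holds hypothetical stand-ins for what an image editor might report (the actual Acorn values are not recorded here). Squared distance in RGB space is one simple closeness measure among several.

```python
def nearest_label(color, reference):
    """Return the label of the reference RGB color closest to the given color."""
    def distance(item):
        ref_color, _label = item
        # Compare only red, green and blue; ignore any alpha channel.
        return sum((a - b) ** 2 for a, b in zip(color[:3], ref_color))
    return min(reference.items(), key=distance)[1]

# Hypothetical editor-side colors, each slightly off from the PIL values.
editor_colors = {
    (237, 135, 193): "Education",
    (210, 209, 214): "Research",
    (205, 36, 134): "Screening",
    (227, 158, 190): "Administration",
    (242, 179, 213): "Treatment",
    (163, 165, 169): "Fundraising",
}

# Colors as reported by PIL in this post.
pil_colors = [
    (236, 133, 191, 255),
    (209, 207, 212, 255),
    (204, 34, 132, 255),
    (226, 156, 188, 255),
    (241, 177, 211, 255),
    (162, 163, 167, 255),
]

matched = [nearest_label(color, editor_colors) for color in pil_colors]
for color, label in zip(pil_colors, matched):
    print("%s -> %s" % (color, label))
```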

Here are my results laid alongside the labeled values:

Category        Computed  Labeled
Education       36.0%     34%
Research        24.5%     24%
Screening       16.0%     15%
Administration   8.1%     12%
Treatment        8.0%      7%
Fundraising      7.4%      8%
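The size of each mismatch is easier to judge as a signed difference in percentage points. This short sketch uses only the numbers from the table above. The variable names are illustrative.

```python
# (category, computed percent, labeled percent), copied from the table.
rows = [
    ("Education", 36.0, 34),
    ("Research", 24.5, 24),
    ("Screening", 16.0, 15),
    ("Administration", 8.1, 12),
    ("Treatment", 8.0, 7),
    ("Fundraising", 7.4, 8),
]

# Positive means the segment is drawn larger than its label claims.
differences = [(name, computed - labeled) for name, computed, labeled in rows]

for name, diff in differences:
    print("%-15s %+.1f" % (name, diff))

largest = max(differences, key=lambda item: abs(item[1]))
print("Largest gap: %s" % largest[0])
```

Administration stands out, drawn 3.9 points smaller than its label, while every other category is within about two points.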

It is likely that some of the mismatch can be explained by the white borders between segments of the chart. I wonder how one might mathematically compute the data loss those create, and how one might prove (in the geometric sense) what shapes are least subject to that sort of problem. I suspect there are also natural perceptual tricks to how we see circles, and how we interpret those kinds of curved lines. They suggest a three-dimensional object, which means we may subconsciously be adjusting our understanding of the parts of the “ball” which we interpret as “farther away.”
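One rough way to estimate the border effect is to model an ordinary pie chart rather than the beach ball itself. Assume each wedge has two straight edges and each border of width w eats w/2 into the wedge on either side, so every wedge loses about w times the radius in area, regardless of its size. The function name, the 4-pixel border, and the 200-pixel radius below are all hypothetical, and overlap of borders near the center is ignored.

```python
import math

def visible_shares(true_shares, border_width, radius):
    """Approximate each wedge's share of the colored area once borders are removed."""
    # Each wedge loses about border_width * radius of area; as a fraction
    # of the full disc (pi * radius**2) that is border_width / (pi * radius).
    loss = border_width / (math.pi * radius)
    remaining = 1 - len(true_shares) * loss
    return [(share - loss) / remaining for share in true_shares]

# Labeled values from this post, as fractions of the whole.
labeled = [0.34, 0.24, 0.15, 0.12, 0.07, 0.08]

# Hypothetical pixel dimensions, chosen only for illustration.
shares = visible_shares(labeled, border_width=4, radius=200)

for before, after in zip(labeled, shares):
    print("%.1f%% -> %.1f%%" % (before * 100, after * 100))
```

Because the loss is the same for every wedge, small wedges give up a larger fraction of themselves and large wedges gain share. With these assumed dimensions every shift is well under one percentage point, so under this model borders alone would account for only a small part of a gap like the one for administration.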

While I’m not suggesting that anyone at Komen willfully misrepresented these numbers, it is interesting to see that the percentages of expense for administration and fundraising (a.k.a. “overhead”) come out rather lower when the actual area of colored pixels is assessed. On the other hand, administration is “farther away” in the visual field, so we may perceive it as larger than the actual number of pixels.
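The overhead comparison comes down to adding the two categories under each measure. The figures below are taken from the results in this post.

```python
# Administration and fundraising, by pixel area and by printed label.
computed = {"Administration": 8.1, "Fundraising": 7.4}
labeled = {"Administration": 12, "Fundraising": 8}

overhead_computed = sum(computed.values())
overhead_labeled = sum(labeled.values())

print("Overhead by pixel area: %.1f%%" % overhead_computed)
print("Overhead by label:      %.1f%%" % overhead_labeled)
print("Difference: %.1f points" % (overhead_computed - overhead_labeled))
```

By area, overhead occupies 15.5% of the chart against a labeled 20%, a gap of 4.5 points.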

As data visualization becomes more common, it is important for readers to learn to critically interpret charts and infographics. This seems to be a case where the choice of a visually novel graph is probably interfering with the clear and accurate communication of the numbers the graph is meant to convey.


Written by Joe Germuska

February 7, 2012 at 2:58 pm

Posted in Data Visualization

5 Responses

  1. I don’t think the difference is because of the lines between the segments. It looks like the creator just (poorly) manipulated a stacked bar chart to look like a beach ball. See

    David Yanofsky (@YAN0)

    February 8, 2012 at 9:21 am

    • David:
      Interesting! Thanks for the illustration. No one lets designers set body copy in irregular shapes to satisfy aesthetics. Hopefully it will become taboo to manipulate data visualizations in ways that distort the information.

      Joe Germuska

      February 8, 2012 at 10:38 am

  2. Nice code. I think though that the main psychological distortion of this ball representation is foreshortening, not the perspective scaling you suggest.

    David Read

    February 17, 2012 at 4:24 am

  3. The code is great!
    Thank you for posting it here, it turned out to be very helpful for me actually.


    May 11, 2012 at 8:16 am

  4. […] Sadly though, just when you think that the war is won and data is safe, someone invents the beach ball chart. […]
