Go Deep: Using DataThief to Rebuild Misleading Figures

May 8, 2013

Do misleading figures drive you crazy? Jon Fisher shows you how DataThief can help. Photo: Flickr users Phil and Pam under a Creative Commons license.

By Jon Fisher, spatial scientist

Have you ever looked at a difficult-to-read graph and wished there were a way to figure out the precise values behind it?

Or maybe you wanted to extract the data so that you could do your own analysis (or at least produce a clearer graph)? You’re in luck!

DataThief is a program that lets you take an image of a graph or chart and extract the underlying values.
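To give a feel for what this kind of extraction involves, here is a minimal Python sketch of the general idea: tell the program where two known values sit on each axis, then convert any pixel position into a data value. This is not DataThief's actual code. The function name `make_axis_mapper` and every pixel and axis number below are made up for illustration, and the sketch assumes plain linear axes.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Build a pixel-to-value converter for one linear axis.

    (pixel_a, value_a) and (pixel_b, value_b) are two calibration points:
    places on the image where the axis value is known.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration points must sit at different pixels")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * scale

    return to_value


# Hypothetical calibration (invented numbers, not from any real figure):
# x-axis: pixel 100 is value 0, pixel 500 is value 20.
# y-axis: pixel 400 is value 0, pixel 50 is value 10
# (image pixels count downward, so the y scale comes out negative).
x_of = make_axis_mapper(100, 0.0, 500, 20.0)
y_of = make_axis_mapper(400, 0.0, 50, 10.0)


def extract(points):
    """Convert clicked (pixel_x, pixel_y) pairs into (x, y) data values."""
    return [(x_of(px), y_of(py)) for px, py in points]


if __name__ == "__main__":
    print(extract([(300, 225), (500, 50)]))
```

Because the conversion is only as good as the calibration points and the pixels you pick, the recovered values are estimates rather than exact data.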

To show how useful this can be, I’m starting with a misleading graph I recently found and recreating it to be more informative and honest. (I’m using a figure unrelated to conservation for demonstration purposes.)

Here’s the misleading figure:

[Figure: the original, misleading graph]

By putting an absolute number on one y-axis and a percentage on the other, this graph creates the false impression that unemployment and lack of insurance are both rising sharply, and that the rate of unemployment has surpassed the rate of lacking insurance. I used DataThief to extract the underlying data (see my blog, Science Jon, for detailed instructions on how I did this), which looks something like this:

[Figure: the data extracted with DataThief]

In Excel, I multiplied the values for “Uninsured Americans” by a million to get the true number of people. I then found estimates of the US population for January 2008 and January 2009, and used those to calculate the average growth per month, the baseline population in March 2007, and the population at each of our data points.

That let me calculate the percentage of Americans who are uninsured, which we can then compare with the percentage of Americans who are unemployed.
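For anyone who would rather script this step than do it in Excel, the arithmetic above can be sketched in Python roughly as follows. The function names are my own invention, and the numbers in the example at the bottom are round placeholders: they are not the population estimates I used or the values read off the figure. The sketch also assumes the population grows by the same amount every month.

```python
def months_between(start, end):
    """Whole months from one (year, month) pair to another; negative if end is earlier."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])


def linear_population(anchor_a, pop_a, anchor_b, pop_b, when):
    """Estimate population at `when`, assuming constant growth per month
    between (and beyond) two known estimates."""
    growth_per_month = (pop_b - pop_a) / months_between(anchor_a, anchor_b)
    return pop_a + growth_per_month * months_between(anchor_a, when)


def percent_uninsured(uninsured_millions, population):
    """Turn a count given in millions into a percentage of the population."""
    return uninsured_millions * 1_000_000 / population * 100


if __name__ == "__main__":
    # Placeholder inputs for illustration only (NOT real estimates).
    jan_2008, pop_jan_2008 = (2008, 1), 300_000_000
    jan_2009, pop_jan_2009 = (2009, 1), 303_000_000

    # Baseline population for March 2007, extrapolated backward.
    baseline = linear_population(jan_2008, pop_jan_2008,
                                 jan_2009, pop_jan_2009, (2007, 3))

    # Placeholder uninsured count, in millions as on the original axis.
    print(baseline, percent_uninsured(45.0, baseline))
```

Swapping in real population estimates and the extracted values for each data point gives a percentage series that can be plotted on the same axis as the unemployment rate.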

A graph of the resulting data reveals a different pattern from the one we saw before: lack of insurance is increasing very slightly (from ~15.2 percent to ~16.1 percent) while unemployment increases more rapidly (from ~4.4 percent to 7.6 percent). Note that the unemployment percentage never surpasses the percentage of people who are uninsured, contrary to how this appeared in the original graph:

[Figure: the rebuilt graph, with both series shown as percentages]

Comparing side-by-side:

[Figure: the original and rebuilt graphs side by side]

There are two important considerations before using this software. First, the extracted values will only be approximate, so if possible it’s always better to get the underlying data from the person who created the original figure. Second, the data you are extracting may be copyrighted, and your reuse of it may violate the data license. Use at your own risk!

Note that despite the name, DataThief is shareware; if you find it useful, please put your thieving on hold long enough to buy a $25 license.

Opinions expressed on Cool Green Science and in any corresponding comments are the personal opinions of the original authors and do not necessarily reflect the views of The Nature Conservancy.


Jon Fisher

Jon joined The Nature Conservancy’s Measures department in 2005, and developed the organization’s first-ever “Activity Measures” reports documenting the state of our assessments, planning efforts, and actions worldwide. He spent several years helping the Conservancy assemble and manage our global data, as well as helping to develop information systems that make it easier to access our data online.

Join the Discussion

3 comments

  1. This is great Jon! If more people looked at data and information like this with as critical an eye as you I think we’d all be better off.