Go Deep: Using DataThief to Rebuild Misleading Figures

Do misleading figures drive you crazy? Jon Fisher shows you how DataThief can help. Photo: Flickr users Phil and Pam under a Creative Commons license.

By Jon Fisher, spatial scientist

Have you ever looked at a difficult-to-read graph and wished there were a way to figure out the precise values of the data?

Or maybe you wanted to extract the data so that you could do your own analysis (or at least produce a clearer graph)? You’re in luck!

DataThief is a program that lets you take an image of a graph or chart and extract the underlying values.

To show how useful this can be, I’m starting with a misleading graph I recently found and recreating it to be more informative and honest. (I’m using a figure unrelated to conservation for demonstration purposes.)

Here’s the misleading figure:

[Image: DataThief before]

By putting an absolute value on one y-axis and a percentage on the other, this graph creates the false impression that unemployment and lack of insurance are both sharply increasing (and that the rate of unemployment has surpassed the rate of lacking insurance). I used DataThief to extract the underlying data (see my blog Science Jon for detailed instructions on how I did this), which looks something like this:

[Image: DataThief how-to]

In Excel, I multiplied the values for “Uninsured Americans” by a million to get the true numbers. I then found estimates of the US population for January 2008 and January 2009, and used those to calculate an average growth per month, the baseline population in March 2007, and the population at each of our data points.

This allowed me to calculate the percentage of Americans who are uninsured, so that we can compare it with the percentage of Americans who are unemployed.
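If you would rather script this step than do it in a spreadsheet, here is a minimal Python sketch of the same arithmetic. It assumes population grows in a straight line between the two estimates. The population figures and the 46.0 chart reading are rough placeholders I made up for illustration, not the values used for the graphs in this post, and the function names are hypothetical.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (negative if end is earlier)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def population_at(when, ref_a, pop_a, ref_b, pop_b):
    """Estimate the population at `when` from a straight line through two estimates."""
    growth_per_month = (pop_b - pop_a) / months_between(ref_a, ref_b)
    return pop_a + growth_per_month * months_between(ref_a, when)


def percent_uninsured(uninsured_millions, population):
    """Convert a chart reading in millions of people into a percentage of the population."""
    return 100 * (uninsured_millions * 1_000_000) / population


# Placeholder inputs (assumptions): substitute real population estimates
# and the values you actually extracted from the chart.
JAN_2008, POP_JAN_2008 = date(2008, 1, 1), 303_000_000
JAN_2009, POP_JAN_2009 = date(2009, 1, 1), 306_000_000

# Extrapolate back to the March 2007 baseline, then convert one reading.
baseline = population_at(date(2007, 3, 1), JAN_2008, POP_JAN_2008, JAN_2009, POP_JAN_2009)
example = percent_uninsured(46.0, baseline)  # 46.0 is a made-up chart reading
print(f"Baseline population: {baseline:,.0f}; uninsured: {example:.1f} percent")
```

Calling `population_at` once per data point gives the denominator for each month, which puts both series on the same percentage scale.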

A graph of the resulting data reveals a different pattern from the one we saw before: lack of insurance is increasing very slightly (from ~15.2 percent to ~16.1 percent) as unemployment increases more rapidly (from ~4.4 percent to 7.6 percent). Note that the unemployment percentage never surpasses the percentage of people who are uninsured (contrary to how this appeared in the original graph):

[Image: DataThief after]

Comparing side-by-side:

[Image: before and after, side by side]

There are two important considerations before using this software. First, these values will only be approximate, so if possible it’s always better to get the underlying data from the person who created the first figure. Second, it is possible that the data you are extracting is copyrighted, and that your reuse of their data may violate the data license. Use at your own risk!

Note that despite the name, DataThief is shareware; if you find it useful, please put your thieving on hold long enough to buy a $25 license.

Opinions expressed on Cool Green Science and in any corresponding comments are the personal opinions of the original authors and do not necessarily reflect the views of The Nature Conservancy.


Jon Fisher is a conservation scientist for The Nature Conservancy. He has studied forestry, environmental biology, stream ecology, environmental engineering and how technology and spatial analysis can improve wildlife management at airports. His current work mostly revolves around sustainable agriculture and spatial analysis. He also loves vegan cooking, biking, and finding ways to inject science into everyday life.

Comments: Go Deep: Using DataThief to Rebuild Misleading Figures

  •  Comment from Chris

    This is great Jon! If more people looked at data and information like this with as critical an eye as you I think we’d all be better off.

  •  Comment from Jon Fisher

    For those who are interested in using DataThief, I have an expanded version of this post with more detailed instructions at http://bit.ly/11Zs9oH

  •  Comment from Shai Vaingast

    You can also use im2graph (http://www.im2graph.co.il). It’s available on Linux and Windows and is free. Simple but powerful.
    – Shai
