Discussing ‘Are the richest Americans also the best educated?’

On Twitter over the weekend, a number of visualisation grandees (Moritz Stefaner, Andrew Vande Moere, Robert Kosara, Enrico Bertini and Noah Iliinsky) have been discussing and debating the rights and wrongs of a particularly unusual choropleth map published in Good magazine.

Zoomable view here

Typically the infographics presented in Good magazine are inglorious pieces of graphic design being passed off as informative visualisation yet demonstrating very few of the principles that guide this subject area.

On this occasion, however, they have published a design (in collaboration with Gregory Hubacek) which demonstrates an innovative approach to representing three variables of data overlaid onto a geographical landscape. Whether this is the most effective method is a judgment we’ll reserve for now.

The data in question comes from the US Census American Community Survey and covers all US counties for high school graduates (%), college graduates (%) and median household income ($).

To present this data the designer has assigned a colour to each variable (magenta for high school graduates, yellow for college graduates and cyan for income) and then encoded the values for these variables on separate maps, using variation in the saturation of each colour.

To create the final design, he has then overlaid all three colour schemes onto a single map to represent the combined levels of high school graduates, college graduates and median income via a single colour which is a product of the original three. Imagine mixing different levels of blue, red and yellow paint on a palette. The legend below describes this in more graphic detail:

Initially, the result of this is a fairly unintuitive and difficult-to-read graphic. Each county’s colour needs translating backwards using the guide on the left to understand how it should be interpreted. It is, however, an unquestionably interesting approach to tackling the challenge of presenting three variables of data on a map. Furthermore, the difficulty in reading the colours does not imply that the design approach deceives the viewer. That is certainly not a criticism you could level at it.
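
To make the mixing concrete, here is a minimal Python sketch of the idea. It is not the designer’s actual method. It assumes each variable has already been rescaled to a 0 to 1 range and that the three colours combine as idealised cyan, magenta and yellow inks; the function name cmy_mix and the rescaling are my own assumptions for illustration.

```python
def cmy_mix(high_school, college, income):
    """Blend three values in [0, 1] into one (r, g, b) colour, 0-255 per channel.

    Assumed mapping, following the colour assignment described in the post:
    magenta = high school graduates, yellow = college graduates, cyan = income.
    """
    for value in (high_school, college, income):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each value must be rescaled to the range 0..1")

    magenta, yellow, cyan = high_school, college, income

    # Idealised subtractive mixing: each ink removes its complementary light.
    red = round(255 * (1 - cyan))       # cyan absorbs red
    green = round(255 * (1 - magenta))  # magenta absorbs green
    blue = round(255 * (1 - yellow))    # yellow absorbs blue
    return (red, green, blue)


# No ink at all leaves white; full strength of all three gives black.
print(cmy_mix(0.0, 0.0, 0.0))
print(cmy_mix(1.0, 1.0, 1.0))
```

Going forwards is a one-line calculation per channel. The reader of the map has to do the reverse by eye, from a single blended colour back to three separate values, which is where the difficulty lies.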

The comments that have emerged about the design have raised concerns about the ease of perception of the colours and the extent of ‘learning’ required before the map becomes efficiently readable. They have also considered the idea of reducing the variables from three to two by creating a combined ‘education’ variable from a merger of high school and college graduates.

Further interesting narrative on this piece is available via Fastcodesign, which describes a method of interpretation where you consider the colour element that is missing, rather than the colour elements present, to arrive at some conclusion. The problem with this approach is that it only really works when you have particularly vivid and obvious colour combinations (such as orange meaning there is little blue, purple meaning there is little yellow etc.).

I really enjoy coming across new methods of visual display, especially when they are developed in a considered manner like this rather than purely designed to satisfy aesthetic appetites. The idea of encoding three sets of data using an RGB-mix is very novel. Unfortunately, I think the result is just too difficult to make sense of. Whilst our visual perception is excellent at detecting changes in a single colour, we simply aren’t built to easily detect this across three colour changes.

For what it’s worth, I think the suggestion to reduce the variables from three to two is an excellent one, bringing a better balance to the dataset and lowering the complexity. I then believe that taking the display away from a geographical platform and towards a scatter plot would be useful, perhaps colour coding specific regions of the US to facilitate geographical conclusions. That display would present an effective visualisation response to the question/hypothesis being posed: ‘are the richest Americans also the best educated?’.
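
As a rough illustration of the two-variable version, the Python sketch below merges the two education percentages into one index and measures how closely it tracks income. The averaging rule, the function names and the four county rows are all my own assumptions; the rows are made-up placeholders, not Census figures.

```python
from math import sqrt


def education_index(high_school_pct, college_pct):
    """Merge the two education percentages into one score.

    Assumption: a plain average. The discussion does not say how the
    merger should be done, so this is only one possible choice.
    """
    return (high_school_pct + college_pct) / 2


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder rows: (high school %, college %, median income $).
counties = [
    (80.0, 20.0, 40000),
    (85.0, 25.0, 47000),
    (90.0, 35.0, 60000),
    (75.0, 15.0, 38000),
]

education = [education_index(hs, col) for hs, col, _ in counties]
income = [inc for _, _, inc in counties]

# These two lists are the x and y positions a scatter plot would use.
r = pearson_r(education, income)
print(f"correlation between education index and income: {r:.2f}")
```

The two lists give the x and y coordinates for each county’s point on the scatter plot, and the single correlation figure speaks directly to the question in the title in a way the blended colours cannot.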

15 Comments

Jon Peltier, January 17th, 2011 at 2:20 pm

“Initially, the result of this is a fairly unintuitive and difficult-to-read graphic.”

Not just initially. The concept may have been interesting to consider, combining three monotonic variables in one view. However, I don’t know how many people can quantitatively deconvolute RGB (or CMYK) values in their heads. It doesn’t work for me, despite several sessions trying to interpret it. Novel approach not implemented coherently (perhaps not implementable).

The map also fails to show correlation. I guess perfect correlation would be a gradient from a light color (white, or a light pastel to indicate non-zero intercept) to a dark color, and any differently colored regions would include residuals. But then, a scatter chart (or two) would show this more effectively.

David McCandless, January 17th, 2011 at 3:36 pm

I took a similar approach a year or so ago with the UK map.

http://www.flickr.com/photos/25541021@N00/4038713677/

I’ve tried 3 colour maps like this privately but they rarely work. It’s really tough with a complex dataset.

Andy Kirk, January 17th, 2011 at 4:11 pm

Thanks Jon/David for your comments.

Successfully encoding multiple values via a single visual property is always going to be tricky to pull off, especially when it is applied to hue, given its very low ranking in the hierarchy of visual methods as defined by Bertin, Cleveland and Mackinlay (http://www.joeparry.com/blog/uploaded_images/DesignGuidelines-707946.png).

One of the key criticisms of the effect of this design is that it has made a presumably straightforward looking dataset more complex to draw insight from.

Still think it’s worthy of praise for the experimental intent if not the ultimate effectiveness of its execution.

Thanks
Andy

Andy Cotgreave, January 17th, 2011 at 6:07 pm

I can’t add to Jon’s or David’s comments. I do wonder what the RIGHT way to visualise this data would be. If you want a map that shows differing degrees of 3 variables, is there *any* way to do it? I can’t think of any way this could be done on a map.

Jayson, January 17th, 2011 at 6:36 pm

What’s interestingly NOT discussed is whether or not a map is the best framework to use when discussing the correlation between education and income.

Reading more about the map it appears that it’s attempting to find the correlation between where someone went to school and where they’re working. Again, color aside, I see no corollary relationships here. I see independent data points being overlapped, without even showing if someone who went to school in county x now works in county x instead of county y.

The color system is a poor choice, albeit an interesting one. However, this map expresses the issue that occurs when graphic designers try tackling not just expressing statistical analysis, but doing that analysis themselves.

We’re talking about whether or not the color system works before we even ask if we’re answering the right question with the right data.

Andy Kirk, January 17th, 2011 at 7:07 pm

Hi Jayson, thanks for your comment. Towards the end of my post I do propose that the visualisation would be more effective if presented as a dual axis scatter plot (having also reduced the variables to two) and this would better serve as a response to the question being posed.

You are right, however, to note that the majority of the focus is around the use of colour to depict the combined variable values. There is always an interest, in my mind, seeing somebody trying out new methods to present data in innovative ways. In this case the colour combo was the unique approach and so gets the focus but fundamentally it is flawed because, as you rightly point out, it is made necessary by the designer’s objective to present it on a map.

All the best
Andy

Robert Kosara, January 17th, 2011 at 8:03 pm

There’s nothing wrong with trying out new things, but that doesn’t mean that everything that’s new is good. Besides, encoding three data variables in three color channels is not exactly new; people have tried that for ages. It has never worked, though.

The problem is that our color perception doesn’t work that way. We don’t see “87% magenta, 47% yellow, 50% cyan”, we see green. And we can’t even tell a lot of hues apart, so the precision is very low. You can mix hue and brightness if you do it right (and you’re aware of the limitations), but you can’t simply mix three color components and think anybody will be able to figure it out. That just shows you have no clue how color perception works.

Also, as Jayson mentions above, the question is whether a map is really the best way to represent this data. The correlation has little to do with space. In fact, I’d argue that the spatial component is standing in the way here of actually understanding the data (like the obvious correlation between high school and college graduation rates).

Maps are a safe choice, because people think they understand them, and they can be made pretty by sprinkling them with color. But that doesn’t mean they’re the best way to represent any and all data, just because you can find some forced spatial component (however irrelevant).

Andy Kirk, January 18th, 2011 at 9:15 am

Thanks for your comment Robert, further interesting thoughts. I’d not seen other examples in the past of encoding using RGB. You can see why people are instinctively drawn towards that as a solution but, as you say, our colour perception capabilities just don’t match up to the demands this creates.

Enrico Bertini, January 18th, 2011 at 10:11 pm

I have nothing special to add on the judgment of the technique itself, I agree with the others. No way, I tried to spend 20 minutes to see if I could get something out of it. The answer is no.

However, you might be interested to know that Colin Ware actually tried to do so. He first explains in his classic InfoVis book how the theory of integral-separable dimensions predicts low performance (see picture http://twitpic.com/3qd516). The various color dimensions, even the best ones like hue and brightness, are just perceived as one color (they are integral); there’s no easy way to decode them in our brain. Then, on page 142, he actually shows the same experiment but with scatter plots (see picture http://twitpic.com/3r9nbl). He says it could be somewhat useful as a way to cluster the data points but, again, there’s no way to decode them.

So, in summary: it’s not new, it’s not effective.

P.S. By the way … great post Andy … and thanks for calling us grandees, I love it :-)

Andy Kirk, January 19th, 2011 at 9:13 am

Thanks for your feedback Enrico and also for the Colin Ware page images – is that from Visual Thinking for Design? It’s a while since I read his books, and this reminded me that I need to revisit them on a more frequent basis!
Best wishes
Andy

Harold Ian, February 4th, 2011 at 11:27 pm

I don’t agree that the richest are the most educated. They usually are, but not always.

Quora, April 4th, 2011 at 1:43 pm

How does one overlay two intensity maps and still have a meaningful presentation?…

this depends on what you are trying to show. using color to combine the two markers is probably not going to work. (see http://www.visualisingdata.com/index.php/2011/01/discussing-are-the-richest-americans-also-the-best-educated/ for a discussion). if …