You work your way through the data, come up with an idea for the most interesting angle of analysis, chart your data, and delight yourself with the compelling emerging display: look how clearly it shows the difference between men vs. women! Time to incorporate some further design choices to prepare it for publishing for others…
And then you come to a grinding halt: What colours should I use for the genders? Blue for boys, pink for girls. Obvious, right?
Well yes, it is so established and immediately recognisable but maybe it is, at best, overly clichéd and, at worst, patronising and offensive in its implication?
What to do!? There is plenty of discussion and increasing awareness around the significance of this issue but no real firm ‘always…’ or ‘never…’ guidance. Therein lies the nature of data visualisation, I guess.
Personally speaking, I don’t always quite know how to judge the best colours to use for gender association. I generally try to avoid using the blue-pink pairing, but maybe in doing so I’m undermining the efficiency and readability of the things I present?
I therefore wanted to try to get closer to some sense of clarity about the dos and the don’ts, and whether or not such clarity can even exist.
As you might predict, the ‘lazy’ use of pink has drawn much of the stinging attention (see John Oliver’s takedown in particular). Some of the defence offered by the party was along the lines that this colour is simply hard-wired in its association with the female gender.
I’ve not done tons of reading but, from some quick searching, it seems that most sources exploring how and why this association became established end up in a cycle of inconclusive nurture vs. nature counter-arguments. I don’t think we can lean on any clarity from the past or from science either.
There are some sources that suggest that around the time of the First World War the association was actually the other way round. Boys were paired with pink and girls with blue. However, this piece seems to dispel this as most likely a myth.
Going back to the pink bus, the underlying aim of the campaign to “talk to female voters ‘around the kitchen table’” strikes me as arguably the most patronising aspect of this. As my wife astutely put it, “colour doesn’t cause the problem, people’s attitudes do”.
Let’s look at a couple of prominent examples of the pink-blue combination. The classic example would be Martin Wattenberg’s ‘Baby Name Wizard’ that shows the popularity of boys’ and girls’ names over time. It sticks with the classic association of blue for boys, pink for girls.
Last week, the New York Times published ‘The Changing Nature of Middle-Class Jobs’ showing the gender breakdown of different occupations. The print version used a cyan-magenta colour pairing; the web version used a redder colour for the female category. When I discussed the colour choices with Gregor Aisch, one of the authors of this work, he explained to me a critical contextual factor that influenced the final choice of colour usage:
The final argument for blue/pink is even simpler. If you want to print thin colored lines in a newspaper you have to stick to a single color tone, as otherwise the lines get blurry when printed as a mix of multiple colors (as the color plates are never 100% aligned). And in CMYK, your only real choices are cyan and magenta. In the web version I changed the magenta to something more red-ish to make the chart more readable for the color-blind.
So what are the alternative colour choices that people might consider using?
Change pink (part 1). The blue is still identifiable for boys leaving just one new colour for the reader to learn (but essentially they don’t have to learn as it is the only other colour).
Change pink (part 2). A darker blue that maybe shifts away from the default ‘baby blue’. Beware the association of ‘men dominating above women’ with this particular choice of layout, as well as ‘values that go under the x-axis are negative’.
Change both colours completely. This avoids the possible negative connotation with the blue-pink meaning but means the reader has to learn new colour associations.
Switch blue and pink! Maybe this goes too far, whereby the reader may not be alert to read the colour key and misread the graphic entirely.
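To make the trade-off between these options a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the palette names, the hex values and the rough hue ranges used to decide what counts as ‘blue’ or ‘pink’ are hypothetical and not taken from any of the charts discussed. It simply counts how many colour associations a reader would have to learn under each option.

```python
import colorsys

# Hypothetical palettes for the options above; hex values are illustrative only.
PALETTES = {
    "classic": {"male": "#4F81BD", "female": "#F4A6C0"},
    "change pink": {"male": "#4F81BD", "female": "#F28E2B"},
    "darker blue": {"male": "#1F3A5F", "female": "#F28E2B"},
    "change both": {"male": "#59A14F", "female": "#7B3FA0"},
    "switch": {"male": "#F4A6C0", "female": "#4F81BD"},
}

# The established (if clichéd) association the reader already carries.
CONVENTION = {"male": "blue", "female": "pink"}


def hue_family(hex_colour):
    """Crudely bucket a '#RRGGBB' colour by hue; the ranges are assumptions."""
    r, g, b = (int(hex_colour[i:i + 2], 16) / 255 for i in (1, 3, 5))
    hue, _, _ = colorsys.rgb_to_hsv(r, g, b)
    if 0.50 <= hue <= 0.70:
        return "blue"
    if hue >= 0.83 or hue <= 0.02:
        return "pink"
    return "other"


def associations_to_learn(palette):
    """Count the categories whose colour departs from the convention."""
    return sum(
        hue_family(colour) != CONVENTION[category]
        for category, colour in palette.items()
    )


def reverses_convention(palette):
    """True when each category wears the colour the reader expects for the other."""
    return (
        hue_family(palette["male"]) == CONVENTION["female"]
        and hue_family(palette["female"]) == CONVENTION["male"]
    )


for name, palette in PALETTES.items():
    print(name, associations_to_learn(palette), reverses_convention(palette))
```

A count of zero leans entirely on the cliché, a count of one keeps a familiar anchor, and a reversed palette is the case where a prominent colour key matters most.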
Where do you stand on the blue-pink issue? If you were a designer about to assign colours to the gender values in your chart, what choice would you make? Here is a simple 1-question poll that will hopefully allow us to discover a little more about the attitudes towards the use of colour with gender. (nb. This post is concerned with gender as a binary option. I know that is not reflective of society in 2015 but for the purposes of this post we are just focusing on male vs. female).
RESULTS SO FAR (this is the automatic Google Form analysis, not mine by the way!)
A thought for the day. Stick with me on this, I’m still deciding if there’s something in it: I will only know after I’ve published it though so here goes.
I’m musing about this visualisation work that has received a lot of love and attention on blogs and social media over the past few days. Created by the excellent people in the Wall Street Journal graphics team, it portrays data about the impact of vaccines in battling infectious diseases in the 20th Century.
The chart that has had the most impact, due to the highly topical nature of its subject, is the measles chart. This was certainly the image that drew me in and was used to accompany the many positive tweets it received.
My question is this: do we like the visualisation or do we like the data?
I have a few questions about the colour scheme – it is not explained what the implied threshold of the blue >> green colour means – but it is an attractive looking chart and, in using a heatmap, a suitable choice to display this data. And what a story we see! The shape of the data after the introduction of the vaccine wonderfully demonstrates evidence of its success.
Unquestionably, the visualisation reveals the findings from the data very clearly. However – and this is not a criticism whatsoever – other visualisation approaches would also have revealed this pattern (albeit most would only show an overall pattern rather than a state-level resolution). I would even maybe dare to suggest that the numbers are so compelling that you would ‘see’ the bigger picture of reduction even in a table of raw data.
So, that question again, but framed more broadly. When we like a visualisation like this is it because it is the best way, maybe even the only way, to show a certain pattern in data OR are we actually so engaged because the data that sits beneath the visual has a clear causal relationship between an action and its effect over time – the holy grail of analysis!
What if there was no sudden drop, as shown in the crudely photoshopped version below? Is the visualisation – the design layer – still of merit? Of course you probably wouldn’t produce the visualisation in the first place but, having lost that wonderfully clear reduction in the measles cases in the data layer, does it change our view of the design layer?
I guess it depends on your perspective and what you’re seeking to understand. Yes, I know that’s criminally boring but it is true. Whether you are pro- or anti-vax will certainly influence your standpoint on the findings, but I’m kind of interested in going beyond those subject matter biases to think about how we evaluate a visualisation’s merits.
Sometimes in workshops, in one of my early class exercises, I might find myself lavishing praise on a visualisation work based on certain design features. Yet the delegates in the room, who haven’t at that stage necessarily developed such a forensic lens, are more underwhelmed by what they are seeing. They are looking through and beyond that visual surface at the patterns of data beneath and are maybe not getting much from it. It is almost like I’m so concerned with assessing the choice and quality of glass in a window I ignore the view beyond.
As a visualisation practitioner, I am often so much more concerned with design choices that I rarely find myself looking beneath the design surface. For example, the discovery I treasure most from this project is the super hover action when your cursor sits above the heat map cells and a little marker appears on the colour scale to assist in reading the value. I’ve not seen (or noticed) that device being used before and think it is a brilliant little idea. Measles is going down? Yeah, great, but have you seen the hover marker!
I’ve run out of muse-momentum now so I’ll put down my pipe, and do something else other than contemplating life whilst staring out of the window. Or maybe at it.
One of the increasingly frequent questions I get asked, particularly by people from a scientific or financial domain, is how to effectively visualise uncertainty of data and of statistics. My response is usually to make suggestions around annotated markings and/or colour gradients to indicate increasing or declining certainties.
I’ve been gathering bits of evidence for these suggestions and any other sample solutions that might work in different contexts. There aren’t that many but I have compiled some references, papers and examples for anyone interested. If any others emerge I will add them to this list, so if you have any suggestions, please let me know:
1. ‘Information Graphics: A Comprehensive Illustrated Reference‘, by Robert L. Harris. (I love the cigarette and pencil shaped plots here. Harris also refers to ‘Fuzzygrams’ but I’ve yet to get a full copy of his book to reach that chapter).
2. The Bank of England’s GDP fan chart profiled in a paper titled ‘Visualising Uncertainty About the Future‘, by David Spiegelhalter, Mike Pearson, Ian Short.
3. ‘Error Bars Considered Harmful: Exploring Alternate Encodings for Mean and Error‘, by Michael Correll and Michael Gleicher.
4. ‘Visualizing Data Uncertainty: An Experiment with D3.js‘, by Alex Krusz.
5. ‘A Review of Uncertainty in Data Visualization‘, by Ken Brodlie, Rodolfo Allendes Osorio and Adriano Lopes.
7. Case study: Visualising uncertainty on improving-visualisation.org.
8. Some useful comments (one very detailed in particular) on Nathan’s post on FlowingData from 2013 ‘Visualizing uncertainty still unsolved problem‘
9. ‘From here to uncertainty‘, the UK Government Statistical Service’s guide to handling and communicating uncertainty
10. ‘Maternal mortality‘: A Hans Rosling Gapminder video concerning ‘How many women die every year during pregnancy and childbirth? Do we even know?’
11. ‘Communicating Uncertainty in Official Economic Statistics‘, by Charles F. Manski
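As a rough illustration of the colour-gradient suggestion at the top of this post, here is a minimal Python sketch. It is only one possible approach under stated assumptions: the interval uses a simple normal approximation (z = 1.96), and the function names, the `widest` parameter and the minimum opacity of 0.2 are all hypothetical choices of mine rather than anything prescribed by the references above. The idea is that a narrow interval earns a strong, opaque mark and a wide interval fades towards the background.

```python
import math
import statistics


def mean_with_interval(values, z=1.96):
    """Return (mean, lower, upper) using a normal approximation.

    Assumes at least two values and roughly normal sampling error.
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - z * standard_error, mean + z * standard_error


def certainty_to_opacity(lower, upper, widest, min_opacity=0.2):
    """Map interval width to opacity: narrow is opaque, wide is faint.

    `widest` is the interval width that should receive `min_opacity`;
    anything wider is clamped to that same minimum.
    """
    width = upper - lower
    share = min(width / widest, 1.0) if widest > 0 else 0.0
    return round(1.0 - share * (1.0 - min_opacity), 3)


# Made-up readings, purely for demonstration.
mean, lower, upper = mean_with_interval([10, 12, 11, 13, 9])
print(mean, lower, upper, certainty_to_opacity(lower, upper, widest=10))
```

The same lower and upper values could just as easily feed an annotated marking, such as an error bar or shaded band, instead of driving the opacity.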
Anybody facing up to colour choices in any creative activity knows how critical such decisions can be, making a big difference between the success and failure of a design. This is amplified in data visualisation. In contrast to graphic design work, for example, there is unquestionably a greater need to eliminate or at least reduce arbitrary choices. Using colour to decorate work, or sprinkling in colours that we simply like, has to be the final consideration – never the first – once all the other functional applications of colour have been implemented.
When considering these more functional applications, one of the most reliable and versatile colour options is grey (regardless of semantics about whether it is a colour or a ‘colour without colour’, see more). The advice I often give out to folks in my training workshops is to make grey your best friend when colouring your visualisation work.
Let me illustrate why with a few simple examples:
In the ‘Fertility and life expectancy’ graphic (the accidental Reindeer chart?) below, created by Moritz Stefaner in his ‘Remixing Rosling‘ project, colours are used to accentuate the US and Vietnam patterns. This highlighting effect is made possible through the relationship between the two colours and ‘no-colour’, the use of grey for all the other country plots.
In this next display – the bullet graph conceived by Stephen Few – we see the value of grey scale in creating a reference for interpretation. Any given black bar represents a quantitative value that can be read alongside the various grey shaded background thresholds. This provides the context of meaning – is the value good, bad or average, for example.
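The way the grey bands give a value its meaning can be sketched in a few lines of Python. This is an assumption-laden illustration rather than Stephen Few’s specification: the band labels, the grey hex values and the threshold numbers are all hypothetical, and I have assumed the common arrangement where darker greys sit behind the weaker ranges.

```python
from bisect import bisect_right

# Hypothetical qualitative bands, darkest grey for the weakest range.
BANDS = [("poor", "#9E9E9E"), ("average", "#C7C7C7"), ("good", "#E6E6E6")]


def band_for(value, upper_bounds=(50, 75)):
    """Return the (label, grey) band that a value falls into.

    `upper_bounds` holds the points where one band ends and the next begins,
    so there must be exactly one fewer bound than there are bands.
    """
    index = bisect_right(upper_bounds, value)
    return BANDS[index]


for measure in (40, 60, 90):
    print(measure, band_for(measure))
```

The black bar carries the quantity; the lookup above is only the context layer that tells the reader whether that quantity is good, bad or average.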
In a similar way, this example of Bryan Christie’s exceptional scientific illustration work uses a contrasting palette to highlight the focus on the heart. Through the use of greyscale (almost an x-ray style look) to illustrate the overall anatomy, this allows us to understand the positional context of this organ without subtracting from the focus.
In this neat multi-line chart below, by Maarten Lambrechts, we see historical readings of temperature in Belgium, one line for each year, across the 12 months of the year. There is a triple benefit in using grey in this work. Firstly, we can draw focus on a given year by hovering over a line and having colour help bring it to the forefront. Secondly, the inclusion of all data in one display for context allows us to judge whether the selected series is higher, lower or typical of the rest of the dataset. Thirdly, by using a neutral colour like grey we can see the big picture – the overall shape and pattern of the data – surfacing the seasonality and spread of values, without colour getting in the way.
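The highlight-against-grey logic is simple enough to sketch in Python. This is not how the chart above was built; the function names and hex values are hypothetical, and the output is just a colour lookup and a drawing order that any charting library could consume.

```python
GREY = "#BDBDBD"  # illustrative neutral for every non-selected series


def assign_line_colours(series_names, highlighted, accent="#D62728"):
    """Give highlighted series the accent colour and everything else grey."""
    return {
        name: (accent if name in highlighted else GREY)
        for name in series_names
    }


def draw_order(series_names, highlighted):
    """Order series so the grey context is drawn first and highlights sit on top."""
    # sorted() is stable and False sorts before True, so grey lines keep
    # their original relative order and the highlighted ones move to the end.
    return sorted(series_names, key=lambda name: name in highlighted)


years = ["2014", "2012", "2013"]
print(assign_line_colours(years, {"2014"}))
print(draw_order(years, {"2014"}))
```

Drawing the grey lines first matters as much as the colour choice, otherwise the selected series can end up buried beneath the context it is meant to stand out from.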
This sample dashboard by Welovroi demonstrates how grey can be used as an elegant layout/organising device, to subtly separate the various panels of the display without the need for shoutier backgrounds or intrusive borders.
This sample interactive work by Raureif and Christian Behrens, showing energy flows, is just one of endless examples whereby grey is used to display features or values that have been momentarily unselected or excluded whilst leaving them visible for reference.
Finally, an example of using grey almost as a placeholder colour for zero or null values. In this ongoing Bloomberg Billionaires project those people for whom we have no recent photos or visual depicting how they look are presented as a grey, blank face placeholder in contrast to all the other illustrations. This in turn creates some intrigue as to why we don’t know what they look like (reclusive billionaires have a certain extra mystique…).
I was honoured to be invited to speak at the Alliance for Continuing Education in the Health Professions 2015 Annual Conference in Grapevine, Texas. The title of my talk, commissioned by the organisers, was ‘Communicating Through Data: Visualising Your Story’ and below are the slides I presented. If you have read my previous talk ‘Let’s do some thinking about visualisation thinking’ you’ll see many similarities. Also, if you’re attending my upcoming talks at Astra Zeneca or PyData (see, speaking events) maybe sit this one out so as not to ruin the upcoming magic.
In an attempt to increase my blog post frequency, this year I’m going to publish more smaller posts that try to impart morsels of advice or thought-provocations about certain visualisation design matters. No deep dive theoretical exploration, just some practical tidbits most probably relating to quite narrow design considerations.
I’m going to start off 2015 focusing on a long-held gripe I have with map-based visualisations that colour the sea.
A tweet popped up on my timeline earlier from Max Gadney @aftertheflood commenting on a pair of maps comparing immigration levels with high numbers of UKIP voters in the UK.
You’ll see the map on the left has a large amount of clutter created by the unnecessarily coloured sea. As Max correctly recommends, we don’t need to give the sea such prominence; it has nothing to do with this data display. It also makes the judgment of colour scales within the left map harder (and obstructs the comparison with the right hand display, which has no sea colouration). A colour judgment made against a backdrop of a saturated colour can lead to very different perceptions compared to a backdrop of white.
We almost never have a need to care about the world’s seas and oceans in a display about geospatial data. In the majority of occasions the data relates to things that have a ‘land’ relationship. Yet, so often we have these big saturated areas dominating our view with nothing more than decoration.
It is especially problematic if blue is used for the sea but blue is also being used for a quantitative or categorical colour scale within the map. We have mentally committed to reading blue as meaning the sea but we now have to contend with an additional association.
The only times we usefully need to colour the sea are when there is visual relevance in the relationship between land and sea (usually distinguishing with an emphasis on what is land) OR the focus of your data portrayal concerns what is happening on or over the sea, like the patterns of wave heights or shipping routes seen in the examples below.
If you’re not going to use the sea for these reasons then turn it off or at least turn it down.
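For anyone wondering what ‘turning it down’ might look like in practice, here is a minimal Python sketch using only the standard library’s `colorsys` module. The function names and default values are hypothetical; the idea is simply to strip most of the saturation from the sea colour and push its lightness close to white so that it stops competing with the data.

```python
import colorsys


def hex_to_hls(hex_colour):
    """Convert '#RRGGBB' to (hue, lightness, saturation), each in 0..1."""
    r, g, b = (int(hex_colour[i:i + 2], 16) / 255 for i in (1, 3, 5))
    return colorsys.rgb_to_hls(r, g, b)


def turn_down(hex_colour, saturation_scale=0.15, lightness_floor=0.93):
    """Return a paler, greyer version of a colour for use as a map backdrop."""
    hue, lightness, saturation = hex_to_hls(hex_colour)
    lightness = max(lightness, lightness_floor)
    saturation = saturation * saturation_scale
    rgb = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02X}{:02X}{:02X}".format(*(round(channel * 255) for channel in rgb))


def turn_off():
    """The simplest option: no sea colour at all, just the page background."""
    return "#FFFFFF"


print(turn_down("#1E90FF"))
```

The muted result keeps a faint hint of the original hue, which is usually enough to separate land from sea without creating a second meaning for blue.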
Last week there was an article on Wired profiling an upcoming tool from Tableau called Elastic, which drew my ire. The tool looks fine; I haven’t seen a great deal about it but I’m sure it will find a user base.
What initially caused my Roger Moore eyebrow to spring into action was the way the article framed the tool. Check out this tweet from Wired.
“Spreadsheets are awful”. Just plain ignorance.
Spreadsheets are incredibly valuable tools for handling data, undertaking calculations and analysing it. They are not the most powerful of statistical analysis tools but they often provide enough. They are not the most potent charting packages, but they often provide enough. They are a fundamentally useful ally.
When someone emails a spreadsheet to your iPad, the app will open it up—but not as a series of rows and columns… The hope is that this will make is easier for anyone to read a digital spreadsheet—an age-old computer creation that’s still looks like Greek to so many people.
Sure, people do produce and share some really impenetrable workbooks. They dress up tables of data with the most horrendous shading and bordered decorations. However, as with bad PowerPoint slide decks, it is so lazy and easy to blame the tool and not the creator. What if I wanted the table of data un-visualised? Maybe I want to use the raw data as it comes, maybe I want to perform a lookup-and-reference type of interpretation? We don’t have to nor do we want to visualise everything. Let’s be more discerning than that.
That was the first thing. The second thing that caused my beef was the angle used to substantiate this type of tool as a kind of panacea that will automate and (that most dreadful of words) democratise the role of visual analysis.
So many companies aim to democratize access to online data, but for all the different data analysis tool out on the market, this is still the domain of experts — people schooled in the art of data analysis. These projects aim to put the democracy in democratize.
I don’t even know what that last bit means. Surely that would lead to democracytize?
These kinds of confused articles bluntly reduce the craft of data visualisation, data science and data journalism into the most simplified of disciplines, something that an automaton could operate. They smooth over the complexities of working with data in a way that only exists in the idealised scenario offered by Microsoft’s practice ‘Northwind’ database.
The hope is anyone can become a kind of data scientist — using data in ways that echo so many journalists these days, from Nate Silver on down.
A “kind of data scientist”. “Nate Silver on down”. Wonderful, sign me up, I’m sold.
The Seattle-based company has been massively successful selling software that helps big businesses “visualize” the massive amount of online data they generate
I was immensely grateful to be invited to speak at yesterday’s excellent Visualized.io conference event in London. As I previewed last week, the title of my talk was ‘The Design of Time’. With only a 15-20 minute slot I couldn’t possibly fit in everything that I wanted to profile (indeed I probably shouldn’t have attempted everything that I DID profile) and so here is a director’s cut version of yesterday’s talk.
I’ve had this issue on my mind for a while now but haven’t really found a way of expressing a cohesive post about it. I still haven’t, as you’ll find by the time you reach the bottom. Let me state from the outset: today, I am the problem guy, not the solution guy. However, I felt I’d pondered for long enough and so decided to put this out there to trigger some further thought and discussion.
As we will all know, the work emerging from the contemporary data visualisation field is dominated by digital output. Of course, there is still a significant amount produced for print consumption but, ever-increasingly, data visualisation is a digital – made-by and made-for – pursuit.
The history of the field preceding this recent era had a legacy of work that was easily archived and replicated for viewing in books or libraries. But how do we preserve the incredible array of digital data visualisation work being produced by this and future generations? It is an issue that goes beyond just safeguarding URLs and certainly goes beyond just the field of data visualisation.
Last evening, there was a terrifically astute stream of tweets from The Upshot’s Derek Willis, discussing web/data journalism, that articulated the concerns perfectly
As I perused some of the many tremendous web-visualisations tracking the recent US mid-term elections I was struck by the fleeting status of a graphic being fed by live data updates as they occur. As the story of an election night unfolds there will be all sorts of interesting ebbs and flows, different points where the story arc seems to be heading in different directions (maybe not in this particular election but you get the point).
As soon as the new data comes in, the composition and content of that live graphic has changed.
This is not unique to elections, of course; it applies to any real-time or frequently updated visualisation.
Over on Bloomberg they have the excellent Billionaires project, with a daily update on the fortunes (absolute and changing) of the world’s rich.
What is interesting about this project, as Lisa Strausfeld discussed in Data Stories episode #41, is that Bloomberg has journalist resources assigned to stories around billionaires. It’s a matter of common interest and intrigue so why not. Perhaps because of this dedicated resource there is a daily archive of the status of the billionaires’ rankings project for any given date (eg. 11th April). So it is very easy to revisit a point in time and see how person x did on that day.
Not every real time project will have that resource, nor will it have a subject matter that has levels of potential interest that endure on an ongoing basis. So what can be done for those projects?
Another example of the preservation challenges. It sounds like soon (or even already) parts of the US will be getting very cold. I saw this tweet with a still snapshot of the live ‘Earth‘ weather map visualisation by Cameron Beccario.
Seduced by these patterns I also took a look at the display on the hint.fm ‘Wind Map‘ and took my own screenshot to preserve that data ‘moment’.
I’d forgotten that the Wind Map project does have an archive gallery of previously interesting or noteworthy weather events.
However, that gallery has not been updated since Hurricane Sandy in October 2012.
As soon as this latest weather system passes, those interesting patterns are gone forever. Unless someone archives them.
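I said at the top that I’m the problem guy today, so treat the following as a thought experiment rather than a solution. It is a minimal Python sketch of the most basic form of preservation: writing each state of the data behind a live graphic to a dated file. The function names, folder layout and sample payload are all hypothetical, and it says nothing about preserving the rendering itself, which is the harder part.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def archive_snapshot(data, archive_dir, taken_at=None):
    """Write one snapshot of the live data to <archive_dir>/<YYYY-MM-DD>/<HHMMSS>.json."""
    taken_at = taken_at or datetime.now(timezone.utc)
    folder = Path(archive_dir) / taken_at.strftime("%Y-%m-%d")
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / (taken_at.strftime("%H%M%S") + ".json")
    payload = {"taken_at": taken_at.isoformat(), "data": data}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


def snapshots_for(archive_dir, day):
    """List the snapshot files saved for a given 'YYYY-MM-DD' day, oldest first."""
    folder = Path(archive_dir) / day
    return sorted(folder.glob("*.json"))
```

Called each time new data arrives, something like this would at least let you revisit a ‘moment’ by date, much as the Billionaires archive does, even if the visual layer would still need rebuilding on top.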
As the final members of the graphics teams (1, 2, 3, 4 etc.) across the news media finally shut down their machines after a long night of mid-term election coverage, I am reminded of a great article by Matt Ericson from 2010 titled ‘When maps shouldn’t be maps’. (Addition: A very helpful ‘Map or Don’t Map‘ flowchart from John Nelson at IDV)
In this article Matt describes the need to be more challenging in our natural assumption that simply by having spatial data we should map that data: “the impulse is since the data CAN be mapped, the best way to present the data MUST be a map”. If the interesting patterns are not spatial then a mapped display is fairly redundant. We may learn more from a location-categorical display comparing quantities or how values for those locations have changed over time, ranked by the largest to smallest changes, for example.
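The ranked, non-map alternative mentioned above is easy to prototype before committing to a map. Here is a minimal Python sketch; the function name is hypothetical and the place names and numbers are made up purely for demonstration.

```python
def rank_by_change(before, after):
    """Return (location, change) pairs sorted from largest to smallest change.

    Only locations present in both periods are compared.
    """
    changes = {
        place: after[place] - before[place]
        for place in before
        if place in after
    }
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)


# Invented values for three invented places.
before = {"Northshire": 40, "Eastvale": 55, "Westmoor": 30}
after = {"Northshire": 52, "Eastvale": 54, "Westmoor": 45}

for place, change in rank_by_change(before, after):
    print(f"{place:<12}{change:+d}")
```

If a ranked list like this already tells the story, the geography is probably not where the interesting pattern lives.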
However, on the flip side, when the interesting patterns ARE spatial, then of course, the layering of a data display on to the apparatus of a ‘map’ makes complete sense. Over the past week I have come across two different but very effective examples that demonstrate this.
Firstly, a very revealing visualisation (Alberto’s viz of the week, no less) about ‘Obama’s Health Law‘ by the New York Times. The map displays the percentage point increases, county by county, of Americans with health insurance under the Affordable Care Act.
I don’t know a great deal about the Affordable Care Act, particularly the political mechanisms that make it available or otherwise, but from looking at the display you can immediately see regional discrepancies that MUST be reflective of state level policies. Reading the accompanying article explains this observation in further detail:
That state boundaries are so prominent in the map attests to the power of state policy in shaping health insurance conditions. The most important factor in predicting whether an American who had no insurance in 2013 signed up this year was whether the state that person lives in expanded its Medicaid program in 2014.
By way of illustration, the piece draws contrast between Kentucky, which expanded Medicaid, and Tennessee, which didn’t. This was something highlighted by Lena Groeger on Twitter
There are many other spatially significant differences that support the benefit of displaying this data (albeit, just one view or one slice of analysis about that data) via a map: It reveals interesting patterns that would not have been as effectively or efficiently portrayed using other approaches.
The second example I came across concerns a different idea of mapping, this time the mapping of the geography of the human body. The graphic ‘Bumps, Bruises and Breaks’ by the Wall Street Journal – originally found on Junk Charts – shows how NFL players have sustained over 1300 injuries this season and where these injuries occurred on the body.
Plotting the quantitative displays of injury totals across the different parts of the body makes complete sense. It is more concrete, you can see the distribution more instantly. By having the illustrated player in the background you can also draw conclusions about the sufficiency (or otherwise) of the protection they get from their kit. Incidentally, Kaiser does a great job of offering up some further enhancement ideas for the graphic.
So in conclusion, just because you can map your data, doesn’t mean to say you should. Have the discipline and sense to challenge your natural impulses but, when it does make sense to do so, plotting spatial data on a map can really illuminate the inherent patterns.