As I am in the process of writing my book, I find certain challenges in the use of language cropping up time and time again. The main one I have difficulty with is maintaining consistency in how I refer to the person who reads, uses or consumes a visualisation or infographic work. It is particularly problematic when I am constructing a sentence that really needs a singular catch-all label rather than a comma-separated list attempting to cover every nuance, which is clumsy both to write and to read.
Yesterday, I asked my esteemed twitterati to suggest the terms they use or feel most comfortable with, hoping either to arrive at a consensus or at least to accept that there are too many variations to settle on one single term.
Rather than leave it buried on Twitter, I have storified a collection of the contributions people made (thank you to all again) so that others can join the debate.
However, in summary and unless I am presented with a compelling alternative argument, my decision is to go with VIEWER. Regardless of the type and format of visualisation we are working with, we are always ‘viewing’ a visual portrayal of the subject’s data.
USER is an appropriately active term for describing those who engage with an interactive project – but even when we have the ability to interact, we are not doing so constantly; we do stop to look.
READER feels more associated (in my definition) with specific acts of reading text, values and point-reading from a chart. It is clearly a key component of engaging with a visualisation but not a universal act – when we’re taking an initial ‘at a glance’ perspective, that’s not in my view a specific act of reading.
AUDIENCE is a term I might use in a different written context, but it is problematic when referring to an individual.
CONSUMER, CUSTOMER, RECIPIENT, RECEIVER and (even) VICTIM are either too passive, too context specific or feel too harrowing.
With the dust settling after the UK elections, a brief reflection on the winners and losers from a data and visualisation perspective:
As I stated a couple of weeks ago, these things are everywhere right now…
Cartograms. So hot right now pic.twitter.com/By1GHCjlbl
— Andy Kirk (@visualisingdata) April 28, 2015
…and they have never been deployed to such good effect or so widely. During the build-up to the election and on the night itself, cartograms emerged as the real star.
— Phil Knight (@PhilipWhere) May 8, 2015
Tracking the predicted and actual outcomes of the election is ideally suited to the cartogram approach: it sacrifices geographical precision for a more equitable visual weighting of each individual constituency, whose result is so critical to the ebb and flow of the overall election picture. This first example comes from the Guardian:
The hexagon, with its reasonably flexible tessellating qualities, provides a great geometric option to build up the election picture, as shown by this example from the Telegraph:
Kenneth Field, of CartoNerd acclaim, is working on an interesting-looking experiment to take the election hexagon bin map results into a 3D landscape, breaking down the votes of each constituency into stacked hexagonal bars and creating the look of the Giant’s Causeway.
Not everything was digital. We had the BBC’s excellent and huge outdoor cartogram (which I cleverly, I’m sure you’ll agree, dubbed the ‘elecxagon’ map)… Its excellence was enhanced further by confusing those cretins at the Daily Mail.
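For anyone curious about how a hexagon layout tessellates, here is a minimal Python sketch of the geometry. It assumes pointy-top hexagons in an offset grid, where every other row shifts right by half a cell so the shapes interlock. The function names and the idea of giving each constituency a (col, row) cell are my own hypothetical illustration, not how the Guardian, Telegraph or BBC actually built their maps.

```python
import math

def hex_centre(col, row, size):
    """Centre of a pointy-top hexagon in an offset grid.

    `size` is the distance from the centre to any corner. Odd rows are
    shifted right by half a hexagon width so that the cells interlock.
    """
    width = math.sqrt(3) * size  # flat-to-flat width of one hexagon
    x = width * (col + 0.5 * (row % 2))
    y = 1.5 * size * row  # rows overlap by a quarter of the hexagon height
    return x, y

def hex_corners(cx, cy, size):
    """The six corner points of a pointy-top hexagon centred on (cx, cy)."""
    corners = []
    for i in range(6):
        angle = math.radians(60 * i - 30)  # corners at -30, 30, 90, ... degrees
        corners.append((cx + size * math.cos(angle), cy + size * math.sin(angle)))
    return corners

# Hypothetical example: each constituency gets one equal-sized cell,
# regardless of its geographic area.
constituencies = {"Seat A": (0, 0), "Seat B": (1, 0), "Seat C": (0, 1)}
for name, (col, row) in constituencies.items():
    cx, cy = hex_centre(col, row, size=10)
    print(name, round(cx, 2), round(cy, 2))
```

The equal cell size is the whole point: a rural seat and a city seat get the same visual weight, which is the trade-off against geographic precision described above.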
— Andy Kirk (@visualisingdata) May 6, 2015
There was some very high-calibre visualisation coverage across many different news and media outlets, but the standout work (in the UK at least) emerged, perhaps unsurprisingly, from the Guardian, the BBC and the Financial Times. These three organisations are at the top of their game right now and are leading the UK data journalism and visualisation landscape.
— Andy Kirk (@visualisingdata) May 9, 2015
*There is a nice round-up of some of the election visualisations on BuzzFeed*
Whilst there were surprisingly few examples of corrupt visualisation work, the Liberal Democrats – the big losers in the election itself – offered up the dodgiest data visualisation work, continuing a theme from their efforts back in 2010. I’m not saying that their political performance is linked to their visualisation output but…
There have been many recent examples of Twitter users taking other people’s work and ideas and passing them off as their own in tweets that then generate traffic and attention, blatantly failing to credit the original author.
Many of you will have seen the comparison between Maggie Simpson and the pattern formed by the predicted GB political map (GB, not UK, as Northern Ireland was left off). I first saw this in a tweet dated 29th April.
Maggie from the Simpsons?! pic.twitter.com/yheS65j0PB
— Ben (@0point5twins) April 29, 2015
This might not be the original, but it was certainly widely shared and it predates the endless copycat tweets that went viral after the results came in, with @serialsockthief and @suffragentleman just two of the many who failed to acknowledge where they’d seen the original. Maybe they are unfortunate exhibits to pick on and perhaps they independently came up with the very same idea…
Whilst the Maggie Simpson thing is more comedy than visualisation, there was another example that really caught my attention. This astute piece of analysis by Vaughan Roderick looks at how the pattern of voting matches some of the traditional coal mining areas of the country.
Do I get a prize for this? Distribution of Labour seats compared to England and Wales coalfields. pic.twitter.com/9xeQERU9mR
— Vaughan Roderick (@VaughanRoderick) May 9, 2015
Once again, this has been blatantly ripped off by others without the slightest hint of acknowledgement. @Amazingmaps and @Bowgroup should hang their heads in shame. Particularly as both were told who did the analysis and who should be attributed. Amazing Maps even faved the tweet telling them who the author was!
I appreciate there are character restrictions on a tweet, but a follow-up tweet with details of where the original came from is surely the least that can be done.
Back in January I claimed that I would be hitting the new year with plans for more frequent, smaller blog posts to offer ‘some practical tidbits most probably relating to quite narrow design considerations’. That lasted for about a week, so it’s certainly long overdue that I pick this back up.
The small nugget of advice I want to share today is about the relationship between your data and your vision.
Whenever we start a visualisation task, ideas will inevitably form in our minds about what this thing might look like. It will be a mental slideshow of different imagery comprising keywords, colours and forms, metaphors, maybe clichés, things we’ve seen before, things that have inspired us and things we’ve maybe worked on before.
There is no ‘perfect’ in visualisation: there are better and worse solutions but no absolute path to perfection. It is therefore important to embrace these instinctive reactions we have to the subject and task we’re working on. These mental manifestations inject imagination and creativity into our work and this is important, without question.
However, our ideas only act as initial possible signposts and should only play the role of background inspiration. They cannot be the leader. We can’t afford to commit ourselves to such a narrow aperture in our thinking.
Our ideas are not the raw material, the data is.
Take the example below. This is a piece I’m working on as a demonstration project to accompany the central workflow discussed in my upcoming book. The project focuses on the differing career stories of various movie stars. Its tentative title is ‘Filmographics’ (that’s a clever wordplay combining films and infographics, in case you were wondering) and it looks at the relationship between an actor’s career and the relative success of their movies in terms of critical reception and box office takings.
When I first had the idea, the very first image that formed was something like the sketch below, captured in my notebook on a particularly bouncy train journey back from London one evening. I had this vision of a forest of trees, with the height being the critical review, the size of the bubbles being the takings and the colours maybe representing the genre.
The reality, when using real data, was that a movie career is not organised in perfect intervals, with consistent reviews and takings: it is up and down, big and small, densely packed and then sparse. There are so many genres, and derivatives, that there aren’t enough colours to suitably distinguish each one. There are things from my initial idea that I can preserve going forward – and that in itself can be quite rare – but the initial idea of that neat forest was quickly shown up by the data to be redundant.
An important discipline you have to show as a data visualisation designer is NOT to become a servant to your initial idea (or, even more importantly, to those of your client/customer). Early ideas and sparks of creativity are really valuable and, particularly as we become more experienced, our instincts are worth tapping into. Just don’t be precious or stubborn; always maintain an open mind. Ultimately you need to respect the shape, size and conversation emerging from your data. That is the true raw material.
“Good ideas are in abundance. We all have them. Implementations on the other hand, are not. I admire implementations far more than great ideas”, Julian Oliver
It is always a privilege to be asked to give talks, and this last week I’ve given two more at the National Audit Office as well as a webinar for Tableau. As many people like to have access to the slides after each talk, I am happy to share them.
The focus of this talk is to give people a sense of the different aspects of data visualisation thinking. Separating thinking attributes from raw talent (design savvy, technical skills), I argue that these are fresh ways of thinking about data visualisation that can make an enormous difference to your capabilities and output.
For people who have read through or attended some of my other recent talks (such as at the ACEhp conference or USF meetup) on the theme of ‘thinking’ about data visualisation, this slide deck has a similar structure but has been further refined and updated.
Earlier this week The Upshot published a new interactive project visualising the ‘Yield Curve’. Created by Gregor Aisch and Amanda Cox, the work provides a “3-D view of a chart that predicts the economic future”.
It is a terrific piece of work because, as with any good visualisation, it makes understanding accessible, providing a visual explanation of a potentially (at least for me) complicated subject matter.
The most striking immediate feature is the initial 3D display. Whilst the project received lots of deserved praise online I am conscious that being positive about a 3D work might strike some as going against the grain: as we know, 3D is one of the reliable punching bags for visualisation angst. However, I thought it was important to explain why 3D doesn’t just work but is essential in this case.
The first matter is that we have three dimensions of data. When we are lambasting 3D displays, the ire is usually focused on the use of 3D as purely decoration. This introduces an artificial axonometric or isometric projection creating unnecessary and unhelpful distortions to the task of interpreting, for example, the relative heights of bars or angles of pie chart segments. In the Yield Curve project we do have three dimensions of data: we have Year on the x-axis, the % Yield on the y-axis and the Yield Term on the z-axis.
The second reason why 3D makes sense here is not just that we HAVE three dimensions of data but that their relationship is critical to the analysis. How the % Yield alters across the different short- and long-term periods by year is the essence of the analysis. Whilst we could (and the interactive eventually does) decouple these variables to show a range of reduced, two-dimensional displays, initially we want to get a sense of the overall undulations and contours of the connected dynamics of this relationship. And ‘getting a sense’ is key: you can’t easily or confidently read off the heights of the waves from the three axes, and that is not the intention, but you do at least get an initial gist of the substance of the situation.
The third reason in support of this approach is perhaps the most important, and it comes down to this little guy:
Having this navigation sequence enables us to look around and beyond the 3D display. Having opened up with the big-picture 3D view, we will next want to take different perspectives, observing the slices of interest from each angle and giving ourselves a better chance of reading the chart, not just feeling it.
The absence of such alternative displays is perhaps the crucial reason why something like the example below still deserves criticism for its use of 3D. Without the ability to move around it, front-on and side-on, we have to consume a three-dimensional object in a two-dimensional space.
As we click through the series of additional displays, we have seamless transitions pointing the camera at the sides of the chart and from above, looking at the 3 relationships individually. We also have further scenarios and comparisons to consider.
It is a very well considered, brilliantly executed demonstration of explanatory visualisation at its best and an example of when 3D really works.
During the week I posted an article about some of the issues and options around using colour to represent gender (that is, the binary of Male or Female). I included a one-question poll to gather some insights about the attitudes out there towards the use of colour with gender:
Where do you stand on the blue-pink issue? If you were a designer about to assign colours to the gender values in your chart, what choice would you make?
Here are the results, after 126 responses (50 female, 76 male):
As you can see in the table and chart below, the real difference in thinking concerns the use of pink to represent females. Only 14% of women responded in favour of using pink to represent females, whereas 41% of men would use the blue and pink combination. 50% of women would use a completely different pair of colours.
I guess the main conclusion for me is this: if you use pink to represent any female-related data in your visualisation work, almost 7 out of 8 female readers of your work might not particularly appreciate that colour association. Given how much time we spend discussing how to adjust colour choices for the 1 in 10 (ish) colour-blind readers, this finding about the use of pink seems even more stark.
A quick post to share something that I’ve been really into this past week or so: dendrochronology. Also known as tree-ring dating, it is the “scientific method of dating based on the analysis of patterns of tree rings”.
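To show where ‘almost 7 out of 8’ comes from, here is the arithmetic as a short Python sketch. It assumes the published 14% was rounded from a whole-number count of the 50 female respondents, so the count below is back-calculated rather than taken from the raw poll data.

```python
# Respondent totals from the poll (126 in all).
responses = {"female": 50, "male": 76}

# Rounded share of women in favour of using pink to represent females.
women_share_for_pink = 0.14

# Back-calculated count (an assumption: the raw counts are not shown here).
women_for_pink = round(responses["female"] * women_share_for_pink)
women_not_for_pink = responses["female"] - women_for_pink

share_not_for_pink = women_not_for_pink / responses["female"]
print(f"{women_for_pink} of {responses['female']} women favour pink")
print(f"{share_not_for_pink:.0%} do not, versus 7/8 = {7 / 8:.1%}")
```

So 43 of the 50 women (86%) did not favour pink, just short of the 87.5% that a strict 7 out of 8 would imply.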
The reading of tree-rings is surprisingly fascinating (trust me) and so very relevant to the world of data visualisation.
You might know about counting the rings, but there is so much more information available than that. By decoding the size and shapes of the ring patterns, their difference in colour, the blemishes and bulges, you get to learn about the history of that tree and the conditions it experienced. It is a great example of deriving a story from visual encodings: it doesn’t tell you the story, but if you are sufficiently primed to know how to read it, it gives you the details you need to piece the story together.
This simple graphic nicely explains how to read the tree rings and what information exists. Also, if you can bear to sit through 3m 37s of Top Gear’s James May, here’s a video that explains a little more about trees and dendrochronology.
So why is this relevant? I’ve been reading about dendrochronology for two reasons, both linked to the ‘Seeing Data’ research project I am part of.
Firstly, I’m always alert to demonstrations of data being encoded through natural phenomena, and I’ve been gathering examples for a Seeing Data project workshop I jointly delivered at a school yesterday, where we introduced a group of 13-year-old kids to data visualisation.
Secondly, through working on this research – where we are primarily exploring the ‘reading’ side of visualisation literacy – I have become really interested in discovering other applications of techniques for reading and analysing things, such as reading artworks or buildings. To my mind, and reinforced through our research, this is where such a big gap exists with data visualisation: not enough people are sufficiently equipped to derive the insights that are being provided in the visualisations and infographics they consume.
At the end of March we will be finalising the research work and will be starting to share the findings we have derived. We will be creating outputs for workshops, presentations, webinars, articles, blog posts as well as a website resource. To keep track of it all, follow our updates on Twitter @seeing_data.
You work your way through the data, come up with an idea for the most interesting angle of analysis, chart your data, and delight yourself with the compelling emerging display: look how clearly it shows the difference between men and women! Time to incorporate some further design choices to prepare it for publishing for others…
And then you come to a grinding halt: What colours should I use for the genders? Blue for boys, pink for girls. Obvious, right?
Well yes, it is so established and immediately recognisable, but maybe it is, at best, overly clichéd and, at worst, patronising and offensive in its implication?
What to do!? There is plenty of discussion and increasing awareness around the significance of this issue but no real firm ‘always…’ or ‘never…’ guidance. Therein lies the nature of data visualisation, I guess.
Personally speaking, I don’t always quite know how to judge the best colours to use for gender association. I generally try to avoid the blue-pink combination, but maybe in doing so I’m undermining the efficiency and readability of the things I present?
I therefore wanted to try to get closer to some sense of clarity about the dos and don’ts, and whether or not such clarity can even exist.
As you might predict, the ‘lazy’ use of pink has drawn much of the stinging attention (see John Oliver’s takedown in particular). Some of the defence offered by the party was along the lines that this colour is simply hard-wired in its association with the female gender.
I’ve not done tons of reading but, from some quick searching, it seems that most sources exploring how and why this association became established end up in a cycle of inconclusive nature vs. nurture arguments. I don’t think we can lean on any clarity from the past or from science either.
There are some sources that suggest that around the time of the First World War the association was actually the other way round. Boys were paired with pink and girls with blue. However, this piece seems to dispel this as most likely a myth.
Going back to the pink bus, the underlying aim of the campaign to “talk to female voters ‘around the kitchen table'” strikes me as arguably the most patronising aspect of this. As my wife astutely put it, “colour doesn’t cause the problem, people’s attitudes do”.
Let’s look at a couple of prominent examples of the pink-blue combination. The classic example would be Martin Wattenberg’s ‘Baby Name Wizard’, which shows the popularity of boys’ and girls’ names over time. It sticks with the classic association of blue for boys, pink for girls.
Last week, the New York Times published ‘The Changing Nature of Middle-Class Jobs’, showing the gender breakdown of different occupations. The print version used a cyan-magenta colour pairing, while the web version used a redder shade for the female category. When I discussed the colour choices with Gregor Aisch, one of the authors of this work, he explained a critical contextual factor that influenced the final choice:
The final argument for blue/pink is even simpler. If you want to print thin colored lines in a newspaper you have to stick to a single color tone, as otherwise the lines get blurry when printed as a mix of multiple colors (as the color plates are never 100% aligned). And in CMYK, your only real choices are cyan and magenta. In the web version I changed the magenta to something more red-ish to make the chart more readable for the color-blind.
So what are the alternative colour choices that people might consider using?
Change pink (part 1). The blue is still identifiable for boys, leaving just one new colour for the reader to learn (though essentially they don’t have to learn it, as it is the only other colour).
Change pink (part 2). A darker blue that maybe shifts away from the default ‘baby blue’. Beware the association of ‘men dominating above women’ with this particular choice of layout, as well as ‘values that go under the x-axis are negative’.
Change both colours completely. This avoids the possible negative connotation with the blue-pink meaning but means the reader has to learn new colour associations.
Switch blue and pink! Maybe this goes too far: the reader may not think to check the colour key and could misread the graphic entirely.
Where do you stand on the blue-pink issue? If you were a designer about to assign colours to the gender values in your chart, what choice would you make? Here is a simple 1-question poll that will hopefully allow us to discover a little more about the attitudes towards the use of colour with gender. (nb. This post is concerned with gender as a binary option. I know that is not reflective of society in 2015 but for the purposes of this post we are just focusing on male vs. female).
THIS POLL HAS NOW CLOSED, THANKS FOR TAKING PART, THE RESULTS ARE SHARED HERE.
A thought for the day. Stick with me on this, I’m still deciding if there’s something in it: I will only know after I’ve published it though so here goes.
I’m musing about this visualisation work that has received a lot of love and attention on blogs and social media over the past few days. Created by the excellent people in the Wall Street Journal graphics team, it portrays data about the impact of vaccines in battling infectious diseases in the 20th Century.
The chart that has had the most impact, due to the highly topical nature of its subject, is the measles chart. This was certainly the image that drew me in, and it was used to accompany the many positive tweets the project received.
My question is this: do we like the visualisation or do we like the data?
I have a few questions about the colour scheme – it is not explained what the implied threshold in the blue >> green colour scale means – but it is an attractive-looking chart and, as a heatmap, a suitable choice to display this data. And what a story we see! The shape of the data after the introduction of the vaccine wonderfully demonstrates evidence of its success.
Unquestionably, the visualisation reveals the findings from the data very clearly. However – and this is not a criticism whatsoever – other visualisation approaches would also have revealed this pattern (albeit most would only show an overall pattern rather than a state-by-state resolution). I would even dare to suggest that the numbers are so compelling that you would ‘see’ the bigger picture of reduction even in a table of raw data.
So, that question again, but framed more broadly. When we like a visualisation like this, is it because it is the best way, maybe even the only way, to show a certain pattern in data? OR are we actually so engaged because the data that sits beneath the visual has a clear causal relationship between an action and its effect over time – the holy grail of analysis?
What if there was no sudden drop, as shown in the crudely photoshopped version below? Is the visualisation – the design layer – still of merit? Of course you probably wouldn’t produce the visualisation in the first place but, having lost that wonderfully clear reduction in measles cases in the data layer, does it change our view of the design layer?
I guess it depends on your perspective and what you’re seeking to understand. Yes, I know that’s criminally boring but it is true. Whether you are pro- or anti-vax will certainly influence your standpoint on the findings, but I’m kind of interested in going beyond those subject matter biases to think about how we evaluate a visualisation’s merits.
Sometimes in workshops, in one of my early class exercises, I might find myself lavishing praise on a visualisation work based on certain design features. Yet the delegates in the room, who haven’t at that stage necessarily developed such a forensic lens, are more underwhelmed by what they are seeing. They are looking through and beyond that visual surface at the patterns of data beneath and are maybe not getting much from it. It is almost as if I’m so concerned with assessing the choice and quality of glass in a window that I ignore the view beyond.
As a visualisation practitioner, I am often so much more concerned with design choices that I rarely find myself looking beneath the design surface. For example, the discovery I treasure most from this project is the super hover action: when your cursor sits above the heatmap cells, a little marker appears on the colour scale to assist in reading the value. I’ve not seen (or noticed) that device being used before and think it is a brilliant little idea. Measles is going down? Yeah, great, but have you seen the hover marker!
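The hover marker is a neat idea and simple to reproduce. As a minimal sketch, assuming a horizontal colour legend with a linear scale, the marker’s position is just the hovered value interpolated across the legend’s width. The function and parameter names here are hypothetical, and I don’t know what scale the WSJ legend actually uses; a heavily skewed variable like case counts may well be better served by a non-linear scale.

```python
def legend_marker_x(value, domain_min, domain_max, legend_left, legend_width):
    """Pixel x-position of a marker on a horizontal, linear colour legend.

    Values outside [domain_min, domain_max] are clamped to the legend ends.
    """
    if domain_max == domain_min:
        raise ValueError("the legend domain needs a non-zero range")
    t = (value - domain_min) / (domain_max - domain_min)  # 0 at left, 1 at right
    t = min(max(t, 0.0), 1.0)  # keep the marker on the legend
    return legend_left + t * legend_width

# Hypothetical legend: values 0 to 200 drawn across 400px, starting at x=10.
print(legend_marker_x(50, 0, 200, legend_left=10, legend_width=400))
```

In a browser, the same calculation would simply run inside the heatmap cell’s hover handler, moving the marker each time the cursor enters a new cell.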
I’ve run out of muse-momentum now so I’ll put down my pipe, and do something else other than contemplating life whilst staring out of the window. Or maybe at it.
One of the increasingly frequent questions I get asked, particularly by people from a scientific or financial domain, is how to effectively visualise uncertainty of data and of statistics. My response is usually to make suggestions around annotated markings and/or colour gradients to indicate increasing or declining certainties.
I’ve been gathering bits of evidence for these suggestions and any other sample solutions that might work in different contexts. There aren’t that many but I have compiled some references, papers and examples for anyone interested. If any others emerge I will add them to this list, so if you have any suggestions, please let me know:
1. ‘Information Graphics: A Comprehensive Illustrated Reference’, by Robert L. Harris. (I love the cigarette- and pencil-shaped plots here. Harris also refers to ‘Fuzzygrams’ but I’ve yet to get a full copy of his book to reach that chapter.)
2. The Bank of England’s GDP fan chart, profiled in a paper titled ‘Visualising Uncertainty About the Future’, by David Spiegelhalter, Mike Pearson and Ian Short.
3. ‘Error Bars Considered Harmful: Exploring Alternate Encodings for Mean and Error’, by Michael Correll and Michael Gleicher.
4. ‘Visualizing Data Uncertainty: An Experiment with D3.js’, by Alex Krusz.
5. ‘A Review of Uncertainty in Data Visualization’, by Ken Brodlie, Rodolfo Allendes Osorio and Adriano Lopes.
7. Case study: Visualising uncertainty on improving-visualisation.org.
8. Some useful comments (one very detailed in particular) on Nathan Yau’s 2013 FlowingData post, ‘Visualizing uncertainty still unsolved problem’.
9. ‘From here to uncertainty’, the UK Government Statistical Service’s guide to handling and communicating uncertainty.
10. ‘Maternal mortality’: a Hans Rosling Gapminder video asking ‘How many women die every year during pregnancy and childbirth? Do we even know?’
11. ‘Communicating Uncertainty in Official Economic Statistics’, by Charles F. Manski.
12. ‘How to Assess Visual Communication of Uncertainty? A Systematic Review of Geospatial Uncertainty Visualisation User Studies’, by Christoph Kinkeldey, Alan M. MacEachren and Jochen Schiewe.
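As an illustration of the fan chart idea in item 2, here is a minimal Python sketch that turns a central forecast path into nested shaded bands. It assumes a symmetric normal distribution whose spread widens with the square root of the forecast horizon; the Bank of England’s actual method is more sophisticated and allows for skew, and the path and spread values below are made up purely for illustration.

```python
from statistics import NormalDist

def fan_bands(centre, sd, coverages=(0.3, 0.6, 0.9)):
    """Central prediction intervals for one forecast horizon.

    Assumes a symmetric normal distribution (a simplification).
    Returns (coverage, lower, upper) tuples, narrowest band first.
    """
    dist = NormalDist(mu=centre, sigma=sd)
    bands = []
    for c in coverages:
        lower = dist.inv_cdf(0.5 - c / 2)
        upper = dist.inv_cdf(0.5 + c / 2)
        bands.append((c, lower, upper))
    return bands

# Hypothetical forecast: a central path plus uncertainty that grows with horizon.
central_path = [2.0, 2.1, 2.3, 2.4]
for horizon, centre in enumerate(central_path, start=1):
    sd = 0.3 * horizon ** 0.5
    for coverage, lo, hi in fan_bands(centre, sd):
        print(f"h={horizon} {coverage:.0%}: {lo:.2f} to {hi:.2f}")
```

Each wider band contains the narrower ones, so shading them from dark (30%) to light (90%) gives the familiar fan that fades as certainty declines, much like the colour-gradient approach suggested above.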