In advance of President Obama’s State of the Union address this evening, the BBC web graphics team has published some analysis of the usage of the 10 most prevalent words and phrases in every address going back to 1790.
The selection is specifically based on nouns, adjectives and a pronoun, with the most popular words being We, Government, Congress, United States, People, Country, Public, War, American and World. These words were arrived at by applying the following criteria:
The most commonly used words are “the” and “of”, followed by “to”, “and”, “in”, “a”, “that” and so on. The word “we” is the 19th most commonly used word, and “government” the 30th. We have omitted the word “states” (32nd most common, used 6,560 times) because it is mostly paired with “united” (40th most common, used 4,900 times). Other nouns and adjectives omitted include: “year”, “years”, “great”, “time” and “present”.
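The filtering step the BBC describes — count every word, then skip function words and over-common pairings — can be sketched in a few lines of Python. This is purely illustrative: the stopword list and sample text are my own stand-ins, not the BBC's actual data or code.

```python
from collections import Counter
import re

def top_words(text, stopwords=("the", "of", "to", "and", "in", "a", "that")):
    """Count word frequencies in a text, skipping a supplied stopword list."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(10)

sample = "We the people of the United States, in order to form a more perfect union"
print(top_words(sample))
```

The BBC's extra rule of folding “states” into “united states” would be a further pass over the counts; the principle is the same.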
The result is a very interesting small-multiples display of line graphs showing the pattern of each word’s usage. Accompanying each graph is a sample presidential quote as well as a brief snippet of narrative provided by Professor Iwan Morgan of the Institute for the Study of the Americas, University of London.
You can see more of the BBC interactive graphics gallery here.
Qualitative visualisations of the State of the Union message have been increasingly deployed as technology has provided powerful new tools and methods to present such analysis.
One of the most prominent examples was, unsurprisingly, in the New York Times, providing interactive comparisons between the 2007 address and those of some previous years.
Another key designer well known for his qualitative work is Jeff Clark of Neoformix. Jeff has produced several variations of analysis on State of the Union text:
Intriguingly, the White House’s website gives notice that web followers of tonight’s address will receive enhanced coverage of the speech, with charts, graphs and additional content available. It will be interesting to see what this turns out to be…
Through an article on Fastcodesign, I have come across BibliOdyssey, a wonderful blog dedicated to unearthing and sharing some of the finest and most unique illustrations to emerge from the 19th Century/Victorian period.
Given this was the age of industrial revolution and scientific curiosity, a significant number of these illustrations are vintage examples of visualisations and infographics.
‘Timetable indicating the difference in time between the principal cities of the World’
Whilst we have seen several examples in Edward Tufte’s books, the BibliOdyssey site’s curator (enigmatically known as PK) has compiled an amazing array of maps, timetables, charts, graphics, and tables from this golden era of discovery.
‘Topographical Atlas of the City of New York Including the Annexed Territory’
The beauty of these pieces lies in their incredible technical execution (think of the creative tools we are able to call upon now), the care and attention to detail, the wonderfully subtle and elegant use of colour and the accompanying informative illustrations.
‘Lengths of the Principal Rivers in the World. Heights of the Principal Mountains in the World.’
Edinburgh based developer Steven Kay has created a fascinating calendar wheel visualisation of the seasonal colours in Norway.
The colours are derived from individual frames taken from a time lapse movie by Eirik Solheim of a view of his garden over a single year.
Each frame is compressed into a single column image stretched from the inside of the circle (which is the colour of the ground) through to the outside (which is the sky). The start of each month corresponds with the approximate position of the first letter of the month’s name.
Steven used a custom Processing script to create the individual image strips and combine them into a cohesive display. The effect is a wonderful representation of the variety of colours through the seasons: the browns of Autumn, the emerging winter from November through to April and the vivid greens of Spring and Summer.
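Steven's Processing script isn't published in the post, but the core frame-to-strip step it describes can be sketched in plain Python. The frame below is a hypothetical two-row grid of RGB tuples standing in for a decoded video frame; each row is averaged into a single pixel, preserving the ground-to-sky gradient down the column.

```python
def frame_to_strip(frame):
    """Collapse one frame (rows of RGB tuples) into a single column by
    averaging each row's colours, preserving the ground-to-sky gradient."""
    strip = []
    for row in frame:
        n = len(row)
        r = sum(p[0] for p in row) // n
        g = sum(p[1] for p in row) // n
        b = sum(p[2] for p in row) // n
        strip.append((r, g, b))
    return strip

# A hypothetical 3x2 frame: one sky row above one ground row
frame = [[(120, 170, 220)] * 3,   # sky
         [(90, 70, 40)] * 3]      # ground
print(frame_to_strip(frame))  # → [(120, 170, 220), (90, 70, 40)]
```

Laying one such column per frame around a circle, inside-out, gives the calendar wheel effect.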
You can read more about this visualisation and see others on Steven’s Flickr stream.
(Thanks to Tiago)
In a week that has seen a great deal of debate and coverage of the GOOD choropleth map, another attempt to innovate a map-based visualisation has landed on my desk in the shape of the FedEx ‘Our Changing World’ visualisation.
The concept of this dynamic visualisation is similar to the idea of last year’s Visualisation of Twitter Happiness with the dimensions of the countries of the world smoothly bulging and shrinking according to the encoded value being presented. The image above shows the proportion of TV imports across the world but there are a number of other topics to select such as business growth, world populations and education.
Similar to my conclusions about the Twitter Happiness method, this display is very ineffective at communicating data, with the countries’ shapes and sizes being constantly distorted and difficult to draw insight from. It requires exceptional perception of the original sizes of countries and their regional proportions to appreciate and interpret the respective magnitude of their revised areas and therefore the underlying values.
Colour is used in the display to differentiate between countries rather than to communicate any data. A better alternative to modifying the sizes of the countries would have been to encode values using colour and to accompany the display with an inset table listing the top 10 values to enhance the potential insights.
The site also provides an opportunity to engage in a 3D experience based on augmented reality – I have not yet tried this, it requires a printed prop and a webcam but I would be interested to see screen shots from anybody who has!
Not strictly a piece of data visualisation, but I can’t avoid sharing this wonderful demonstration of how visuals combined with slick interactivity can provide an extremely compelling and impactful story.
High-resolution aerial photos taken over Brisbane last week have revealed the scale of devastation across dozens of suburbs and tens of thousands of homes and businesses.
You can compare the before and after shots by dragging the slider left and right to reveal the devastation.
A highly effective and revealing method of communicating this story, you can’t help but go through and study each one in detail.
On Twitter over the weekend, a number of visualisation grandees (Moritz Stefaner, Andrew Vande Moere, Robert Kosara, Enrico Bertini and Noah Iliinsky) have been discussing and debating the rights and wrongs of a particularly unusual choropleth map published in Good magazine.
Zoomable view here
Typically the infographics presented in Good magazine are inglorious pieces of graphic design being passed off as informative visualisation yet demonstrating very few of the principles that guide this subject area.
On this occasion, however, they have published a design (in collaboration with Gregory Hubacek) which demonstrates an innovative approach to representing three variables of data overlayed onto a geographical landscape. Whether this is the most effective method we’ll reserve judgment for now.
The data in question relates to the US Census American Community Survey and presents data for all US counties for high school graduates (%), college graduates (%) and median household income ($).
To present this data the designer has assigned a colour scheme to each variable (magenta for high school, yellow for college graduate and cyan for income) and then encoded the values for these variables on separate maps to show variation in the saturation of each colour.
To create the final design, he has then overlayed all three colour schemes onto a single map to represent the combined levels of high school graduates, college graduates and median income via a single colour which is a product of the original three. Imagine mixing different levels of cyan, magenta and yellow paint on a palette. The legend below describes this in more graphic detail:
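As a rough sketch of the subtractive mixing at work (my own illustration, not Hubacek's actual process): treat each variable as an ink saturation between 0 and 1, where cyan absorbs red, magenta absorbs green and yellow absorbs blue.

```python
def county_colour(high_school, college, income):
    """Mix three 0-1 saturations subtractively into one RGB colour:
    magenta encodes high-school graduates (absorbs green),
    yellow encodes college graduates (absorbs blue),
    cyan encodes median income (absorbs red)."""
    r = round(255 * (1 - income))       # cyan removes red
    g = round(255 * (1 - high_school))  # magenta removes green
    b = round(255 * (1 - college))      # yellow removes blue
    return (r, g, b)

# A county high on all three reads near-black; low on all three near-white
print(county_colour(1.0, 1.0, 1.0))  # → (0, 0, 0)
print(county_colour(0.0, 0.0, 0.0))  # → (255, 255, 255)
```

Reading the map then means inverting this mix by eye, judging which channel is missing from each county's colour, which is precisely the perceptual burden discussed below.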
Initially, the result of this is a fairly unintuitive and difficult-to-read graphic. Each county’s colour needs translating backwards using the guide on the left to understand how it should be interpreted. It is, however, an unquestionably interesting approach to tackling the challenge of presenting three variables of data on a map. Furthermore, the difficulty in reading the colours does not imply that the design approach deceives the viewer. That is certainly not a criticism you could level at it.
The comments that have emerged about the design have raised concerns about the ease of perception of the colours and the extent of ‘learning’ required before it becomes efficiently readable, and have considered the idea of reducing the variables from three to two by merging high-school and college graduates into a combined ‘education’ variable.
Further interesting narrative on this piece is available via Fastcodesign, which talks about a method of interpretation where you consider the colour element that is missing rather than the colour elements present in order to arrive at a conclusion. The problem with this approach is that it only really works when you have particularly vivid and obvious colour combinations (such as orange meaning there is little blue, purple meaning there is little yellow etc.).
I really enjoy coming across new methods of visual display, especially when it is done in a considered manner like this rather than one purely designed to satisfy aesthetic appetites. The idea of encoding three sets of data using a RGB-mix is very novel. Unfortunately, I think the result is just too difficult to make sense of. Whilst our visual perception is excellent at detecting changes in a single colour, we simply aren’t built to easily detect this across three colour changes.
For what it’s worth, I think the suggestion to reduce the variables from three to two is an excellent one, bringing a better balance to the dataset and lowering the complexity. I then believe that taking the display away from a geographical platform and towards a scatter plot would be useful, perhaps colour coding specific regions of the US to facilitate geographical conclusions. That display would present an effective visualisation response to the question/hypothesis being posed: ‘are the richest Americans also the best educated?’
Back in May 2010 the World Bank announced that it was launching a portal to open up access to its vast datasets and creating a platform for developers to make this data more accessible and facilitate greater insights.
As part of the aim to bring greater access to this data, the Bank has been running an ‘Apps for Development’ competition for “the public to create innovative software applications that move us a step closer toward solving some of the world’s most pressing problems“.
The submission period is now over and voting begins in a couple of weeks time. There are many wonderful entries but I wanted to briefly share a couple of specific developments that have been brought to my attention by Visualising Data readers.
David Schönstein has developed a fascinating interactive visualisation titled “Better World Flux” which presents some of the most important indicators from the World Bank’s open data. The visualisation dynamically presents a fluid, almost organic looking shape which depicts the changing peaks and divergent patterns over time for selected countries against the specific indicators.
Jan Willem Tulp and Joshua de Haseth have used Protovis to create an impressive tool titled ‘Attention areas for the Millennium Development Goals’ (MDG) which presents an interactive, single view interpretation of which countries require the most or least attention across the 43 MDG Indicators.
You can view the fantastic array of creativity, design and technical wizardry via the gallery of application submissions. Good luck to all entrants.
On Flowing Data, Nathan has just posted one of his ‘Visualize This‘ design challenges to apply an improved visualisation treatment to an existing graph.
The graph in question shows the results of poll data from the Pew Research Center, suggesting that the Internet has gained on Television as the public’s primary news source in 2010.
As Nathan observes, this isn’t the worst graph in the world but it is very busy (caused by the excessive labelling).
I’ve quickly crafted up a simple alternative which separates out the data series into small multiples of area charts thus enabling straightforward comparison across each data set. I’ve maintained the same colour scheme but this could be exchanged for a single, consistent colour for all charts so that no single hue dominates disproportionately. The only labels shown are those for the latest year of data.
(click on image for larger view)
Depending on the layout constraints associated with publishing the original graph, the current label sizes and overall dimension may not be ideal but, anyway, this is just a suggested alternative approach.
Incidentally, I think the headline insight from the results is twofold – the fact that the Internet is eating into Television’s share as a news source (the other two are fairly flat) and that the Internet continues to pull away from Newspaper.
This is a follow-up post to my seventh article in the Visualisation Insights series which I published earlier this week. The purpose of this companion series is to optimise the learning opportunities from each insights article, reflecting on the ideas, issues and observations to emerge.
Why did I choose this subject?
As I explained in the main article I first came across Brian when I discovered an article in the Health Services Journal (subscription required) entitled ‘Demystifying data’ in which he was quoted about the importance of the visual display of data. His opening statement that “3d charts are the first refuge of scoundrels” was music to my ears so I soon recognised somebody I would be interested in interviewing!
Aside from this appealing viewpoint I wanted to discover more about the information management challenges Brian and his organisation faced as custodians of the analysis and communication of such important information. As I have suggested, this responsibility makes him one of the most important and prominent information professionals in the UK.
I was interested in learning more about the complexities of making NHS information accessible to the general public, as well as other stakeholder groups, finding out what techniques, methods and technologies they employ to accomplish this most effectively.
Finally, with only limited knowledge about the background and purpose of the Information Centre, I wanted to learn how it came to fruition, what impact it was having and what its perceived future trajectory as a public body was.
Impressions prior to the interview?
As somebody who has occupied a variety of analytical roles in UK public service organisations, I am hugely sympathetic to the great challenges that exist with the responsibility of recording, handling, analysing and communicating data relating to such activity.
The unique and vast nature of NHS activity makes data and information management more complex than almost any other context, but it is also more crucial to get it right.
My initial impressions from reading a limited amount of material about Brian and from exploring the Information Centre website were very positive. I was encouraged by the attitudes and principles Brian seemed to be championing and by the transparency, rigour and accessibility of the Information Centre’s online statistical provision and functionality.
Impressions after the interview?
My first observation relates to Brian’s strong statistical background which contrasts with some of the other Insights interviews I’ve conducted where we’ve seen marketing professionals, journalists, programmers and designers, to name but a few. Statistics would probably be considered the most traditional of pathways followed by many of the most prominent names in the visualisation field, such as Tufte and Tukey.
The status of the Information Centre is particularly interesting, as it has been established for a fair number of years now and appears to be maturing into a second phase of purpose, evolving from the provision of statutory information and official government statistics, and moving towards greater emphasis on what is useful for the NHS and patients:
A large proportion of the Information Centre’s workload involves producing official statistics for central government to help develop and account for health policy. However, as Brian points out, since its inception, the organisation is moving more and more towards a focus on what is most useful for the NHS and Local Government in order to help improve their services. This is a hugely positive trend.
Increased centralisation and, therefore, improved efficiencies and thoroughness of processes will be a boost to the organisation, potentially creating new capacity to engage in more exploratory information that will help achieve this focus towards the NHS and public. In particular, there is a great effort to improve information on quality-of-service measures, which are very difficult indicators to construct.
It is really refreshing and encouraging to see Brian’s appreciation of the true value of information and its potential in helping to enhance the quality of care the NHS provides. He recognises its role in the organisational system and the importance of keeping it as a by-product of care activity, not making it the activity itself:
“Information is not a free good so there is an increasing need to make best use of it, making it work in combination rather in isolation and joining up a single cohesive story… The information tail should not wag the service dog.”
The challenges of communicating information about health care encapsulate the motives behind effective visualisation – striving to create clarity and perceptual access to a subject that is complex. It is not about diluting or dumbing-down the complexity of a subject, but making it more accessible through simplicity and elegance of communication.
I am particularly fascinated to see how the Information Centre embraces the potential gold mine that is qualitative data emerging in unfathomable quantities from social media sources such as Facebook and in particular Twitter. This could be a huge area of exploration for his team to see how sentiment analysis can be used to identify levels and variance of perception of NHS care.
My final reflection concerns the overriding message I picked up in our conversation that the Information Centre is constantly looking to improve its offerings, enhance its capabilities and optimise its value and influence:
“If a lot more of the information around the NHS was more readily available and was being used and understood, especially by the public. That would be a success. We will have also developed better measures for the quality of care. But overall, we will have succeeded if the Health system was beginning to use the information effectively to improve its services. Data is for information which is for improving public services.”
Many thanks to Brian for agreeing to meet up with me and take part in this interview – it was a pleasure to spend time in his company chatting about visualisation matters. Also for his patience in waiting ages for me to actually transcribe the interview and convert it into a (hopefully) accurate portrayal of our discussion. I wish him and his colleagues at the Information Centre all the best for the future.
Look out for future insights articles, with many interesting interviews and interviewees lined up…
Just wrapping up my promotion of the Strata ‘Making Data Work’ Conference taking place on 1-3 February.
Last week I launched a quickfire Twitter raffle for one lucky person to win a free full pass to the Strata conference. Thanks to all who entered by re-tweeting or mentioning the post.
Entry closed at 12:00 UK time and after a colleague randomly drew a name out of a hat (it was actually a cup) I have pleasure in announcing that Jérôme Cukier (jcukier) is the winner and recipient of the free pass discount code – I wish him a very enjoyable conference.
Commiserations to those who missed out – time is starting to run out before the event kicks off but you can still benefit from a 25% discount off the registration fee by using code str11vsd or by clicking here.
Unfortunately, I can’t attend the event, so for those who are lucky enough to be going I thought I’d share my thoughts on the best plan of action for the schedule, with recommendations for some of the great visualisation-related talks on offer (for when you have to choose between concurrent sessions). Enjoy!
Tuesday 1st February
Make People Fall in Love with Your Data: A Practical Tutorial for Data Visualization and UI Design Interfaces – Ken Hilburn (Juice Analytics), Zach Gemignani (Juice Analytics)
“Water, water everywhere, nor any drop to drink.” – Rime of the Ancient Mariner. People feel overwhelmed with data. But the problem is not with the amount of data. The problem is that data is not presented in a form that people can understand and use. Juice Analytics will present and demonstrate proven techniques to design information applications to present data in enjoyable and rewarding ways.
or… (maybe sneak out halfway?)
Data Bootcamp – Joseph Adler (LinkedIn), Hilary Mason (bit.ly), Drew Conway (New York University), Jake Hofman (Yahoo!)
This tutorial offers a basic introduction to practicing data science. We’ll walk through several typical projects that range from conceptualization to acquiring data, to analyzing and visualizing it, to drawing conclusions.
Communicating Data Clearly – Naomi Robbins (NBR)
This tutorial describes how to draw clear, concise, accurate graphs that are easier to understand than many of the graphs one sees today. The tutorial emphasizes how to avoid common mistakes that produce confusing or even misleading graphs. Graphs for one, two, three, and many variables are covered as well as general principles for creating effective graphs.
Wednesday 2nd February
Telling Great Data Stories Online – Jock Mackinlay (Tableau Software)
Interactive visualizations have become the new media for telling stories online. This session will focus on going from a good visualization to a great visualization by focusing on organization, user interface, and formatting. You should expect to leave this session confident in your ability to consistently create excellent interactive visuals.
MAD Skills: A Magnetic, Agile and Deep Approach to Scalable Analytics – Brian Dolan (Discovix), Joe Hellerstein (UC Berkeley)
A discussion of Big Data approaches to analysis problems in marketing, forecasting, academia and enterprise computing. We focus on practices to enhance collaboration and employ rich statistical methods: a Magnetic, Agile and Deep (MAD) approach to analytics. While the approach is language-agnostic, we show that sophisticated statistics can be easily scaled in traditional environments like SQL.
Small is the New Big: Lessons in Visual Economy – Kim Rees (Periscopic)
While the majority of charts were designed to handle a variety of data, there is a certain novelty of presenting data in a very succinct way. By designing a presentation method restricted to specific data points, we can realize an economy of space and interface.
Big Data, Lean Startup: Data Science on a Shoestring – Philip Kromer (Infochimps)
How do you build a crack team of data scientists on a shoestring budget? In this 40-minute presentation from the co-founder of Infochimps, Flip Kromer will draw from his experiences as a teacher and his vast programming and data experience to share lessons learned in building a team of smart, enthusiastic hires.
Visualizing Shared, Distributed Data – Roman Stanek (GoodData) Moderated by: Alistair Croll
“Many hands make light work”, as the saying goes. That’s true when thousands of people can collaborate on a data set. In this session, we’ll look at collective interfaces that allow many distributed users to examine and share data with one another, and how that’s changing traditional desktop visualization tools.
or… (another sneak out halfway through?)
New Developments in Large Data Techniques – Joseph Turian (MetaOptimize)
Certain recent academic developments in large data have immediate and sweeping applications in industry. They offer forward-thinking businesses the opportunity to achieve technical competitive advantages. However, these little-known techniques have not been discussed outside academia–until now. What if you knew about important new large data techniques that your competition doesn’t yet know about?
Google Cloud for Data Crunchers – Patrick Chanezon (Google), Ryan Boyd (Google), Stefano Mazzocchi (Google, Inc.)
Many of the tools Google created to store, query, analyze, visualize data are exposed to external developers. This talk will give you an overview of Google services for Data Crunchers: Google Storage for developers, BigQuery, Machine Learning API, App Engine, Visualization API.
Unleashing Twitter Data for Fun and Insight – Matthew Russell (Digital Reasoning Systems)
This talk demonstrates how an eclectic blend of storage, analysis, and visualization techniques can be used to gain a lot of serious insight from Twitter data, but also to answer fun questions such as “What do Justin Bieber and the Tea Party have (and not have) in common?”.
Avro Data – Doug Cutting (Cloudera)
Apache Avro provides an expressive, efficient standard for representing large data sets. Avro data is programming-language neutral and MapReduce-friendly. Hopefully it can replace gzipped CSV-like formats as a dominant format for data.
Thursday 3rd February
Data Journalism: Applied Interfaces – Marshall Kirkpatrick (ReadWriteWeb), Simon Rogers (Guardian), Jer Thorp (The New York Times) Moderated by: Marshall Kirkpatrick
After Kennedy, you couldn’t win an election without TV. After Obama, it was social media. But tomorrow’s citizen gets their information from visualizations. In this panel, three acclaimed designers show how they apply visualization to big data, making complex, controversial topics easy to understand and explore.
Realtime Analytics at Twitter – Kevin Weil (Twitter, Inc.)
Most analytics systems rely on large offline computations, which means results come in hours or days behind. Twitter is all about realtime, but with over 160 million users producing over 90 million tweets per day, we need realtime analytics that scale horizontally. This talk discusses the development of that infrastructure, as well as the products we are beginning to build on top of it.
AnySurface: Bringing Agent-based Simulation and Data Visualization to All Surfaces – Stephen Guerin (Santa Fe Complex)
Live demonstration of ambient computing using projector-camera pairs to scan the room and place interactive simulations into the space. All surfaces are rendered interactive. We will demonstrate a 3D sandtable for firefighter training and STEM education where the 3D sand becomes an interactive surface.
Beyond visualization: Productivity, Complexity and Information Overload – Creve Maples, Ph.D. (Event Horizon)
We will discuss the impact of the information explosion, the effectiveness of current technological directions, and explore the success that new perception-based, human-computer interfaces provide in analyzing and understanding complex data. Real examples will be used to illustrate that effective man-machine environments are essential in productively dealing with multi-dimensional information.
Data as Art – J.J. Toothman (NASA Ames Research Center)
Artistic visualizations and infographics tell the stories of rich data in unique, compelling ways and synthesize datasets in ways that allow them to be interpreted, absorbed, and experienced in ways beyond the spreadsheet, pie chart, and bar graph.
Predicting the Future: Anticipating the World with Data – Christopher Ahlberg (Recorded Future), Robert McGrew (Palantir Technologies) Moderated by: Alistair Croll
Data doesn’t just show us the past—it can help predict the future. Several new firms harvest massive amounts of open data, trying to anticipate everything from the right ad placement to the next terrorist attack. In this session, we bring together the founders of these firms to discuss the technology—and ethics—of looking into the future.