Visualisation of the WordPress User Survey

Last month I shared details of the WordPress 2011 ‘State of the Word’ – founder Matt Mullenweg’s annual overview of the headlines, facts and figures relating to the use of the open-source web/blog tool.

I also shared news that data from the first ever WordPress user and developer survey was being made available and designers were being invited to dig around the 18,000+ records and come up with some insightful visualisation work.

I was therefore delighted to hear from reader Graham van de Ruiz, a designer from Zimbabwe, who got in touch to share a fascinating attempt to visualise this significant pot of data.

(You can also download a 7.5MB PDF version)

I invited Graham to take part in a brief interview in order to find out more about the design process he followed and the visualisation piece he arrived at.

When you saw the blog post sharing the data, and inviting designs, what was your motivation for taking on this challenge?

Data visualisation is a recent interest of mine; my background is more in editorial design. I’ve been teaching myself a few scripting languages after I started with ActionScript last year, and I was looking for a trial project that would help me test and expand these skills through some data visualisation. I learn best when I have a real project to work on, and so I was looking for some data I could play with. The large amount of data meant that I had to write proper code—there was no way I could fudge this one—and the complexity of the data (the survey didn’t follow a straight path) provided an attractive challenge. I also thought that I could get good exposure and feedback through visualisingdata.com if my project was successful.

There is a lot of data in this survey. How did you arrive at a decision to focus on the data covered by your design?

That took a while. From the first question respondents get re-routed based on their answers, and so there are only two questions that all respondents are asked: the first one about how they use WordPress and the last one about whether or not they earn a living from WordPress. I thought this last one was interesting, and I wondered how much more we could find out about those who do make a living from WordPress, and those who don’t. Are there any trends that might be revealed by a visualisation? So these questions were my starting point, and I included a few others based on how relevant I thought they were, how easily and effectively they could be included in the graphic, and how many people responded to them or would be excluded. I didn’t want to leave out a whole lot of data by looking at a question that most respondents didn’t answer, or weren’t even asked.

How did you approach the design of your visualisation? Did you take any inspiration from other designs/designers out there?

I started by just trying to get the scripting working and get something basic on the page. Once I could see it was working I adjusted the parameters for the sizes of dots and started playing with the colours. It was important for me that the colours have transparency and blend in such a way that one can see overlap and build-up, so that’s the first aspect of visual design I tackled. Coming from a print design background, it’s still most natural for me to work on a static page of standard print dimensions, and my main interest with scripting at this stage is to use it for design for print. I’m sure I’ll venture into interactive design pretty soon.

I’ve been looking at examples of data visualisation all over (this site included), so I suppose I’ve got inspiration from a lot of places, though not specifically for this piece. I wanted to keep it original, and so I didn’t consciously base it on anything I had seen.

Can you share some of your early sketches/ideas as you familiarised yourself with the data and its ‘shape’? What other approaches did you consider (but ultimately reject)?

The first thing I had to do was to work out the survey structure, since it branched multiple times. I quickly sketched it out on paper and used lines of different colour to link questions that were the same or similar but on different paths.

I found it much easier to understand when I could see it mapped out like this, and I could see where I might be able to make connections and link the responses.

Initially I wanted to create a sort of flow map that one could use to trace the path of each respondent (or rather, each group of respondents) through all the questions, with line thickness showing numbers of respondents. I did some trial pencil sketches and it looked good for two or three questions, but then suddenly became way too complex. I tried this approach in a few different shapes, just in rough pencil sketches. I realised I had to narrow the focus. Then I saw a circular scatter-graph online somewhere (I can’t remember where) and suddenly it seemed to fall into place. With a scatter graph I could use the two axes, and grouping on the axes, as well as size and colour, and maybe even shape, and trends would appear through the grouping of dots and colour build up.

What software/technical resources did you use to develop the work?

I used Microsoft Excel for looking at the data initially, just to see the form of the questions and answers. As the answers all came in as full text strings, it was difficult to see any patterns or trends. I used Notepad++ to do some find/replace and changed all the answers to A, B, C etc. In this form it was much easier to skim over and it helped enormously in working out the structure of the survey. I could see pretty quickly, for example, that those who answered C for the first question only answered one other question: the last one. Replacing the strings with single characters like this also greatly reduced the size of the file, from 9MB to 800KB. For design and scripting I used Adobe Illustrator with the free Scriptographer plugin. It’s vector drawing software, not really made for data analysis, but I like the challenge of building the visualisation from scratch like this. It means I understand every step of the process and am not working with a limited set of pre-built templates. Scriptographer (and soon Paper.js) is the scripting environment I’m most excited about learning right now, so it made sense to see what I could do with that. I certainly intend to look at other software down the line, though.
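For readers who would rather script the recoding step than use find/replace, here is a minimal Python sketch of the same idea. It is not Graham's method (he used Notepad++), and the column names and sample answers below are invented for illustration; the real survey wording differs.

```python
import csv
import io
import string

# Hypothetical sample data: the column names and answers are made up.
SAMPLE = """q1,q_last
I use it for my personal blog,No
I build sites for clients,Yes
I use it for my personal blog,Yes
"""

def recode(rows, columns):
    """Replace each distinct answer string in a column with A, B, C, ...

    Letters are assigned in order of first appearance, per column.
    Assumes no column has more than 26 distinct answers.
    """
    codebooks = {col: {} for col in columns}
    recoded = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            answer = row[col]
            book = codebooks[col]
            if answer and answer not in book:
                book[answer] = string.ascii_uppercase[len(book)]
            # Unanswered questions stay as empty strings.
            new_row[col] = book.get(answer, "")
        recoded.append(new_row)
    return recoded, codebooks

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
recoded, codebooks = recode(rows, ["q1", "q_last"])
```

The returned codebooks keep the mapping from each letter back to the full answer text, which is needed later when labelling the graphic.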

How did you decide on the visual properties of the design (e.g. the colour scheme, the dimensions, the typeface etc.)?

I wasn’t really sure what shape it would take, so I just started with A3 landscape, which I could print out easily for myself and which I hoped would be large enough to accommodate everything. I was ready to change it if I felt I needed to, but I never did. I wanted the focus to be strongly on the graph, not the labels and other information, so I started with the circles and fiddled until I got something I was happy with, and then used greys and a grey-blue for the rest of the design. The typeface, Azuro, is one I purchased recently and have been eager to test on a project like this. I think the large x-height, narrow character widths and open forms are well suited to information design.

How did you know when to stop iterating the design?

I stopped when it matched the idea I had started with. I think there are ways I could change it and other ideas I might like to try, but it seemed to me that this concept had come to its conclusion. I felt I had learned what I had hoped to and had seen my idea through. Since I did this mainly as a learning exercise, I felt I would rather start something new than spend more time trying alternatives.

What insights are you hoping people will draw from this work? Any plans to take this project further? Any follow ups?

I was really just hoping to be able to reveal anything about the data to show how effective visualising it can be when looking for patterns or trends. I focussed on those few questions because I hoped we might be able to learn something about who is earning money from WordPress and how they use it, and although what we see is hardly surprising, I find it enormously satisfying to be able to see it so easily with the visualisation. I had a few ideas for simple variations, such as using one circle in place of each block of dots, but I felt I had succeeded in my initial aims and would rather start on something new than keep iterating with the same concept.

Distorted and misleading graphics on Sky Sports

I’d just got comfortable and in place to watch the Sunday afternoon football on Sky Sports, looking forward to having a bit of mental breathing space from all things data visualisation, when up pop these two graphics…

The first graph compares results of QPR vs Aston Villa matches at Loftus Road (QPR’s home ground) and the second shows head-to-head results since the Premier League began in 1992.

Not only do we have terrible 3D bar charts employed but in the first graphic we see a significant bar attributed to a zero value, with the added mistake of the value label clashing with the ‘Aston Villa’ label.

This visual distortion is then amplified on the second with the ‘Drawn’ value of 1 visibly being more than half as long as the ‘Aston Villa’ value of 4.

The designer has clearly decided to combine the category label graphic with the value bars, which confusingly shifts the implied zero axis position to the right of the last digit of the longest label. Whether this has been done to intentionally mislead or is just a case of a bored designer of TV visuals looking to spice up his work, this approach ultimately deceives the receiver.
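The effect of shifting the zero position can be put into numbers with a small Python sketch. The pixel values here are assumptions chosen for illustration, not measurements taken from the Sky Sports graphics.

```python
def drawn_length(value, scale, label_offset=0.0):
    """Length of a bar as drawn: the label block plus the part
    that is actually proportional to the value."""
    return label_offset + scale * value

# Hypothetical numbers: 40 px per match, 120 px taken up by the label block.
scale, label_offset = 40.0, 120.0

# With a true zero baseline, a value of 1 is a quarter the length of a value of 4.
honest_ratio = drawn_length(1, scale) / drawn_length(4, scale)

# With the label block folded into the bar, the same comparison is inflated.
distorted_ratio = (drawn_length(1, scale, label_offset)
                   / drawn_length(4, scale, label_offset))

# A zero value still gets a bar as long as the label block.
zero_bar = drawn_length(0, scale, label_offset)
```

Under these assumed numbers the 1-versus-4 comparison goes from a quarter to more than a half, and the zero value is drawn as a 120 px bar, which matches the kind of distortion seen in both graphics.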

(Incidentally “Never deceiving the receiver” is one of the central tenets covered in my upcoming Introduction to Data Visualisation training…)

Contribute your City-related designs or information-scapes to new book

Friend of the site Nadia Amoroso has asked me to share details with readers of a great opportunity to have your visualisation projects exhibited in a potential new book about cities as information-scapes.

Nadia, who teaches at the University of Toronto and specialises in research around design, mapping and visual communications, is the author of The Exposed City: Mapping the Urban Invisibles (Routledge, 2010) and is now acquiring potential contributors for her next book.

She is looking for talented academics, designers and innovators who create compelling “city-related graphics or information-scapes”. The potential publisher is looking for about 3 or 4 visually appealing and revealing images from each contributor.

To get a sense of the kind of work being sought, you could refer to the work of the SENSEable City Lab ‘LIVE Singapore!’ project and also an example of the output from Nadia’s Data Appeal software.

If you are interested in this opportunity and have a related portfolio of work you can email links to your designs or thumbnail images to nadia.amoroso@dataappeal.com.

Berg and BBC Dimensions launch ‘How Many Really?’

You may recall about a year ago a new concept was launched for the BBC by design consultancy Berg in collaboration with Max Gadney. This project was called BBC Dimensions and aimed to explore new ways of using digital media to relate stories and facts from history and current affairs. Today, Berg announced details of the release of a new project called How Many Really, which allows users to “compare the numbers of people who experienced an event with a number you can relate to”.

The first project to emerge from this work was How Big Really which juxtaposed the shapes of important “places, events and things” onto a user selected map in order to appreciate context. This helped you get a better sense for the true impact of news and events that were sometimes difficult to relate to in terms of their magnitude. This project proved to be extremely popular with a high volume of visitors and a spot at the MoMA Talk To Me exhibition.

How Many Really takes a similar concept, sprinkles a whole new layer of functionality and applies visualisation and visual thinking solutions to complicated or hard-to-grasp numerical contexts rather than geographical/size comparisons.

There are a number of different situations for you to navigate through and several ways to visually explore the numerical comparisons: either by linking up to your Twitter or Facebook accounts to use the quantities of your friends/followers or by typing in your own values. The process then takes you through a series of slides that starts off with the smallest level of quantifiable detail (i.e. you) and walks you through different layers of contextual sizes, typically represented by the areas of larger squares, within the subject area chosen.

Here is one I did using my Twitter account friends to contextualise the US conscription during the Vietnam and World Wars:

Introducing ImagePlot visualisation software

In recent times we have seen a number of emerging projects which demonstrate the creative potential of the visual analysis of images and image data. Projects such as Cinemetrics, Movie Barcodes and the 365 Days of Light in Norway demonstrate what can be achieved when the characteristics of images are stretched, spliced, aggregated and transformed.

Now we have a new addition to this fascinating branch of data visualisation with the release of ImagePlot, a powerful, free software tool which allows you to visualise entire collections of images and videos of any size.

ImagePlot was developed by the Software Studies Initiative with support from a number of academic, research and funding partners. The Initiative is led by Lev Manovich, Professor of Visual Arts at the University of California, San Diego, one of the most influential and celebrated voices in the field of digital humanities.

The basis of ImagePlot is the visual arrangement of collections of images as timelines and scatterplots to facilitate the analysis of clusters, outliers, trends and patterns. The data used to form such arrangements comes from the underlying metadata (such as date created, filename) or visual properties (such as hue, shape, brightness). This data is gathered by the macros included in the tool, which automatically mine these characteristics from within a collection. You can also use ImagePlot to explore patterns in films, animations, video games, and any other moving image data.
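To give a rough feel for what mining visual properties means, here is a minimal Python sketch that reduces each image to a hue and a brightness value, which could then serve as scatterplot coordinates. This is not ImagePlot's own code or macros; the function name, the tiny made-up pixel lists and the choice to average colours before converting are all assumptions for illustration.

```python
import colorsys

def summarise(pixels):
    """Return (hue, saturation, brightness) of an image's mean colour.

    `pixels` is a list of (r, g, b) tuples with 0-255 channels.
    All three returned values lie between 0 and 1.
    """
    n = len(pixels)
    mean = [sum(p[i] for p in pixels) / n / 255.0 for i in range(3)]
    return colorsys.rgb_to_hsv(*mean)

# Hypothetical collection: each "image" is reduced to a couple of pixels.
collection = {
    "red.png": [(255, 0, 0), (255, 0, 0)],
    "grey.png": [(128, 128, 128), (128, 128, 128)],
    "dark_blue.png": [(0, 0, 64), (0, 0, 64)],
}

# One point per image: hue on the x-axis, brightness on the y-axis.
points = {}
for name, px in collection.items():
    hue, saturation, brightness = summarise(px)
    points[name] = (hue, brightness)
```

A real collection would read pixel data from image files with an imaging library, but the principle of one measured point per image is the same.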

The download is free and available now, and it is worth reading about the tool in greater detail than I have provided here. The only other tool I can think of to compare it with is the Microsoft Silverlight PivotViewer development.

You can view a Flickr gallery of the early creations developed using ImagePlot and explore a number of articles by Manovich and colleagues that address methodologies for exploring large visual cultural data sets. You should also check out these digital humanities projects which use ImagePlot.

Visualizing.org Marathon, London 2011

The Visualizing Marathon 2011 is a series of 24-hour data visualisation competitions being held in five cities around the world. The inaugural 2010 event was extremely successful and I’m excited to announce that I’ll be one of the jurors at the London Marathon on 12th and 13th November. I’m also very grateful for being invited to join David McCandless and Stefanie Posavec in giving a brief talk/workshop during the marathon.

What is the Visualizing Marathon?

It is a free event for students to enter so long as they are enrolled at one of the Visualizing.org Academic Partners. If your university is not yet an Academic Partner you can register now but make it quick as spaces will be snapped up quickly. Registration runs up to 20th October.

The contest works like this. Teams of 3 or 4 students from the same university will have 24 hours (noon to noon) to design a solution to a real world problem set at the start. Participants will be given access to a selection of open data sets to work with from Visualizing.org (and you can use any other data set you want so long as it’s open). You’ll collaborate with your team through the night (but do remember to sleep) to come up with a winning entry.

At the end of 24 hours you will upload your entry to Visualizing.org, and winners will be selected by me and the esteemed jury. Aside from being a fun event and a great experience, there are some terrific prizes on offer, courtesy of GE:

London Marathon

The London event will be held on Saturday 12th and Sunday 13th November at the Free Word Centre, 60 Farringdon Road, London EC1R 3GA.

The brief workshop I will be giving will take place early in the evening, at about the mid-point of the visualisation work. Details of what I’ll be covering are still a work in progress but will likely focus on the stages involved in taking visualisation designs from concept through to final execution.

For fans of free food and drink, you’ll be delighted to know that food and drink will be provided throughout the marathon. For fans of running 26.2 miles, you’re at the wrong event.

For any more queries contact Charlene Manuel.

If this all sounds very appealing you should really get registered ASAP and I look forward to seeing you in London on the 12th November!

Sense of Patterns: New visualisation project from Mahir Yavuz

Sense of Patterns is an impressive portfolio of work from Mahir M. Yavuz, a Creative Director and PhD candidate who appears to split his time between New York, Linz in Austria and Istanbul. This is an ongoing visualisation project that aims to “depict the behaviors of masses in different public spaces”.

The project has been developed using Python and Processing, drawing on hundreds of thousands of data points from the Austrian Institute of Technology. Mahir describes the project:

The visualizations have a focus on the patterns of moving entities in public like commuters, cars and public transportation vehicles as well as the interaction between these entities and physical structures like roads, sidewalks, buildings and parks. The project intends to provide strong visuals on what we all experience in our daily lives in different cities.

The output of the project to date is a series of six A1 size printed posters and a video animation, embedded below, based on data related to Vienna and its suburbs.

You can see the full set of posters on Mahir’s dedicated project site as well as shared in high resolution on Visualizing.org.

Best of the visualisation web… August 2011

At the end of each month I pull together a collection of links to some of the most relevant, interesting or thought-provoking articles I’ve come across during the previous month. If you follow me on Twitter – and now Google+ too – you will see many of these items tweeted as soon as I find them. Here’s the latest collection from August 2011:

All Things D | Vizualize.me Aims to Shake Up the Resume With Data Beautification

Datalicious | Australian Census data visualised with new Tableau 6.1 dark maps feature reveals a severe man drought

Datavisualization.ch | 2011 MTV #VMA Twitter Tracker

Derek Watkins | Posted: Visualizing US expansion through post offices

Design Mind | The Never-Ending Story – Artist Jonathan Harris’s new digital platform aims to help people find a signal amid social media noise.

The Statistics Forum | [Round 1...] Robert Kosara’s Infovis example illustrates the Chris Rock effect…

Eager Eyes | [Round 2...] Information Visualization vs. Statistical Graphics

Andrew Gelman | [Round 3...] Infovis, infographics, and data visualization: Where I’m coming from, and where I’d like to go

Design Process | In this post, I will discuss the similarities between infographics and data visualizations, the differences as well as why it may or may not matter to really understand the distinction.

Eager Eyes | Above All, Do No Harm!

Well-Formed Data | On the role of bacon in visualization

Fell In Love With Data | The Data Visualization Beginner’s Toolkit #2: Visualization Tools

Flowing Data | Generic terms for streams mapped

Jonathan Stray | Visualizing communities

Drawar | Laws of Simplicity – Law 5: Differences

Flowing Data | Google Map Maker edits in real-time

Pentagram | Inside The New York Times Building

Processing.js | Processing.js 1.3.0 is released

O’Reilly Radar | The nexus of data, art and science is where the interesting stuff happens

Temple of the Seven Golden Camels | They Come for the Frosting but They Remember the Cake

Sexperience Channel 4 | Welcome to The Sexperience 1000, an interactive journey through the sexual experiences and preferences of one thousand British individuals

O’Reilly Radar | Data science is a pipeline between academic disciplines

The Monkey Cage | Does blogging help your professional reputation?

422.com | Britain from Above

ABC News | Australia, a nation transformed

BBC News | 3G mobile data network crowd-sourcing survey by BBC News

The Computus Engine | Links to Temporal visualisation

Design Council | A profile of Margaret Calvert, designer of the UK’s road signing system

Digital Arts | Illustration by numbers: an in-depth guide to creating infographics

Flickr | A History of the War

ONS | Interactive Content from Office for National Statistics

Perceptual Edge | Dyslexics Could Be Our Most Talented Data Visualizers

University of Chicago Press | On this site the University of Chicago Press is pleased to present the first two volumes of the History of Cartography in PDF format

Smart Planet | The secrets to successful data visualization

Visual Complexity | Functional Beauty

Visualizing.org | Visualizing Marathon – A global series of 24-hour student data visualization competitions

Visualizing.org | Q&A with Wes Grubbs

Gizmodo | Watch The Virginia Earthquake Spread Across Twitter

YouTube | Kurt Vonnegut on the Shapes of Stories

Fast Co Design | Adobe Muse Lets You Design Websites Without Knowing Code

Infosthetics | Cinemetrics: Visualizing Movies

vis4.net/labs | Gregor Aisch’s portfolio

Imp Awards | Movie posters gallery

Fast Co Design | How 3M Gave Everyone Days Off and Created an Innovation Dynamo

Jerome Cukier | From protovis to d3

Jerome Cukier | d3: adding stuff. And, oh, understanding selections

Jerome Cukier | d3: scales, and color.

Dataist | Campaign funding times two

Wired | Thermal imaging can be used to steal PIN numbers

O’Reilly Radar | Visualizing hunger in the Horn of Africa

CNN Defining America | Explore the country by the numbers

The Guardian | Food is the ultimate security need, new map shows

New York Times | The New York Times hurricane tracker

MSNBC | The MSNBC hurricane tracker from Stamen

Max Planck Institutes and their connections visualised

The latest fabulous project from the Moritz Stefaner visualisation factory came off the production line yesterday. In collaboration with Christopher Warnow, and other partners, the Max Planck Research Networks multi-touch installation represents the nature of connectivity across Max Planck Institutes and with international partners. It is now on display at the Max Planck Science Gallery in Berlin.

As described on the project’s site, the data came from the analysis of over 10 years and 94,000 publications from SciVerse Scopus. The visualisation brings to life a map of the Max Planck Institutes and how they are connected through scientific publications within and between each institute.

The extra layer of interactivity brought about by the multi-touch environment allows users to pan around the maps and select specific institutes, highlighting their most important collaboration partners both within the network and internationally. The interactivity and transition between views look so smooth, and I also love the subtle circular ‘fingerprint’ created by the multi-touch contact. A further wonderful feature is the use of “streams of energy particles” to conceptually represent the flow of ideas being exchanged between institutions.

You can read more about the project on the dedicated Max Planck Research Networks site, on Moritz’s blog and also on Christopher’s portfolio pages.

Data visualisation training workshops schedule 2011

** The content of this post has been published on a dedicated ‘Training’ page and contains the most up-to-date information. **