Visualisation marathon winner shares his experiences

You may recall that back in November 2011 Visualizing.org ran a number of visualisation marathon events in which students competed in 24-hour competitions. Having attended and spoken at the London event, I felt qualified to share my thoughts on it shortly afterwards.

In February, Enrico published a post on his great Fell In Love With Data blog expressing his general dissatisfaction with the outcome of the contest overall. I recommend you read this post, its comments and the discussions it triggered.

To get a different perspective on this contest, I invited Kyle Foreman, a PhD student at Imperial College and part of the winning team at the London marathon, to share his own reflections on the event.

Background

I’ve come to visualization in a very roundabout way. I was a psychology and neuroscience major as an undergraduate, then I did a master’s degree and fellowship revolving around global health statistics. My job during the fellowship was to write statistical models for predicting global mortality due to specific diseases and conditions (e.g. maternal mortality and malaria). As part of that work, I dealt with a large, multi-dimensional dataset (40M+ observations, stratified by country, year, age, sex, disease, type of study). During model development, every time I made a change I would generate thousands and thousands of graphs to try to evaluate the model performance along all the different dimensions. And if in the process we identified a datapoint that was clearly wrong, we then had to go back to our 40M observations and figure out where the problem had come from.

I quickly tired of making gigantic PDF files every day, so I looked into web-based visualizations and settled on Protovis. I had never done web design before, but there were plenty of tutorials and resources online to get started. I eventually built an ugly but functional application for viewing any desired slice of the dataset, mousing over to see all the related metadata for a datapoint, and displaying various model fits over the data.

It improved the workflow so much that creating similar visualization/modeling tools became a major part of my work. And although I started making visualizations out of necessity, I came to enjoy the process of making useful and attractive displays of my results as much as, or perhaps more than, working on the underlying statistics. As such, I’ve been heavily integrating visualization dashboards into my Biostatistics PhD dissertation, and I’ve been entering various visualization contests when the underlying topic strikes my interest.

As for the other members of the team, we had never actually met before the contest. Peter and Cristina are both undergrads in Imperial’s computer science department. Peter has a lot of experience with various projects and consulting jobs he’s done, and Cristina has interned at Facebook. Fan got added to our team the morning of the competition because the other participants from his school had failed to show up. I think his major is Information Design, but he had never done any data visualization before.

The visualization itself

Since we hadn’t met until the day of the event, we hadn’t done any preparation at all. But honestly, I’m not sure how much it would’ve helped – without knowing what the challenge is, it would’ve been hard to lay useful groundwork beforehand.

The first thing we did was spend an hour or so reading through the questionnaire the dataset was derived from, brainstorming ideas, and making sketches; at this point we were mostly doing this independently, aside from asking each other clarifying questions and such. The goal was just to familiarize ourselves with all the data in front of us. Then we spent some time as a group discussing what we found (potentially) interesting in the data – what would make a good story, what could be visualized in a cool way, what was hidden in the gigantic dataset, etc. We batted around a lot of ideas, settling on showing how different demographic groups had different feelings towards the Olympics.

We decided to cluster the dozens of questions into five different categories for a simple reason – so that we could use the Olympic rings motif in the final product. Honestly, the data probably lent itself more easily to 4 categories, but we stretched it to 5 because sometimes the design opportunities are just too hard to pass up. Once we had decided on that, the rest of it just sort of fell into place – a donut chart for each category, colored like the Olympic logo, with a menu for showing different demographic groups.

At this point, we sort of split up – Fan began sorting the survey questions into the categories and weighting them, Peter wrote some software to derive all the necessary statistics from the dataset, Cristina worked on the chord diagram which shows relationships between different categories, and I began work on the donut charts. Once Peter and Fan had prepared the data, they began helping with coding the visualization itself.

Cristina and I worked with made-up data to start, so that while Peter and Fan compiled the actual statistics we could still be productive. I think our ability to work in parallel like this is the only thing that enabled us to get everything accomplished by the next morning.

The thing that really makes everything work – the animated transitions between the donut charts and chord diagram – almost didn’t come together. We had built the two pieces in parallel, so reconciling them at the end wasn’t easy. I don’t think we got everything glued together until about 10am. That left us with two hours, during which we decided to hack together a quick walkthrough/tutorial, which I think helped clarify to the user what they were seeing.

The event

I think this was the first total all-nighter I had pulled since my exams as an undergrad in 2008 – even then I would normally sneak in at least an hour or two of sleep. While it was good to see I can still work for 24 hours straight if need be, it’s not something I intend to repeat until the next Visualization Marathon. One of our teammates was about ready to fall asleep at the keyboard by about 7am and went home.

The coolest thing about the event was that the groups all had such different backgrounds. There were design students who had very little statistical background but could make the data look absolutely beautiful. Our team was more towards the technical/statistical side of things, so we definitely drew inspiration from the aesthetics of the projects around us and of those displayed during the talks.

Even within my own team, I was exposed to useful new ideas. For instance, I’m a statistician and all my CS/web design skills I’ve just sort of picked up along the way, so I had never seen how a real web developer structures a project. Working with Peter taught me a lot about best practices for web development, which I’ve put to great use in subsequent projects.

My advice for future teams is to come up with a workplan that allows everyone to contribute, ideally in parallel. There’s a lot to be done and only 24 hours to do it in, so organizing yourselves such that everyone can be accomplishing something at once is crucial to success.

Re: fellinlovewithdata.com’s comments

I totally agree that a visualization should be useful in addition to attractive. That’s partly a result of my background (which I rambled on about a lot up there^) – I came to visualization out of necessity as a statistician, so I would rather have a visualization that is a bit ugly but clearly communicates the data than a beautiful design that I can’t make sense of.

I think a 24-hour marathon is a useful adjunct to Visualizing.org’s (and other sites’) other longer-term contests. Yes, you can of course do a better job visualizing something over a month than in a day. But the marathon format accomplishes other things that longer contests aren’t as well suited to:

  • Getting new people interested in visualization – there were lots of people there who had never done data visualization before. I think many of them were attracted because it was a relatively small commitment (just one day) that had some big names behind it and looked like it would be fun. Is that the best way to learn? Not necessarily. But I think it at least got some new people involved, who can then go on to learn more and get better at it.
  • Bringing the visualization community together – there are plenty of great examples of online collaboration on visualization (such as Visualizing.org’s recent Global Water Experiment sprint), but I think that in-person collaboration does an even better job of providing opportunities for serendipity and passion.
  • It’s fun!