This visualisation project explores the evolution of male and female medal winning performances across all Olympic Games since 1896. It portrays the patterns of improvements in the results of time-based events where speed is the measure of success, whether it be on foot (track athletics), in water (swimming) or on water (rowing and canoeing). By choosing a sport and selecting an event you can see how Gold, Silver and Bronze winning times have changed over the years, for both men and women, as they strive for that ultimate pursuit of being faster than the rest.
In this follow-up post I wanted to explain more about the design process and decisions behind this project.
Purpose and Parameters
The background to the project is framed by the contest and the opportunity to collaborate with a good friend of mine, Andrew Witherley, who is a skilled programmer and somebody with whom I have worked for a number of years in the past. We've been looking for an opportunity to work on a piece together in a non-client environment and so this was ideal.
The contest's brief was very open - “We challenge you to use data and design to visualize the Olympics, helping us understand and enjoy as we watch” - and so this gave us a blank canvas to decide on the data story we wished to focus on.
There were a number of ideas flying around but one of the most fascinating (and achievable) directions surrounded the curiosity of how much faster people are these days compared to athletes down the years. How have enhancements in coaching, technology, sports science, training and physique influenced the improvement in performances?
And so the initial purpose of this project was to create an elegant, exploratory interface that allows users to learn about how medal results for different speed-based Olympic sporting events have improved across every summer games in which they have been held.
In terms of the influencing parameters surrounding the project, the main factor was the timescale for the contest. Whilst it was open for about 5 weeks, we really only had the chance to work on things for the latter two weeks and even then it was rather irregular. The intended target audience would really be anyone with an interest in the Olympics but specifically those with a more analytical slant looking to learn something new rather than just digesting eye-candy - after all the Games would be (and have been) flooded with so many visualisation and infographic pieces.
Acquisition – With a clear idea of what we required, the best source of athlete, result and event based data was found on the excellent databaseOlympics, which has a huge repository of data for all Olympic Games from 1896 to 2008. To extract the data we imported it from the source URL using Google Spreadsheet and the formula '=ImportHtml("http://www.databaseolympics.com/games/gamessport.htm?g=47&sp=ATH"; "table";0)', repeating for each year and sport.
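As a rough illustration of the "repeating for each year and sport" step, the formula can be generated rather than typed by hand. This is only a sketch: the function name is made up for the example, and the only identifiers known from the formula above are g=47 and sp=ATH; any other game or sport codes would need to be looked up on the site itself.

```python
# Base address taken from the formula quoted in the text.
BASE_URL = "http://www.databaseolympics.com/games/gamessport.htm"


def import_formula(game_id: int, sport_code: str) -> str:
    """Build the Google Spreadsheet ImportHtml formula for one games/sport page.

    game_id and sport_code map to the g and sp query parameters; only
    g=47 and sp=ATH are known from the original formula.
    """
    url = f"{BASE_URL}?g={game_id}&sp={sport_code}"
    return f'=ImportHtml("{url}"; "table";0)'


# Reproduces the formula used in the text.
print(import_formula(47, "ATH"))
```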
We extracted data for all sports that involved time-based measurements for the results, rather than those that are based on points, distance measurements or tournament style contests. This narrowed down the sports to: Track Athletics, Canoeing, Cycling, Rowing, Swimming and Triathlon.
Examination – An inspection of the data revealed a few issues that we would have to deal with early. The different sports and the events within each sport have changed over the years so there were some inconsistent datasets. We decided to eliminate any event that wasn't held in 2008 and those that have been held on fewer than three occasions (as there wouldn't be much comparable data to work with).
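The elimination rule can be expressed in a few lines. The sketch below is not how we did it (this was handled in the spreadsheet tools); the record layout and field names are assumptions for the example, with "event" taken to be a label that already identifies the sport, gender and event.

```python
from collections import defaultdict


def comparable_events(records, latest_year=2008, min_games=3):
    """Return the events held in latest_year and on at least min_games occasions.

    records is an iterable of dicts with (hypothetical) "event" and "year" keys.
    """
    years_by_event = defaultdict(set)
    for rec in records:
        years_by_event[rec["event"]].add(rec["year"])

    return {
        event
        for event, years in years_by_event.items()
        if latest_year in years and len(years) >= min_games
    }
```

Using a set of years per event means the three medal records from one Games only count as a single occasion.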
Also, there were pockets of records for which no results were recorded or the results were estimated, and a few that seemed questionable in accuracy (perhaps input errors of wrong times or years). Another issue was the challenge of handling time data (especially formats), which always seems to bring so much baggage with it as it moves from one tool to the next. This was particularly an issue when dealing with the date 1896 in Excel and Tableau (I understand this can be resolved but it did cause a problem initially). A further problem, which materialised too late to undo, was the failure of Google's Spreadsheet import routine to cope with non-standard text characters such as á or ö in many athletes' names.
Transforming Data – A certain amount of data cleaning was undertaken to resolve some of the quality issues identified above, except for, at this stage, the athlete name problems (which affected 400+ records) - due to the time restrictions it was decided that this would be tackled later if possible.
Further transformation was undertaken to put the data into a shape fit for analysis and fit for connection to the web. This included forming a calculated field of the result time in seconds to facilitate the creation of an 'index' which would reveal the % improvements across an event based on changes compared to the slowest result. This index therefore enabled comparison within and between events and sports. Team-based events (with multiple medal winners) were combined into a single record.
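To make the seconds field and the index concrete, here is a minimal sketch. It assumes result times arrive as text like "9.69", "1:42.58" or "2:06:32", and it reads the index as the percentage by which each result beats the slowest result in the event; both the input format and that exact formula are assumptions for illustration, as the real calculation was done in Excel.

```python
def to_seconds(result: str) -> float:
    """Convert a result such as '9.69', '1:42.58' or '2:06:32' into seconds."""
    seconds = 0.0
    # Each colon-separated part is worth 60 times the part to its right.
    for part in result.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def improvement_index(times_in_seconds: list[float]) -> list[float]:
    """Percentage improvement of each result relative to the slowest one."""
    slowest = max(times_in_seconds)
    return [(slowest - t) / slowest * 100 for t in times_in_seconds]
```

Because the index is a percentage rather than a raw time, a sprint and a marathon can sit on the same scale, which is what allows the comparison between events and sports.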
After all the data treatment we were left with just four sports: Athletics, Canoeing, Rowing and Swimming.
Consolidation – We decided not to acquire any other data but we did seek to consolidate our raw material with a bank of images for country flags and event posters, which would be intended for use in the final design.
Visual Analysis – This was conducted using Tableau to familiarise ourselves with the range and distribution of values for each event/sport, understand design factors such as axis parameters, learn about which events were for both genders/just one and generally learn about the data in a more visual setting.
There were a few curious results (such as the men's 3000m Steeplechase) which, after investigation, were down to inconsistent distances being run in some of the earliest events. The most confusing data surrounded some of the Canoeing events which revealed time inconsistencies around the 2000 Sydney games (much slower) but there were no apparent answers for this so we decided to leave them unchanged.
We were now very clear about the physical properties of the data we had to work with and an early concept for how each variable would be deployed was soon apparent: Year (axis), Country (for analysis and detail for annotation), Sport (selection), Event (selection), Gender (filter), Athlete (detail for annotation), Result Time (axis, detail for annotation), Medal (data marker, series, filter).
There were several different ways of slicing our analysis and visualising the data on this subject so we needed to demonstrate a clear editorial focus. Some of the best visualisations are based on the type of questions that might pop up in a pub conversation between friends.
We wanted to create an exploratory tool that allowed users to pursue queries such as “I wonder how much faster the 100m medallists are now compared to the past?” and be able to find out further information about all the relevant events, how the patterns compare for the different medals, what % improvement in time has been experienced, which events have improved the most and the least, who the athletes were who won the medals etc.
Additionally, we wanted to facilitate comparison between the progress of men and women: is one improving more than the other? Whilst we knew from the data that women were slower compared to men across the events, were they showing evidence of getting closer to men in any event?
Data representation - The data representation was intentionally straightforward, presented using the visual variable of position on a common scale to maximise the clarity of the visualisation. The markers would resemble medals (albeit transparent) and coloured accordingly, then each point would be joined into a line chart to show the pattern and progress of improvement. Below are a couple of the early concept sketches.
The x-axis would be the years of the Olympic Games and the vertical y-axis would be the result times. The main decision made here involved the plotting of smaller values towards the top. Whilst this is counter-intuitive for most displays of line charts, for issues of speed it is actually more natural to think of better as being higher. Metaphorically it also 'feels right' that the faster results appear to reach the x-axis before the slower ones, as if you were looking at the finish line.
We packed the results into a frame based on a dynamic axis which just had a buffer of 1% above and below the fastest and slowest results. We experimented with different medal sizes and line widths to get a balance between visibility and resolution of the data.
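The dynamic axis is simple to sketch. The version below assumes the 1% buffer is taken as a percentage of the fastest and slowest times themselves (rather than of the range between them), and the function name is invented for the example; it is an illustration, not the code used in the final piece.

```python
def axis_range(times_in_seconds, buffer=0.01):
    """Return (bottom, top) limits for the inverted result-time axis.

    Faster (smaller) times are plotted towards the top, so the bottom
    limit is the larger number and the top limit is the smaller one.
    """
    fastest = min(times_in_seconds)
    slowest = max(times_in_seconds)
    bottom = slowest * (1 + buffer)  # a little slower than the slowest result
    top = fastest * (1 - buffer)     # a little faster than the fastest result
    return bottom, top
```

Fitting the axis to each event in this way means a few hundredths of a second in a sprint get as much vertical room as several minutes in a distance race.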
There was a technical barrier to differentiating the marker shape between male and female results but because the times were always so different (and women's results consistently slower than men's) this didn't ultimately prove a problem.
As well as the data on the line chart, we wanted to include details of the races for users to lookup in a pop-up box and provide a range of summary statistics to form some deeper analysis of the events and across the sports:
- The top 5 countries to win medals in each event - presented using a simple look-up table with medal icons
- The % improvement of each event and its most comparable events (by gender) - using a panel of small multiples to aid efficient comparison
- Top 5 largest % improvements of events (by gender) - look-up table
- Top 5 winning margins in events (by gender) - using a horizontal dot-plot
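Two of these summaries are easy to sketch in code. The example below assumes the records for a single event are dicts with "year", "country", "medal" and "seconds" fields (hypothetical names), ranks countries by total medals, and treats the winning margin as the gap in seconds between Gold and Silver; those definitions are our illustration rather than a statement of exactly how the Tableau views were built.

```python
from collections import Counter


def top_countries(records, n=5):
    """Countries with the most medals in one event, as (country, count) pairs."""
    return Counter(rec["country"] for rec in records).most_common(n)


def top_winning_margins(records, n=5):
    """Largest Gold-to-Silver gaps in seconds, as (year, margin) pairs."""
    medals_by_year = {}
    for rec in records:
        medals_by_year.setdefault(rec["year"], {})[rec["medal"]] = rec["seconds"]

    margins = [
        (year, medals["Silver"] - medals["Gold"])
        for year, medals in medals_by_year.items()
        if "Gold" in medals and "Silver" in medals  # skip incomplete years
    ]
    return sorted(margins, key=lambda pair: pair[1], reverse=True)[:n]
```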
Colour and background - The colour scheme was designed to be a very limited palette, with little or no decoration. The obvious temptation for an Olympics piece is to follow the colour of the five rings but this didn't mean anything of worth for this design. Most of the background is pure white and the fonts and chart apparatus are subtle grey. Initially a concept for the title was created based on a montage of all the Olympic Games' posters but this proved to be unnecessarily distracting.
The colours of the medals were used for the line/medal markers on the main chart and a non-patronising palette of dark blue and orange (avoiding the lazy use of blue vs. pink) was chosen for the summary stats to differentiate genders. The grey-shaded panels just helped to frame the individual charts without battling for attention with the data.
Interactivity - The main interactive features involved allowing the user to change their sport, event and data selection, with toggle filters for medals and genders. Additionally, a hover (athlete, result) and click (pop-up of race details) event was programmed for interacting with the main chart. Here's an early technical version to get the concept working initially:
In an ideal world, with extra time, we intended to create a sort button to switch between 'by year' and 'by fastest time' views and we would have liked to have incorporated a visual 'ruler' when you hovered over a marker to create a horizontal line allowing you to see the faster and slower results more easily.
The summary statistics and a view of the raw data were included using a neat concertina device to show/hide these extra elements.
Arrangement - The main decisions around arranging the design involved whether to go for a portrait (years on the Y, times on the X) or landscape layout (years on the X, times on the Y). We correctly (in my opinion) settled for the latter. We also had to decide where to position the selection tabs and filters, eventually placing them across the top. All this would be packed into a central web page dimension of 960px wide to maximise screen compatibility.
As mentioned above, gridlines, tickmarks, axes and axis labels, borders and tabs were all reduced as much as possible in terms of their visual prominence.
Annotation - The annotation was important to maximise the user assistance. We included a brief introduction to allow people to get on with using the visualisation but then included a further 'about' page with much more detail about the function, intention and background to the project. The font throughout was 'Myriad Web Pro' apart from the title which was created using 'Kozuka Gothic Pro'.
Construction and Evaluation
In terms of the tools used to construct this piece:
- As already mentioned, Google Spreadsheet was used to extract and prepare the data
- Excel was used for further cleaning and transformation
- Tableau to conduct the visual analysis and create the summary statistic graphic elements
- Unfortunately, not all browsers were compatible, so the piece targets HTML5-compatible browsers including Chrome 20, Firefox 13, Safari 5.1 and Internet Explorer 9.
Aside from the time pressures, to judge when we felt the piece was complete we established the following criteria to measure our own satisfaction:
- Have we reasoned our design choices effectively?
- Have we created accessibility through intuitive design?
- Are all visual properties deployed with deliberate meaning and serving a purpose?
- Are the form and function appropriately balanced?
- Is the reward of reading the visualisation greater than the effort to read it?
- Have we checked the accuracy of the data?
We felt we had achieved all these reasonably well. Then we considered the criteria set out by the contest:
- Understanding (10 Points): How effectively does the visualisation communicate? How well does it help you make sense of this issue?
- Originality (5 Points): Are the approach and design innovative?
- Style (5 Points): Is the visualisation aesthetically compelling?
Instinctively, once again, we feel we will have delivered on the first (important) issue of understanding. Originality in this piece isn't particularly achievable because the visualisation method (i.e. essentially a line chart) is the best way of conveying our data story, and so any alternative approach would have potentially lost this message. The style is really a personal judgment but I think this conveys a certain elegance through its clarity.