Visualising the Wikileaks war logs using Tableau Public

Further to yesterday’s post about the Wikileaks Afghanistan War Logs, the Guardian datablog has published a post today describing how their data journalism operation worked. This reveals some interesting insights into the way the investigation team went about handling, analysing and interpreting all this data in order to unearth and present the key stories.

They have also made available a series of spreadsheets containing the data they have used for their various visualisations: summary of casualty data, full list of IED explosions and detailed data behind 300 of the key incidents (needs accompanying glossary of military terms).

I’ve played about with some of this data in Tableau Public to see if I can unearth some interesting visualisations and also to test out the data/software in this environment. I’ve embedded a sample of them below for sharing (please note they do take a while to load up):


This first graph simply plots all types of casualty on a common scale across the 6 year period so that you get a feel for the relative levels for each category as well as any particular patterns within each year. As with all these graphs, the context of the timeline of military strategy, troop numbers and other milestone information would help explain or inform some of these patterns.


In contrast to the first graph, this second one plots all casualties on a single line graph. The approach here is to accept the noise created by the largely overlapping lines towards the bottom of the graph because you can then easily identify unusual peaks, such as the huge increase in Taliban casualties during Aug/Sep 2006 and Aug/Sep 2007. It is also clear to see the overall increasing bloodiness of the war over time.


This third graph plots a cumulative picture of casualty numbers by type, clearly revealing the far greater numbers of Taliban casualties. It is really interesting to see the close proximity of Civilian and Afghan forces casualties throughout the course of the war – this graph shows this far better than the individual monthly patterns of the second graph.
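For anyone wanting to reproduce this cumulative view outside Tableau, a minimal Python sketch of the running-total calculation might look like the following. The category names and monthly counts are made-up placeholders rather than the Guardian figures, and `cumulative_by_type` is a hypothetical helper name.

```python
from itertools import accumulate

# Toy monthly casualty counts per category (illustrative values only,
# NOT taken from the Guardian spreadsheets).
monthly = {
    "Taliban": [10, 25, 5, 40],
    "Civilian": [3, 4, 6, 2],
    "Afghan forces": [2, 5, 5, 3],
}


def cumulative_by_type(counts):
    """Return the running total for each category, in month order."""
    return {
        category: list(accumulate(values))
        for category, values in counts.items()
    }


cumulative = cumulative_by_type(monthly)
for category, totals in cumulative.items():
    print(category, totals)
```

Plotting each list in `cumulative` as a line against the month index gives the same kind of picture as the third graph, where closely tracking categories are easier to spot than in the monthly view.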


This final graph is a heat map used to try to draw out seasonal patterns behind casualty numbers. I decided to use dual encoding for casualty levels, with the size of the square and its colour both representing the data count. I felt this helped emphasise patterns more clearly than having just one. Note that the colour and size scales represent different values/maximums in each graph – these ranges are normalised to help comparison of the intensity levels rather than the absolute casualty counts under each category. As shown in the second graph, you can clearly see an upsurge in activity around the late summer/early autumn periods, particularly in recent years. Casualty levels seem strangely low for the winter and spring months.
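The per-category normalisation described above can be sketched roughly as follows in Python. This is only an illustration of the idea, not how Tableau computes its scales: the `(year, month)` keys, the counts and the `normalise_per_category` helper are all assumptions made for the example.

```python
# Toy counts keyed by (year, month) for two categories
# (illustrative values only, not the real war logs data).
grid = {
    "Taliban": {(2006, 8): 200, (2006, 12): 50},
    "NATO": {(2006, 8): 20, (2006, 12): 5},
}


def normalise_per_category(counts):
    """Scale each category's cells to the range 0..1 using that category's own maximum."""
    scaled = {}
    for category, cells in counts.items():
        peak = max(cells.values())
        scaled[category] = {
            key: (value / peak if peak else 0.0)
            for key, value in cells.items()
        }
    return scaled


intensity = normalise_per_category(grid)
```

With this scaling, a square's size and colour reflect how a month compares with the worst month for that category, so the two toy categories above end up with identical intensity patterns even though their absolute counts differ tenfold.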

IED Explosions



These initial graphs above (the map boundaries haven’t come out particularly well compared to how they looked when created) show firstly the total deaths and secondly the total woundings by location during the 6 year period. It’s particularly interesting to see the prominence of locations of deaths or woundings around the highways of Afghanistan, as you would expect given the roadside IED tactics.



The second lot of graphs are small multiples of the death and wounding incident plots (1) across the 6 year period and (2) by the nature of the event.


This final graph presents a monthly and yearly plot of deaths and woundings by category of victim and I think it does offer some interesting patterns. Overall these visualisations probably don’t bring a great deal of added insight compared to the original Guardian visualisations, although I think the exercise has served as a good test of the tool as a means for exploring such data.

300 Key Incidents
I tried a few combinations using this data but nothing can really improve on the map interface the Guardian created for this data and, given this is just a manual selection of key events, any statistical or trend analysis will be flawed. I did a Wordle word cloud analysis to see if there were any interesting trends in repeated terms, but the data contains so much coded language and so many references that such analysis is distorted.
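A word cloud is essentially a picture of term frequencies, so the same check can be sketched in a few lines of Python. The sample summaries, the stopword list and the `term_counts` helper below are invented for illustration and are not drawn from the actual incident reports.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real analysis would need a fuller one.
STOPWORDS = frozenset({"the", "a", "of", "at", "in", "and", "to"})


def term_counts(summaries, stopwords=STOPWORDS):
    """Count lower-cased alphanumeric tokens across all summaries, skipping stopwords."""
    counts = Counter()
    for text in summaries:
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token not in stopwords:
                counts[token] += 1
    return counts


# Made-up report summaries, written to mimic the abbreviated style.
sample = [
    "TF reported IED strike at checkpoint",
    "IED found and cleared, TF returned to base",
]
counts = term_counts(sample)
print(counts.most_common(5))
```

Even in this toy case the most frequent tokens are abbreviations, which hints at why the coded military language swamps any more meaningful terms.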

*************************

Paul Bradshaw on the Online Journalism Blog reports that “French data journalism outfit Owni have put together an impressive app (also in English) that attempts to put a user-friendly interface on the intimidating volume of War Logs documents.”

*************************

The Atlantic website has joined in the task of visualising the data, presenting a range of map-based analyses for the IED data.

*************************

Nathan at FlowingData has published a guest post by Alastair Dant, interactive lead at the Guardian, describing the efforts that went into designing the war logs map of incidents revealed by Wikileaks.

20 Comments

Jan Willem Tulp, July 27th, 2010 at 2:32 pm

Great visualizations! And also nice to see that you’ve picked up working with the WikiLeaks dataset in Tableau Public.

I am just curious: did you just start playing around and end up with these visualizations? Or did you have some questions about the data you wanted to answer using Tableau? What are your findings? What are your conclusions?

Andy Kirk, July 27th, 2010 at 2:56 pm

Hi Jan, thanks for your comment. To be honest it was a bit of both – on one hand I was interested to see if there were any seasonal patterns to the casualties (via the heat maps) so that was a specific enquiry, I was then also interested in the cumulative patterns of the casualty levels (and so manually built on the cumulative calculation). The total casualties multiples visualisation was more an interest in seeing how the small multiples could work with this dataset more than an interest in the specific patterns this might reveal.

I’m going to write up some of my findings later, when I’ve completed work on the other datasets, but one of my immediate findings is that embedding Tableau workbooks isn’t always entirely smooth within a WordPress blog post!

John Dawson, July 27th, 2010 at 3:45 pm

Good stuff Andy – I was curious to do this so glad you did it first. I’m interested to see the growth rates between months – did the “big surge” or any other strategy really change things!

Neil Houston, July 27th, 2010 at 4:02 pm

Looks good,

I did some analysis/display really, of the KIA subset of data:

It’s at http://bit.ly/AfghanKIA ;

agree regarding embedding, mine looks much nicer ‘full screen’. Some interesting pieces around IED and other items too, lots of data and as John says it works well when you have a ‘question’ that you want to answer, rather than a ‘play’.

John Dawson, July 27th, 2010 at 4:02 pm

Have you noticed that the ratio of Taliban to Nato deaths is nearly twice as high on a Friday as on other days?

Andy Cotgreave, July 27th, 2010 at 4:12 pm

Great work Andy. I’ve done something similar here:
http://bit.ly/bNVyBO

Interesting to see how Civilian deaths always seem to be larger than all the others…. :-(

Andy Kirk, July 27th, 2010 at 4:17 pm

Thanks for comments guys – will come back to you all in a couple of hours

Andy Kirk, July 27th, 2010 at 7:09 pm

Thanks for the contribution John and some cracking Tableau work from Neil and Andy – Andy, particularly like how your dashboard updates all elements throughout based on selections from the first box. I’m just getting into looking at the datasets for the other categories – no wonder the Guardian are inviting (or delegating!) exploration because there is so much data to work with…

Tom Tyke, July 27th, 2010 at 7:44 pm

As usual, the civilian population of the zone shows the second highest casualty rate. Does this demonstrate that most NATO surges or consolidated operations result in the fairly indiscriminate killing of Afghans in general? The stats definitely show that the Taliban and Afghan forces have taken far greater losses than NATO forces. That seems to show that after 6 years of training the Afghan forces are no better than they were in 2004. If they were better able to take control of the country, as prospected by the NATO coalition, the Afghan forces casualty rate should have dropped nearer and nearer to the NATO level but it hasn’t. What it all adds up to is the genocide of Afghani people, independently of the side of the fence they are on, if they are presumed to be on any side in this farce.

Andy Cotgreave, July 27th, 2010 at 7:44 pm

:-)
Yes, the Guardian’s fans like to applaud the open access, crowdsourcing, shared approach. Its critics say it’s more a case of getting us monkeys to do their work!

Afghan War Interactive | joelotz.net, July 27th, 2010 at 8:56 pm

[...] the raw data from wikileaks. With the data provided, we can perform visualizations on our own, (visualisingdata and tableausoftware) to verify the results and [...]

A.G., August 4th, 2010 at 3:17 am

Tom Tyke, if you read the individual reports, you can see that the categorization of “civilians” is very broad. If passengers of a vehicle full of people open fire on Afghan or Coalition Forces and the vehicle is forced to crash or is blown up, or just that everyone on board is shot by return fire, the only ones counted as the enemy are those they can positively associate and were shooting. The others who might have jumped out and attacked if they weren’t gunned down are considered “civilians” in the reports I viewed.

[...] Pretty interesting post you can find here. [...]

[...] stop motion of a road accident”, wrote Shachtman. He also linked to some interesting illustrations in the visualisation of [...]

R and ggPlot « Visual Security, August 10th, 2010 at 8:24 pm

[...] [...]

[...] like the Guardian and mash it up in different ways and share how you did it. Crunch the data yourself and post it online. Save it on your hard drive somewhere in case different sites mirroring the information should for [...]

[...] Visualising the wikileaks war logs using tableau public [...]

[...] images created by Visualizing Data and powered by [...]

[...] is worth more than a thousand words: take a look at the WikiLeaks Afghan campaign visualizations at Visualizing Data. Particularly appalling are the numbers of civilian casualties during Ramadan (Aug 22- Sep 21) [...]