Things will be quiet round here for a short while…

So long as the ash cloud patterns stay consistent with the forecasts, I’m really excited to be making my first visit to California and spending the next couple of weeks on a bit of a road trip from San Francisco down to Los Angeles, via the coast route.

So things will be quiet round here for a short while with no blog posts for the next two or three weeks. However, there is plenty of good stuff on here if you are a new or recent visitor to get your teeth into.

Check out the growing list of essential visualisation resources (more parts to follow on my return), the recent visualisation project on the Arab Spring or some of the most popular posts listed on the site’s sidebar.

Where possible I will be following my Twitter feeds and will pass on some of the best content, so if you’re not already one of the 1,600+, why not join my happy band of enlightened followers @visualisingdata!

New visualisation design project: Protests and the Media

I’m delighted to share details of a challenging visualisation project I’ve been working on in collaboration with the team at the engine room. The work, titled “Protests and the Media”, explores the recent uprisings in the Middle East and North Africa, collectively termed the Arab Spring. Specifically, it looks at the relationship between the severity and extent of each country’s protests, the consequent government response and the level and spread of media coverage.

(If you can’t see the embedded closr.it widget above, please click here)

 

The closr.it viewer above allows you to explore the full graphic. A rapidly constructed, semi-interactive web version is also available to provide an alternative route into this analysis.

Christopher Wilson, who leads the engine room team, has published an excellent blog post and visualisation page on his site which accompany this post and provide much more background to the purpose and motivation behind this project, the process of identifying the key indicators and the task of gathering the data. He also provides narrative on the insights to emerge from studying the resulting analysis.

The aim of this post is to share with you the design process that was pursued, explaining some of the key decisions made and the design choices that formed the finished work. I am going to structure this around the three key themes that shape any visualisation project: message, data and design. A fourth theme, which relates to the constraints and restrictions around a project, runs throughout and so is incorporated within the others.

 

Message

Showing due consideration to the issue of ‘message’ is possibly the most under-appreciated dimension of a visualisation task: clearly defining the motivation and purpose of the analysis and what it is you are attempting to communicate.

As I’ve said in my introduction above, the key purpose behind this project was to explore the relationship between media reporting and the severity or seriousness of the uprisings taking place across this region.

We wanted to determine what role the media was playing in the reporting of the protests and government responses. Was the breadth and volume of coverage consistent with the relative magnitude of protest and government response? If we looked across a diverse range of media agencies, would we discover disproportionality in the focus of reporting, which could suggest evidence of editorial bias, geopolitical interest and sectarianism in the coverage?

This would be a complex ‘message’ to construct, not least because of the relationships that existed (or not) within and between the data. The size of a protest and the severity of government action would be fairly subjective matters, based on a likely combination of several different indicators and data collections.

Creating a satisfactory visualisation solution, one which offered a cohesive and accessible route into being able to understand the situation, would therefore prove a difficult challenge.

 

Data

(For true insight into this dimension I would recommend you read Christopher’s blog post and visualisation page which provide a detailed and interesting account of the challenges associated with identifying, sourcing and gathering the data.)

For this visualisation task I was provided with the following data, captured for each of the twenty key countries involved in this story (you can access the dataset via this Google Spreadsheet):

Context

‘Severity’

Media coverage (all online reports and English language only)

There were also a number of derived calculations provided, such as the maximum number of protesters as a % of population, arrests as a % of protesters and deaths as a % of protesters. These would help standardise the data and aid cross-country comparison.
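To make that derivation concrete, here is a minimal sketch in Python/pandas (which was not part of the project toolkit) of how such standardised indicators can be computed; the column names and all figures are placeholders for illustration only, not the project’s collected data.

```python
import pandas as pd

# Placeholder figures for illustration only, not the project's collected data.
df = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "population": [80_000_000, 6_000_000, 1_000_000],
    "max_protesters": [1_000_000, 50_000, 100_000],
    "arrests": [10_000, 5_000, 1_000],
    "deaths": [800, 9_000, 30],
})

# Standardise the absolute counts so countries of very different sizes can be
# compared on a like-for-like basis.
df["protesters_as_pct_of_population"] = 100 * df["max_protesters"] / df["population"]
df["arrests_as_pct_of_protesters"] = 100 * df["arrests"] / df["max_protesters"]
df["deaths_as_pct_of_protesters"] = 100 * df["deaths"] / df["max_protesters"]

print(df.round(2))
```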

There were three key challenges relating to the data element of this project.

Firstly, we were battling against the clock. As a live, evolving and ongoing story, the ‘Arab Spring’ meant the recorded data was constantly changing and growing. Just getting access to the data and validating it was hard enough (for the researchers, that is – I can’t take credit!) without also having to draw a line under the data collection and produce the visualisation quickly enough to ensure it was still sufficiently ‘current’.

(Incidentally, the nature of this challenge has provided a real glimpse into the conditions that exist within the world of visual journalism – I take my hat off to all you who succeed in such a high-pressured and relentless, yet creative, environment!)

Secondly, the ‘severity’ data in general was quite problematic due to the wide distribution of values and the presence of significant outliers within each indicator, such as the size of the Egyptian protests and the volume of Libyan arrests and deaths. This variation would restrict some of the visualisation methods that could have been used.

Thirdly, if we recall the purpose of this visualisation, it was to reveal (a) the patterns of media coverage (b) in the context of the seriousness of the uprising and (c) the nature of the response from the governments. Essentially, that would be two quantitative variables, one qualitative variable and one categorical variable (country). However, the data we had would not conveniently map onto this simplistic model. It is important to remember that you can only make a visualisation as simple as the data allows; anything simpler can only be achieved by diluting the data communication.

A sense of ‘severity’ could only realistically be formed by subjective means, with readers forming their own opinions about the extent of uprising, revolt and reaction each country had experienced by scanning the indicators separately. This meant the visualisation design would have to be opened up more than originally conceived, but doing so would allow the data to breathe and the communication to flow better.

 

Design Process

Sketching and exploring – My initial scoping work involved sketching out possible layouts and structures, constantly reminding myself of the need to create a solution that responded to the requirements outlined by the project’s purpose. The interplay between the data for severity, government response and media coverage was key, so the design would need to create a cohesive story between these elements. Sketching different layouts allowed me to judge the potential sequencing and positioning of the different visual elements.

Complementing this stage, and coming at it from the opposite direction, I always begin projects by thoroughly exploring the data, understanding its characteristics and visual potential. The most efficient and dynamic tool currently available for doing this is Tableau, which offers outstanding flexibility and ease of use that greatly helps when experimenting and digging around your data set.

 

Establishing limitations – In becoming familiar with the data I was able to identify the outliers, as mentioned above, and get a real feel for the sorts of physical properties each data element might offer. My aim was to try and achieve a static design, one which could survive the conditions of such a live data exercise as this, particularly as I didn’t have the time or resources to invest in developing a dynamic visualisation using one of the main programming languages or packages.

It was clear early on that there was going to be no simple or effective way of overlaying these distinct data clusters into a neat, single solution. The media coverage data, in particular, would require the use of small multiples to exhibit the different geographical reporting trends between agencies.

More often than not, trying to combine multiple indicators and previously separate graphical displays into a single display loses far more than it gains. With this design I was not looking to simplify the data any further than it allowed.

It is important not to underestimate the viewer’s ability to make global sense of distinct visual components; the key was to make these accessible and intuitive to interpret. Sometimes you have to invite the viewer to do some work to draw out insights, and this was an occasion when that attitude was necessary.

Having established clarity on this matter I was able to focus on the three sections of data almost as separate challenges.

 

Media Coverage Data – The first data I tackled was the media coverage. The task was to present the trends in each media agency’s reporting characteristics and to draw out inconsistencies. Two visualisation approaches were used to achieve this: 1) choropleth mapping and 2) simple plots against an average. These would be combined into an overall trellis of small multiples, allowing the eye to immediately pick out patterns and trends across the whole.

A key decision required here was whether to focus on absolute volumes or proportions of reporting. I decided that this wasn’t a judgement of media power or resources (for which the BBC, for example, would stand out) but more about each agency’s reporting profile. I therefore standardised all the data to reflect the percentage of reports about each country out of each agency’s total.
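As a rough illustration of this standardisation step (the project itself did this in Tableau), the sketch below assumes a simple long-format table of report counts; the agencies, countries and counts are invented for the example. It also shows the descending sort by overall share used later for the country labels.

```python
import pandas as pd

# Invented agencies, countries and counts, purely for illustration.
reports = pd.DataFrame({
    "agency":  ["Agency A"] * 3 + ["Agency B"] * 3,
    "country": ["Country X", "Country Y", "Country Z"] * 2,
    "reports": [500, 300, 200, 400, 350, 250],
})

# Percentage of each agency's total coverage devoted to each country, so that
# agencies of very different sizes share the same 0-100% scale.
reports["pct_of_agency"] = (
    100 * reports["reports"] / reports.groupby("agency")["reports"].transform("sum")
)

# Order countries by their overall share of coverage (descending) - the
# sequencing principle used for the country labels in the final graphic.
order = reports.groupby("country")["reports"].sum().sort_values(ascending=False).index

print(reports)
print(order.tolist())
```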

To create the maps I imported the data into a Google Fusion Table and merged it with a data file containing KML polygons for all the countries in the world. I generated the map and customised the display, for example fixing the choropleth value range at 0%–50% (the maximum proportion for any country) and using ColorBrewer to choose a best-practice colour palette. I kept the value fills at an opacity of about 75% so that you could still make out the country labels underneath.

I then customised the map using the Google Fusion Table Layer and Google Maps API to reduce the colour domination of the terrain, remove some principality labels and finally remove the distraction of the water. With the basic map design complete, I ran the output for the eight different agencies and cropped the images in Photoshop. The final task here was to manually add in labels for some of the very small countries that are difficult to spot.
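For readers who prefer a scriptable route, a comparable choropleth – the same fixed 0%–50% range, a ColorBrewer-style sequential palette and roughly 75% opacity – could be sketched in Python with geopandas. This is purely illustrative and not the process used here; the file and column names are assumptions.

```python
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical inputs: a world boundary file and one agency's coverage shares.
world = gpd.read_file("countries.geojson")
coverage = pd.read_csv("agency_coverage.csv")   # assumed columns: country, pct
merged = world.merge(coverage, on="country", how="left")

ax = merged.plot(
    column="pct",
    cmap="Reds",                      # ColorBrewer-derived sequential palette
    vmin=0, vmax=50,                  # fix the scale at 0-50% across all agency maps
    alpha=0.75,                       # keep underlying labels visible
    edgecolor="white", linewidth=0.3,
    missing_kwds={"color": "lightgrey"},
)
ax.set_axis_off()
plt.savefig("agency_map.png", dpi=150, bbox_inches="tight")
```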

The bar charts were generated using Tableau. I set up a dual axis chart with the individual media agency (bar) plotted against the overall average across all agencies (the orange dot). This would reveal any inconsistencies in coverage against the typical profile.
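The same bar-versus-average idea can be sketched outside Tableau too; the matplotlib snippet below is purely illustrative, with generic country labels and placeholder values rather than the project’s figures.

```python
import matplotlib.pyplot as plt

# Placeholder values, not the project's data: one agency's share of coverage
# per country, plus the average share across all agencies.
countries = ["Country A", "Country B", "Country C", "Country D", "Country E"]
agency_pct = [28, 22, 15, 9, 7]
average_pct = [24, 25, 12, 8, 9]

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(countries, agency_pct, color="#9ecae1", label="Agency")
ax.plot(countries, average_pct, "o", color="orange", label="All-agency average")
ax.set_ylabel("% of agency's reports")
ax.legend(frameon=False)
plt.tight_layout()
plt.show()
```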

One of the key decisions to make was the sorting order of the country labels. Alphabetical is the traditional sequence, but my aim is to make every visual decision carry meaning. Since the focus was on media coverage, the countries were best sequenced in descending order of the overall proportion of reports each country received.

To satisfy the appetite of viewers wishing to see the absolute values for the media reports, I set up a separate Tableau worksheet for this field, exported both the bar plots and this table as images and combined them into a single graphic in Photoshop.

The overall effect created by the small multiples seems to work quite well. The eye is extremely efficient at spotting patterns and differences across these displays: on the maps you can instantly see the darker reds for Egypt/Libya with the western organisations, through to the greater balance with Reuters and the dominance of Syria and under-reporting of Libya/Egypt with Al Arabiya, and then, in more detail, the relationship between each media outlet and the average.

 

Severity Data – For the severity data I tried a wide range of methods (scatter plots, heat maps, log axes) and even considered excluding the outliers to give the other values more prominence, but all proved unsatisfactory and ultimately the good old bar graph, or just a straight table of values, were left as the only viable options.

Separating the absolute (e.g. max protesters) and contextual (% of population) indicators into separate columns was preferable to trying to merge them into a single display, and makes interpretation much easier. Nothing could bring greater prominence to the lower-end bar values, which were difficult to read (hence the inclusion of the numeric values as well), but ultimately a small value will always be a small value – the lack of a substantial bar and the difficulty of seeing its value is itself a message.
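The ‘bar plus printed value’ treatment is easy to reproduce; the matplotlib sketch below (generic countries, placeholder figures only) shows why the labels matter when one outlier squashes everything else.

```python
import matplotlib.pyplot as plt

# Placeholder figures only, chosen to include one large outlier.
countries = ["Country A", "Country B", "Country C", "Country D"]
max_protesters = [1_000_000, 100_000, 80_000, 5_000]

fig, ax = plt.subplots(figsize=(5, 2.5))
bars = ax.barh(countries, max_protesters, color="#cb181d")
ax.bar_label(bars, padding=3, fmt="%d")   # print the value so tiny bars stay readable
ax.invert_yaxis()
ax.set_xlabel("Max protesters (placeholder values)")
plt.tight_layout()
plt.show()
```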

 

Concession and Repression – This was a fairly straightforward task, based on a blank image of a Tableau table, with Illustrator and Photoshop used to create an array of simple icons representing the concessionary and repressive actions each country’s government had taken. They were coloured to depict the positive (green) and negative (red) responses, with regime change as an overall outcome represented by a green bar. The difficulty was really about how and where to position this graphical element. Ideally it needed to be conveniently visible alongside the severity data to help form that subjective view of where you might expect media reports to be focused – the greater the repressive tactics or concessionary actions, the higher the profile of the story, you would expect. In the end it was placed to the left of the severity data, as the first part of the story of each country’s experience.

Final Visualisations – The final visualisation was compiled by bringing together all the different ‘tiles’ and layers into a single Photoshop graphic. This was quite an intricate process because there were many different elements coming together, and my obsession with layout accuracy meant everything had to be positioned with pixel-perfect precision. The final piece was generated as a JPG file and then published using closr.it to enable viewers to explore the large graphic without sacrificing resolution.

In order to give viewers an alternative way of experiencing the data, an accompanying interactive was hurriedly put together using fairly crude HTML/ASP programming. With more time and resources this would have been a much more sophisticated and immersive development, but it does at least let you see the detail of individual media outlets in closer proximity to the severity data.

 

Final Reflections – Hopefully, the design provides an effective solution to the challenge set. It allows the viewer to make subjective judgements themselves about the seriousness of each country’s experiences and compare this understanding with the media coverage patterns and the extent of government response.

As with any design challenge the problem context (the aim of the work) and the range of variables and data structures you’re working with will always dictate the design approaches that are feasible and which ones are not.

 

Over to you?

What can you do with the data? The solution described above is but one approach to visualising this data. If you have creative, alternative ideas, and a bit of time on your side, why not access the dataset via this Google Spreadsheet and have a go. We’d be really excited to see some different designs to take this debate further, especially if you can bring them alive through interactivity.

 

Thanks

Finally, I’d like to thank Christopher for approaching me to join this project and congratulate him, Alix and all the other collaborators who worked so hard (and so late) to carefully source, validate and compile the data. It was a very good team effort!

Mapped visualisation influencing European response to ash cloud

As somebody anxiously following the latest situation of the Iceland volcanic ash cloud, I find it fascinating just how much emphasis and influence the visualisation output from the UK Met Office is having on the response.

Whilst other information is of course being gathered and consulted, the overlaid ash cloud data on this map design is having a massive impact on the response from airlines, airports and aviation authorities. It is playing out like a form of sport for those of us with our fingers crossed that the red zone moves on from our flight locations.

That’s not to say the graphic is perfect, far from it. A few features could be improved and made a little more elegant. Most needed is a thorough tidying up of the visual confusion caused by the clash of lat/long gridlines, airspace boundaries and country borders.

Washington Post ‘Where are the jobs?’ graphic

Nice piece of work from the Washington Post graphics team, presenting an effective small multiples/heatmap fusion to demonstrate how the financial crisis has impacted on a range of industries across the US in terms of job numbers.

Each horizontal section presents the ups and downs of a different industry. Each strip represents a month’s percentage increase or decrease in job numbers. Decreases are encoded using an ever stronger saturation of purple and increases likewise in green. As the two shades reach their peaks the two darkest levels are a little hard to differentiate, but that’s only a minor quibble. The only thing I didn’t care too much for was the useful but rather intrusive narrative which appears over some of the graphics.

This is a static representation of an accompanying interactive map that is published with the main article. The interactive works well enough but there is something much more elegant and satisfying about a single static graphic that manages to capture the essence of a complex story and communicate it really well.

Nice work from Kat Downs, Neil Irwin and Alicia Parlapiano.

Win an iPad2 in the latest visualisation contest

One of the key indicators that demonstrates the growth in popularity of data visualisation is the increasing frequency of visualisation contests. I’m delighted to share with you details of the latest competition, a team effort between Postgrad.com and David McCandless where one lucky winner will take home a brand spanking new iPad2!

 

They have kindly invited me to join an esteemed judging panel to pick the best visualisation relating to a topical issue about the population of black students attending elite UK universities.

Background

Back in April, Prime Minister David Cameron and Oxford University were involved in a public spat about how many black students attended the University. The PM said there was only one black student; Oxford described this as inaccurate and misleading, saying there were in fact 26.

This debate provided sufficient motivation for prominent data journalist and information designer David McCandless to pursue the data that would get to the bottom of the issue. However, having collated much of the data he ran out of time to complete a visualisation design and instead decided to share the dataset and open up the chance for visualisers out there to have a go…

The visualisation brief

We are challenging designers and analysts to explore this subject matter, identify your version of the truth and present it in an effective visualisation. Go through the data provided and dig out any further useful resources you can find to create a compelling, engaging and informative visualisation which communicates the true experiences of black students applying for and attending the UK’s elite university system.

There are no restrictions on the nature of the visualisation – it can be a simple static graphic, an interactive or a full-blown infographic. Just choose whatever creative output provides the best platform for your capabilities to present the story you have to tell.

Whether you are fresh to visualisation or already well-established, this competition represents a level playing field and anyone can win. For information on the judging panel click here and scroll down about half way – I’m the suave looking chap at the top with the head shot I had done for the local barbers.

 

The dataset

The data David collected can be accessed via this Google Spreadsheet and is accompanied by other data sources listed in the file, as well as those listed below covering university students and applications, giving you the opportunity to combine and contextualise the topic and maximise the potential of your communicated design.

Guardian DataBlog post on Oxbridge Elitism

UCAS Annual Datasets

You are free to utilise any other dataset you discover provided the sources are publicly available and appropriate citations are included in your entry.

 

Why should you enter?

Here are a few good reasons why you should enter this competition:

 

Admin, timelines, rules etc.

To enter the competition, simply email your visualisation as a JPEG attachment, or a link to a post or site elsewhere, to mark.johnstone@postgradsolutions.com and include your full name and the best email address to reach you on.

The competition is open now and closes at 11pm GMT on Monday 20th June 2011. Winners will be announced by Monday 4th July 2011. The competition rules are posted here.

Good luck to everyone!

SoundAffects: A musical and visual interpretation of NYC

SoundAffects is an experiential project collaboration between mono, Parsons The New School for Design and Tellart. The purpose of this ten-day project is to capture real-time city data and translate it into a musical and visual representation, helping us explore and think about cities in new ways.

The hub of this New York City-based project is a listening wall which is set up at Fifth Avenue and 13th Street. Behind this wall are a bunch of sensors tracking variables such as traffic patterns, proximity of people in the space, colour, temperature etc.

The data from these sensors is assigned to colour and sound values to create an abstract timeline-based visualisation of patterns and notes representing the ebb and flow of the city.

As well as being presented on the wall itself, the web-based visualiser can be viewed here and several one-off experiment videos can also be found here. For more details about how the SoundAffects project works, to find out which sensor inputs correspond to which colours, and how the visualisation and sounds are compiled and generated, have a read of the explanatory infographic below (click to enlarge).

As the project’s data capture grows, the project team are making the data available to the public, and you can download daily CSV files to generate and share your own creations.
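The CSV schema isn’t documented in this post, so the following is only a hypothetical sketch of the general idea – reading a day’s sensor log and mapping each reading onto a note and a colour, much as the wall does. The file name and column names are invented.

```python
import csv
import colorsys

NOTES = ["C", "D", "E", "G", "A"]   # a pentatonic scale keeps any mapping musical

def to_note_and_colour(value, lo, hi):
    """Scale a sensor reading onto a note of the scale and a hue."""
    t = max(0.0, min(1.0, (value - lo) / (hi - lo)))
    note = NOTES[int(t * (len(NOTES) - 1))]
    r, g, b = colorsys.hsv_to_rgb(t * 0.8, 0.8, 0.9)   # hue sweeps with the reading
    colour = "#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255))
    return note, colour

# Hypothetical daily file and column names, purely for illustration.
with open("soundaffects_daily.csv") as f:
    for row in csv.DictReader(f):
        note, colour = to_note_and_colour(float(row["traffic_count"]), 0, 200)
        print(row["timestamp"], note, colour)
```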

To follow the progress of the project follow @soundaffectsnyc and #soundaffectsnyc on Twitter.

Eurovision song contest visualisation

Pablo from Spanish developers Undefined has sent me details of a new project he and his colleagues created for the Radio Televisión Española website. This interactive creation visualises Twitter activity surrounding the Eurovision song contest, which took place last Saturday 14th, as well as the historic patterns of competition voting and final rankings.

Developed in Adobe Flex, the project is separated into three sections:

1) Map: As shown above, this feature plots the Eurovision finalist countries onto a map of Europe, represented by flags as circles sized according to the number of Twitter mentions.

2) Ranking: This presents a simple ranking of the votes in the contest for each year, including a display of the Twitter and Facebook popularity counts.

3) Who voted who: This layout displays which countries voted for which other countries. Clicking on a particular country’s flag positions it in the centre of a network diagram surrounded by all the other voting countries. The thickness of the connecting lines/spokes represents the size of each vote, received (blue) and issued in return (red). The main idea is to see at a glance the predominant colour and determine whether a particular country has voted higher or lower than other countries. You can navigate through the past 11 years of contests to discover the historic patterns.
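The project itself was built in Adobe Flex, but purely to illustrate the ‘Who voted who’ layout in rough form, the Python/networkx sketch below places a chosen country at the centre, rings the others around it and scales the edge width by the vote size; the country names and vote values here are placeholders, not real contest results.

```python
import math
import networkx as nx
import matplotlib.pyplot as plt

focus = "Spain"
votes_received = {"Italy": 12, "France": 8, "Germany": 4, "Portugal": 10}   # placeholders
votes_given = {"Italy": 10, "France": 2, "Germany": 6, "Portugal": 12}      # placeholders

G = nx.DiGraph()
for other, v in votes_received.items():
    G.add_edge(other, focus, weight=v, color="tab:blue")   # votes received (blue)
for other, v in votes_given.items():
    G.add_edge(focus, other, weight=v, color="tab:red")    # votes issued (red)

# Place the focus country at the centre and the rest on a circle around it.
others = sorted(set(votes_received) | set(votes_given))
pos = {focus: (0.0, 0.0)}
for i, name in enumerate(others):
    angle = 2 * math.pi * i / len(others)
    pos[name] = (math.cos(angle), math.sin(angle))

edges = list(G.edges(data=True))
nx.draw_networkx_nodes(G, pos, node_color="lightgrey")
nx.draw_networkx_labels(G, pos, font_size=8)
nx.draw_networkx_edges(
    G, pos,
    width=[d["weight"] / 3 for _, _, d in edges],
    edge_color=[d["color"] for _, _, d in edges],
    connectionstyle="arc3,rad=0.15",   # curve the two directions apart
)
plt.axis("off")
plt.show()
```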

Tableau European Conference – Freakalytics update Day 2

This week Tableau are holding their inaugural European customer conference in Amsterdam. I’m delighted that Stephen and Eileen McDaniel of Freakalytics have provided key updates from the event and have kindly allowed me to share their observations.

On Tuesday I published the headlines from Day 1; published below are updates from Day 2’s main event. These observations are a direct lift from the Freakalytics ‘Thoughts’ page, which published these updates straight from the conference floor.

Please note, some comments are the opinion of Freakalytics and not necessarily those of Tableau. This content was live blogged; there may be occasional errors or omissions.

 

Day 2 Schedule (note many are concurrent sessions)


8:30 to 9:30 | The Disinformation Age: Technology is Making us Stupid

with Stephen Few, Perceptual Edge

Overview

There is no way to easily weave data into knowledge…
Are we enlightening our audience or frustrating our audience?
Data and the use of data are the sexy jobs of the next decade

A video interview with Hal Varian of UC Berkeley and Google was shown

Pie charts by Homer, the real reason everyone loves pie, it isn’t the ability to see the data…

 

Skills and tools

We must have good skills and good tools to achieve the possibility of data enlightenment

Bad tools can imprison us- so many tools seem to assume that we are dumb and we just want entertainment.

A plethora of flashy, uninformative and even misleading charts available from many companies

The Business Intelligence industry has delivered many of the tools to date
BI SUCCESS! in collecting data, cleaning data, transforming data, integrating data, storing massive amounts of data and reporting on data

BUT, traditional BI has hit the wall- we can’t explore our data, easily analyze our data, clearly communicate our findings or easily use it to predict the future

WHY?
Traditional BI is very engineering and feature oriented
NEW BI needs to be much more human-centric and design-oriented. We must understand how people see, perceive and use data to effectively serve them in their quest for better decisions.

 

Data visualization

Data visualization is powerful because it weaves numbers into information

In the 1700s, William Playfair invented the line, bar and pie chart. He had a bad day when he invented the pie chart! Well, 2 out of 3 innovations being useful isn’t a bad record!

Many of the new, shiny graphs are much worse than traditional, simplistic graphs at explaining the situation. This is because they misdirect, mislead and often can’t inform us.

Stephen then showed a clip from “The Onion“- concentric circles hitting misshapen areas.  Parody on earthquake reporting.  Lots of talk and graphs with no insights about what is actually happening in the earthquake location.

Stephen then showed a Fox News example showing supporters of various Republican nominees adding up to 193% of the audience!

When you add up the slices in this pie chart, you will find that 193% of the electorate was polled!

 

The process

Search, discovery, examination, understanding and making decisions. Stephen calls this Search, Examine and Explain or “SEE”

Note that vision is often our dominant sense- “I see!” 70% of sense receptors are in the retinas of your eyes.

Trends, patterns, outliers- a picture makes it stand out. Can easily see patterns such as seasonality (domestic), overall trends (domestic up and international flat) and exceptional outcomes (international in August).

We should attempt to balance thinking with visual power- traditional methods put much of our burden on thinking, but evolution has made the visual system exceptional over the thinking part of our intellect.

Confused?

Brick wall is uninformative, reflective light on pie chart is deceptive, minimal value from pie chart with two categories

Sexy graphs are fun but often not useful or even misleading! DON’T bury the truth under layers of makeup, but rather choose simplicity in your graphs to inform.

A quote from Edward Tufte: “Above all else, show the data!” Tufte argued that data ink should be high relative to non-data ink

  1. Reduce non-data ink
  2. Enhance the data ink

The objective we should strive for is to make the situation clear and simple.

An example: avoid distracting displays in your presentations. Reflections in graphs are a great example of wasted non-data ink. In the real world, reflections in the outdoors are something we find annoying! Why did developers build it into graph tools!?!?

Stephen then showed 3-D bar charts that are nearly impossible to read. It was a graph from a major BI vendor’s documentation manual. They had added a third dimension when it had no meaning or purpose except to confuse.

Avoid visual puzzles- this is not a game, we are trying to make the best decision.  Decisions could involve the future of your career, your company, your bonus or even people’s lives.

Save the pies for dessert, not your presentation.  Also, pies on a map are useful since they are self-contained and bars or lines are not self-contained.

Very bad bar chart with unneeded third dimension

 

Sense-making

Referred to his latest book, “Now You See It”

We must bridge the gap between data and knowledge which should be built on an understanding of how we see and how we think.

What is the question?  Organize the data appropriately.

Stephen showed a simple example demonstrating that visual perception is not just camera work. Your eyes do NOT work like cameras!  The CONTEXT influences our perception; data with poor context is misinterpreted quite easily.

A 2nd example- gradient of fill colors misleads you in bar and line charts. Then showed dots versus dots connected by lines (budget versus actual data.)

A good example showing how it is hard to read more than a few values in the table but easy to compare the two series as lines. With the lines you can see the overall, upward trend in domestic traffic, the lack of trend in international traffic, the seasonality of domestic traffic and the exceptionally low results in international traffic in August.

Some techniques that tools should effortlessly enable include

 

Visual analysis at the speed of thought

See -> Think -> Modify, then again See -> Think -> Modify and so on – the flow of thinking that leads to new discoveries and insights.

To achieve this we must eliminate distraction and augment our limited working memory.

An example from the University of British Columbia Visual Cognition Lab – we are easily distracted!  Too much noise exists in our world of visual analysis due to poor software design.

How it works, from the World -> Working memory -> Long-term memory

Imagination can also feed into the working memory

We can only hold so much information in our working memory – 3-4 chunks of information, based on extensive research since the 1950s.

There are visual aids for working memory, so we can quickly see and understand a lot at once to aid our limited working memory. An example,

1 data point = 1 chunk
BUT one line with 24 data points = 1 chunk in working memory, suddenly you can see and easily compare 5 regions across two years instead of 5 regions for one month!

Another example, the story of three blind men and the elephant- tree trunk, snake whipping around, like a huge fan.  One felt the leg, the tail and the trunk.  They could only see a small amount.

Unfortunately, many data analysts are like the blind men.  They have only been trained in a limited, directed way –OR– their tools impede their ability to explore and understand the data!

 

Where should we be headed?

Information -> Knowledge -> Wisdom -> Which leads to a better world and life

Our ultimate goal is not knowledge, but rather wisdom.  To make better decisions in the world.

Stephen then closed with a poem by T.S. Eliot.

Tableau European Conference – Freakalytics update Day 1

This week Tableau are holding their inaugural European customer conference in Amsterdam. With a wide range of hands-on training, top-quality keynote speakers, one-on-one expertise opportunities and in-depth break-outs, it’s sure to be an excellent event. The much anticipated release of Tableau 6.1 will also be showcased.

I’m delighted to say that Stephen and Eileen McDaniel of Freakalytics – an impressively knowledgeable Tableau Education Partner – who are both present and presenting at the conference, are providing updates from the event and have kindly allowed me to share their observations.

Published below are some live blogs relating to Day 1’s key sessions or events. These are a direct lift from the Freakalytics ‘Thoughts’ page, which hosts these rapidly published blog updates straight from the conference floor.

Please note, some comments are the opinion of Freakalytics and not necessarily those of Tableau. This content was live blogged; there may be occasional errors or omissions.

Day 1 Schedule (note many are concurrent sessions)


 

11:00 to 12:00 | Scaling and Performance Best Practices

With Dan Jewett, VP of Product Management, Tableau (Dan and Stephen previously worked together at Brio in the late 90s).

Overview

1) Prefers that most customers see it as a black-box

Content from Desktop
To Tableau Server
People then access your content via no-client AJAX technology (like GMail)
HTTP(S) access via Apache web server, please leave web server alone if at all possible!

VizQL and the web app handle your requests; they work with a security layer

Holding this all together is a repository with user management, content management and a search engine (Apache’s Lucene project)

2) Hardware- don’t be thrifty here, this makes a big difference in the user experience!

For a modest investment in hardware, you can radically increase performance and capacity
Dell server example- $14k US buys 8 cores, 64 GB memory, 3 TB of fast SCSI (RAID 5); but $38k US triples the CPUs, 128 GB RAM, 4 TB of RAID
Assume with 100 users, this tripling of server capacity adds perhaps 10% to project costs but much better capacity at peak load times (e.g. – Monday mornings when everyone logs in)

Windows Server 2003 or 2008, you want 64 bit OS and hardware!

MEMORY- more is better!  I would trade some CPUs for more memory if you have to trade off.

Fast disks matter, a lot- RAID config is better
Lots of data extracts can mean need for high storage capacity

Can be hosted on a virtual machine, but physical machines typically perform better (based on anecdotal feedback).

Tableau web clients accessing the server are chatty with the Tableau Server system, so there is a moderate amount of networking speed needed between the two.  However, there are many little requests, so network latency can be an issue.  Consider co-locating smaller servers worldwide instead of one big central system. Viz’s render as lots of little tiles with JavaScript for interaction.

3) Distributed components- a scalability strategy not a performance strategy (allows lots of users, but not faster viz’s!)

Primary TS machine in the cluster- it is the Load Balancer for client requests
Add worker machines, even moving data engine off to worker machines
Can even move data engine off the client worker machines, simplest way to increase performance- the data engine is the big memory user!

Caching is per process- distribution can actually diminish performance since the caching is not shared amongst machines or processes.

Keep your machines in the same subnet if possible
Firewalls and DMZ’s can slow network communication, sometimes significantly depending on how locked down your systems are…

4) Caching

Request comes in from user at web client.

  1. Fastest- created this view before and in cache, if so, just send cached images- no queries or calcs needed!
  2. If no cache image, then do I have the SQL query in cache?  If so, use cached data to render view and send to user.
  3. If no cached query, hopefully database is fast and can quickly send results.
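In pseudocode terms this is just a two-tier cache in front of the database; the Python sketch below is purely illustrative of the lookup order described in these notes, not Tableau’s actual implementation.

```python
# Purely illustrative sketch of the three-step fallback above; not Tableau code.
image_cache = {}   # rendered view images, keyed by (view, filters)
query_cache = {}   # query result sets, keyed by SQL text

def serve_view(view_key, sql, run_query, render):
    # 1. Fastest: the rendered image is already cached - no queries or calcs needed.
    if view_key in image_cache:
        return image_cache[view_key]
    # 2. Next best: the query results are cached, so only rendering is needed.
    if sql in query_cache:
        rows = query_cache[sql]
    else:
        # 3. Slowest: go to the database and hope it responds quickly.
        rows = run_query(sql)
        query_cache[sql] = rows
    image = render(rows)
    image_cache[view_key] = image
    return image
```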

Three cache control strategies in the Server config dialog

Model cache size- how many viz’s to cache (4 views in a dashboard, that is 4 caches) – 30 is default- should be much higher on server!  (100-200???)
Query cache- size in MB of query results to cache- 64 MB is default- should be much higher on server!  (2,000-4,000 MB on a 64 GB server)

How many server processes per server core?  2 VizQL and App per core is a good start.  If caching is critical, then add more VizQLs. If your server is lower-end, reduce to one per core.

5) Database or extract?

Database- live data needed or want database security based on users.
Data extract- faster (unless you have Teradata, Vertica, Netezza, etc.), prefer data changes in reports on a predictable schedule; can handle security by user filters when published or Tableau Server view restrictions.

Note that there are many ways to optimize extracts, including hiding unused data items, filtering based on dashboard need, aggregating data by visible dimensions and using only the required level of date detail.  Stephen has seen extracts reduced by 50-99% with these methods.

6) Better server performance

Real-time virus scanners can kill system performance on the Server, consider nightly virus check instead or restricted scanning

Server timeouts can be optimized- default session release is 4 hours per user request

To prevent runaway queries, Tableau’s default config terminates queries lasting longer than 30 seconds.  You might set this lower or higher…  This setting also impacts scheduled extract refreshes.

7) Workbook optimization

If your workbook is slow on the desktop, it will be slow on Server!
Large workbook file size on desktop should be examined for optimization and removal of unneeded elements
Smaller workbooks are better
Custom bins can be slow
20-80 worksheets in a workbook- avoid this if possible
Tabbed views are slower to render than single views
Large crosstab views can be very slow (10-1,000 pages of crosstabs)


 

13:30 to 14:45 | Developers on Stage: The Premiere of Tableau 6.1

with Chris Stolte, Dan Jewett and Francois Ajenstat, Tableau Software

Overview

Mission of dev team- “Help people see and understand their data.”
Make data fun – anathema to many, but key to using data throughout the organization and the entire decision-making process

Four key areas of investment for 6.1

  1. Data performance
  2. Sharing via mobile optimization with iPad
  3. Localizing and globalizing the products- French and German
  4. User experience- make every step easy, fast and fun!

Data architecture

Live connection to any data source that is already working at your company
- Continue to invest in this approach, very important

Unfortunately many people have data everywhere- Excel, CSV, text, tab-delimited, Access, some data marts, etc.  We want to make it easy for these people to also use their data!

1st example- 500 million words from Google books project, examining use of various words.  Results came back on his desktop in 5-10 seconds with 500 million records over 70 years.  NO DATA WAREHOUSING, just load your data into the extract!

Realized that you might want incremental additions to a Tableau Data Extract.  Example: dynamically loading tweets about the conference.  Chris had data through 11 AM, but it was now 1:30 PM; he told the extract to refresh just the data since 11 AM using the date-time field.

You might have data from your data warehouse. But it isn’t uploading as frequently as you would like.  Tableau can add data to the extract from –another– source, not just the original data source!  e.g. – Monthly files are standard from database, but I have a critical weekly addition from a comma delimited file.  Can easily add it to the extract.

Many other new data features in 6.1

Localization and Globalization

German and French versions of the product are now available

Also expanded geocoding,

User experience

Pin and unpin from start page, clean up start page

View Data everywhere- data connections, custom SQL, at top of data pane- a commonly requested feature by accountants, financial people

From View Data you can now pick just part of the data- for example just some people’s names instead of all the data in the columns

Next feature from web site forums- refresh all extracts in workbook, new command on Data menu instead of individually selecting them

Improved pan and zoom on maps and charts

Links for dashboard images

Author control of legend layout

Dark map style with black background

iPad dashboards and more!


15:15 to 16:15 | Deep Dive into Time Series Analysis
with Meredith Dicks, Tableau Software

“Watched part of Meredith Dicks excellent talk on Time Series Data in #Tableau. No live blog since it was packed, nowhere to sit!”


 

Thanks again to Stephen for sharing his first day’s observations; see tomorrow’s post for Day 2 highlights…

New app visualises your vehicle and driving activity

Since 1996, cars have been built with on-board computers that capture and store a wide range of diagnostic and performance information. This information has largely been the preserve of mechanics and manufacturer garages – until now…

Thanks to a new device from Griffin called ‘CarTrip’, you can tap into this data and draw insights about the performance of your car and your driving using the accompanying ‘CleanDrive’ iPhone app, which visualises data on your driving efficiency, carbon footprint and vehicle performance statistics.

How does it work?

CarTrip plugs into a vehicle’s OBD-II port, then sends data to the CleanDrive app running on an iPhone or iPod touch. CleanDrive monitors your car’s performance, collecting data like fuel consumption, acceleration, top speed and engine diagnostic codes as you drive. CleanDrive crunches the numbers and displays your “Carbon Score” in an easy-to-understand format on your device’s screen. Instantaneous trip and long-term averages are recorded to give you a clear picture of how your driving habits impact the environment and the efficiency of your car over time.
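CarTrip and CleanDrive are a packaged product, but the OBD-II data they read is not proprietary. As a purely illustrative aside (not Griffin’s code), the open-source python-OBD library can query some of the same values from a laptop with a generic OBD-II adapter:

```python
import obd

# Rough illustration only (not Griffin's code): read a few live values from the
# OBD-II port using the open-source python-OBD library.
connection = obd.OBD()   # auto-detects a connected OBD-II adapter

for name, cmd in [("Speed", obd.commands.SPEED),
                  ("Engine RPM", obd.commands.RPM),
                  ("Mass air flow", obd.commands.MAF)]:
    response = connection.query(cmd)
    if not response.is_null():
        print(f"{name}: {response.value}")   # values carry units, e.g. kph, rpm

# A fuel/carbon figure like CleanDrive's would be derived from readings such as
# MAF logged over a whole trip; the exact formula isn't published in this post.
```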

CarTrip costs $89.99 and is available now; the CleanDrive app is a free download for iOS and (soon) Android. Similar to the growth witnessed in diagnostic and visualisation tools for domestic energy data, this is likely to be a growing market for unlocking the significant potential of vehicle and driving data.