Most people reading this post will be very familiar with Nathan Yau’s popular FlowingData blog. For several years it has been one of the go-to hubs for visualisation content, news, tutorials and exhibits as the field has enjoyed a remarkable boom, evolving from a somewhat fringe pursuit into something that now reaches the mainstream. Last September, Nathan announced that a FlowingData book was in the pipeline. Two weeks ago ‘Visualize This’ burst onto the bookshelves, racing into Amazon’s top 100 best-selling books! Having gratefully received a copy last week, and after an initial read-through over the weekend, I thought I would present a brief review of the book, offering some perspectives on its focus, content and value to visualisation designers of all backgrounds.

Purpose

The straightforward purpose of this book is to give readers a guide on how to create visualisations. As Nathan explains in an earlier blog post describing his brainstorming process, “It finally dawned on me that there should be a book on how to actually create and design data graphics. Like, really how… Lots of examples with real data, different tools, and thoughts on design along the way.” At its heart this is an example-driven book, which builds on some of the most popular content found on FlowingData, specifically the well-explained tutorials which take everyday data visualisation problems and work through methods for solving them. The main hub of content aims to present accessible visualisation approaches and methods depending on the data and the story you’re looking to tell or unearth. Whilst referring to design concepts throughout, the book’s content leans towards a practical focus more than a theoretical one. It’s less about the why and more about the what, how and when. It is aimed at those readers who are willing to open up new design opportunities through the capabilities of visualisation programming languages. If you're a visualisation designer unwilling to embrace the challenges of picking up new coding skills, maybe you should look elsewhere (after you've had a long look in the mirror, that is...).

Content

The flowchart presented on page xxvi gives a really nice (and visual) feel for the intended flow of the content. It is organised in a way that encourages you to either read it cover to cover or just drop in on a specific chapter: a structure which makes these types of ‘how to’-focused books so digestible and useful. The content falls into two distinct clusters: chapters concerned with establishing context and a foundational understanding of data visualisation in one, and the practical methods and tutorials in the other.

Introduction & Chapter 1 (Telling Stories with Data)

These introductory sections effectively set the foundation of the book and the value of visualisation. I imagine these were the most difficult chapters of the book to nail – introductions usually are – but they really help contextualise the subject and draw the reader into the book. One of the most interesting aspects of visualisation is hearing how people arrive in this field, particularly as it exists at a convergence of several diverse disciplines. As with travelling to any city, there are many different routes and modes of getting there. The origin of Nathan’s own journey into visualisation is statistics, and before that electrical engineering. This appreciation of statistical rigour, clearly heavily influenced by John W Tukey, and an eye for sequential processes are prominent themes. It’s fascinating to read how, prior to Nathan’s visualisation awakening (to frame it rather dramatically), he viewed “statistics as pure analysis, and data as the output of a mechanical process” (page 2). It was during his brief but pivotal internship at the esteemed New York Times’ Graphics Department (could there be a better internship, anywhere?) that Nathan developed a greater appreciation of design and the need to tell the story about the data. Working for the NYT taught him how to report data rather than just produce a graph. Here it was about taking data beyond statistics and analysis, towards a concise explanation of data that helps a reader make sense of real life. Rather than treating visualisation as a single continuum between objective journalism/analysis and art, the book proposes ‘entertainment’ and ‘compelling’ as further dimensions. I would see these as less about a separate space and more to do with subject matter – the methods of presentation will be the same as in the other two.

The Telling Stories with Data chapter sets out with an explanation of the nuances between deploying visualisation for objective analysis and deploying it for more artistic purposes. It offers the further categories of entertainment and compelling visualisations. Personally speaking, I would consider these to be characteristics of the purpose and/or subject matter, rather than a further distinct category of visualisation. Semantics aside, I really liked the movie analogy used to explain the types and applications of visualisation. You get boring documentaries, you get inspiring and informative ones. You get great entertaining movies and you get trashy ones. Furthermore, what is the story you’re trying to develop – is it a report or a novel, is it to entertain or inform, is it to motivate and inspire or engage more casually? Taking the idea that data points could be considered to be characters, consider their history, their present and their future. Consider their character development: how do they interact with other characters and situations, how does the plot evolve, and how do you begin the story and end it? This is a really nice way of conceptualising any visualisation task at hand and leads to the strongest central concept of the book – always let the data do the talking. Indeed, looking back at Nathan’s comments about brainstorming, he explains “to me, the nerd statistician, data takes centre stage, and everything else feeds off of it”. One snippet that I really loved at this point in the book was the use of a histogram to double up as a graphic legend – a superb little idea that instantly adds depth to a design (page 14).

Chapter 2 (Handling Data) & Chapter 3 (Choosing Tools to Visualize Data)

Chapter 2 concerns the critical task of data handling.
It covers important considerations around gathering and formatting data, introducing a wide range of options for obtaining data, including a helpful demonstration of a Python script to automate the task of scraping data from web sites. It then moves through a variety of tools to format and refine your data (such as Google Refine, Mr People). Part of me thinks there was a chance to go into a bit more depth here about the challenges surrounding initial data checking and exploration: assessing its quality, identifying its range and diversity, learning about the data types, applying cleaning methods – generally preparing it for analysis/visualisation. Chapter 2 briefly mentions the potential issue of typos and Chapter 6 talks about unearthing outliers through visualisation methods, but a more robust preparation stage would prevent such problems surfacing later in the design process, which could justify more discussion earlier on. Then again, in Chapter 1 (page 12) Nathan does refer to the challenges of data checking, conceding that it is his least favourite part of graph making. Good to know I’m not alone...

Chapter 3 provides a useful list of visualisation tools, categorised under out of the box, programming, illustration and mapping, with an emphasis on free resources. One key observation I made, not just about this chapter but across the book overall, was Nathan’s noticeable (and seemingly deliberate) move away from demonstrating methods of data preparation, analysis and visualisation using Excel. I entirely understand his motivation for doing this when he says, in the comments section on one of his blog posts, “Excel can do some good stuff, and there will be some in the book, but I will also put a lot of energy into weaning people off of it. It’s not as hard as you might think”. Excel is the ubiquitous data handling and graphing tool: in survey results shown on page 88, 31% of respondents said they used Excel for visualisation.
Perhaps its general exclusion relates to a belief that enough people already know how to handle themselves in Excel, so what’s the point in explaining methods they are already familiar with?

Chapter 4 (Visualizing Patterns over Time), Chapter 5 (Visualizing Proportions), Chapter 6 (Visualizing Relationships), Chapter 7 (Spotting Differences) & Chapter 8 (Visualizing Spatial Relationships)

The fundamental meat of the book, and the content that will have the most tangible impact on readers, lies in chapters 4 to 8. Here we have a range of practical ‘how to’ guides exploring different visualisation solutions to respond to different data problems or inquiries. Rather than a menu of options, it is a much more coached presentation of the methods and design choices you may wish to make in given situations. As Nathan explained when previewing the book, “you don't find a tool, and then go look for data that you can plug in. It's the other way around. You get your data, decide what you initially want to know about it, and then pick the tool that's right for the job.” As I've already mentioned, those of you familiar with Nathan’s FlowingData tutorials will recognise one or two of the examples covered in these chapters and will be keen to get your hands dirty with a range of other valuable examples. This section is not so heavy on theory, though it touches on key principles where necessary. It’s more about how to accomplish the visualisation solution you require – what to use, for what situation, and how to do it. It’s presented in a logical, clearly explained and sequential style that would seem appropriate coming from somebody with Nathan’s statistical and engineering background and his practical experience within the pressured environment of the New York Times Graphics Department. The demonstrations focus largely on programming solutions – something he clearly set out to achieve from the outset.
Prior to these chapters Nathan introduces a discussion about the fear of programming and the need to take it slowly and build up experience and success bit by bit. The majority of the tutorials focus on R, which is clearly his strong suit, but there is also good coverage of HTML, CSS, JavaScript, Python, Flash and ActionScript solutions. These provide a wonderfully accessible and helpful introduction to each method, framed around the spectrum of data visualisation challenges we find ourselves facing. Once you’ve conquered each one, feel comfortable with the coding syntax and structures, and are ready to move on, you can then explore other texts to dive deeper into each environment. What this book provides is a starting point from which to commence your own learning process. Nathan is particularly adept at taking the initial output of a programming language (say R) and demonstrating how to refine it further, visually, by applying design techniques using Adobe Illustrator. Above all, he makes it clear that the reader/user’s viewpoint is the critical perspective against which all design and visualisation decisions should be made. Finally, I was struck by the observation that a failure to identify and understand relationships between content is the key factor that causes most graphics to fail. This demonstrates a strong design trait.

Chapter 9 (Designing with a Purpose)

Irrespective of the fact that this book could be read out of linear sequence, I felt the nature of the content in this chapter was more closely aligned with the discussions that took place in the Introduction/Chapter 1. One main takeaway from this short chapter, however, is a very well handled passage about the differences, issues and fault lines that exist within the field about defining visualisation, comparing it with information graphics etc.
This debate is for a different book or indeed platform, but Nathan offers a balanced and non-dogmatic assessment of these different viewpoints and simply puts forward his approach, which is to “consider the audience, the data in front of me, and ask myself if the final graphic makes sense” (page 341).

Conclusion

Every book ever written has something missing; you simply cannot write a book that serves everyone’s needs. But you can shape people’s expectations with a clearly defined purpose, and this book delivers exactly what it sets out to. Visualize This is a significant and valuable addition to the library of visualisation titles. You should own it. I’m glad I do.
