Guest post: Day 2 at the O’Reilly Strata Conference

February 3, 2011
9:10 am
by Andy Kirk

This is a guest/cross-post by Jan Willem Tulp, the winner of my recent contest to win a full pass to the O’Reilly Strata conference. The conference is taking place this week and Jan has kindly offered to share a short summary of each day of the conference. You can find out more about Jan’s work via his blog and follow him on Twitter @JanWillemTulp.

Day 2 at O’Reilly Strata Conference
After a day of tutorials, the second day at Strata was the first of two conference days, packed with fascinating sessions. The day was kicked off with a plenary session featuring a long list of top speakers in the field of data science: Edd Dumbill of O’Reilly Media, Alistair Croll of Bitcurrent, Hilary Mason of bit.ly, James Powell of Thomson Reuters, Mark Madsen of Third Nature, Werner Vogels of Amazon.com, Zane Adam of Microsoft Corp, Abhishek Mehta of Tresata, Mike Olson of Cloudera, Rod Smith of IBM Emerging Internet Technologies and last but not least Anthony Goldbloom of Kaggle. A range of topics was covered in talks of 10 minutes each, such as data without limits, the data marketplace, and the mythology of big data. The shortest presentation struck me most: “the $3 Million Heritage Health Prize” presented by Anthony Goldbloom. People are challenged to create a predictive application that uses healthcare data to predict which people are most likely to go to hospital, so that ‘US healthcare becomes healthcare instead of sickcare’. The prize is $3 Million for the one who solves this!
Next up were the individual sessions, and I was very much looking forward to the talk “Telling Great Data Stories Online” by Jock MacKinlay of Tableau. The talk itself was excellent, although for me it was all familiar material; I highly recommend it to anyone unfamiliar with Visual Analytics or Tableau. Being biased towards visualization-related sessions, I chose “Designing for Infinity” by Dustin Kirk of Neustar as my next session. Dustin showed 8 Design Patterns of User Interface Design, like infinite scrolling, which were really good. It reminded me of an updated version of the material in Steve Krug’s book Don’t Make Me Think.
Next up was the best talk of the day: “Small is the New Big: Lessons in Visual Economy”. Kim Rees of Periscopic showed us very good examples of effective information visualizations. I was really blown away by this presentation, mostly because she showed how creatively removing clutter and distractions can make a visualization very effective. The creative interactions that help the user work with the visualization were also compelling. Next was Philip Kromer of Infochimps on “Big Data, Lean Startup: Data Science on a Shoestring”. Though I expected Philip to explain the Lean Startup principles evangelized by Eric Ries, the talk was more about Infochimps’ approach to doing business. Some remarkable comments by Philip: “everything we do is for the purpose of programmer joy”, and “Java has many many virtues, but joy is not one of them”. Great presentation and inspiring insights!
My next session was “Visualizing Shared, Distributed Data” by Roman Stanek (GoodData), Pete Warden (OpenHeatMap) and Alon Halevy (Google). After a short presentation from each, the three held a panel discussion where the audience could ask questions. Their discussion revolved mostly around the fact that all three deal with data that is created and uploaded by users, and how to handle that: do you clean it, what’s the balance between complex query functionality and ease of use, etc. My final session was “Wolfram Alpha: Answering Questions with the World’s Factual Data” by Joshua Martell. Half the talk was a demonstration of the features of WolframAlpha, and the other half was more or less a high-level talk about how WolframAlpha handles user input, how data is stored, how user analytics is performed, and more.
The day ended with a Science Fair where students, researchers and companies were showing new advancements in the field of data science. There were really interesting showcases, like a simulation tool for system dynamics. But again, biased towards visualization, the one that struck me most was Impure by Bestiario. Impure is a visual programming language that allows users to easily create their own visualizations, both simple and very advanced. It was also great to see the passion of Bestiario for their own product.
Finally, one of the best things about the conference so far has been meeting people, some of whom I have known only virtually for some time now. I especially enjoyed meeting all the visualization people today. It’s really great to meet so many members of the online visualization community in person.
So again, a fantastic day at Strata, and I am looking forward to tomorrow!
===========================
Jerome Cukier, the second of my Strata Conference contest winners, submitted the following summary in the comments for this post and I decided it warranted adding to this overall Day 2 summary. You can also follow Jerome’s conference updates via his Twitter account. Thanks Jerome!

I was really inspired by Hilary Mason’s opening keynote. Watch this for yourself here: in 10 minutes Hilary manages both to explain what’s going on in the field and to get us excited about what will come.
I was extremely thrilled by Kim Rees’s talk. Take this from someone who has spent the last 3 years watching every datavis that went viral on the internet: I had not seen 90% of the examples she showed. She should post the slides at now.periscopic.com.
I really liked Dustin Kirk’s talk as well, which was extremely practical. The issue he tackles is: now that applications (especially web applications) have to let users handle a huge amount of data, how is that affecting interface design? He showed us the “1995 way” of, say, selecting one item in a list of 500, which would be the standard HTML drop-down list, in contrast with the state of the art, such as selecting labels in Gmail or finding contacts on an iPhone. Selecting items in a big list is just one of the many problems where endless data calls for an immediate change, and Dustin did a great job of illustrating that with examples. Slides can be found at www.dustinkirk.com/infinity.
There were 2 exciting sessions I attended without Jan Willem. The first was a talk from Peter Skomoroch of LinkedIn, who talked about exploring the “data exhaust”, or the byproduct of our digital activities. After explaining the general principles, he demoed them with a project his crack team of data scientists/visualizers came up with in a couple of days: a mashup between the Strata attendee directory listing and the attendees’ LinkedIn profiles, complete with skills and connections. The result is a thought-provoking network map of the skills of the Strata people.
Lastly, I saw Matthew Russell give an impressive demonstration of what he explains in his book, “Mining the Social Web”, specifically how to use Python to get gems of information from Twitter. I’m having a hard time deciding whether the code was more interesting than the actual questions that Matthew was asking of a popular Twitter account. What I do know is that I’m getting the book on my way home, and so should you.
Follow these speakers on Twitter – @hmason, @krees, @dustin_kirk, @peteskomoroch, @ptwobrussell
