Tableau European Conference – Freakalytics update Day 1

This week Tableau are holding their inaugural European customer conference in Amsterdam. With a wide range of hands-on training, top quality keynote speakers, one-on-one expertise opportunities and in-depth break-outs, it’s sure to be an excellent event. The much anticipated release of Tableau 6.1 will also be showcased.

I’m delighted to say that Stephen and Eileen McDaniel of Freakalytics – an impressively knowledgeable Tableau Education Partner – who are both present and presenting at the conference, are providing updates from the event and have kindly allowed me to share their observations.

Published below are some live blogs relating to Day 1’s key sessions or events. These are a direct lift from the Freakalytics ‘Thoughts’ page which hosts these rapidly published blog updates straight from the conference floor.

Please note, some comments are the opinion of Freakalytics and not necessarily those of Tableau. This content was live blogged; there may be occasional errors or omissions.

Day 1 Schedule (note many are concurrent sessions)


 

11:00 to 12:00 | Scaling and Performance Best Practices

With Dan Jewett, VP of Product Management, Tableau (Dan and Stephen previously worked together at Brio in the late 90s).

Overview

  • Architecture
  • Hardware
  • Server config
  • Database or data engine?
  • Workbook optimizations for Server publishing

1) Architecture- Dan prefers that most customers see it as a black box

Content from Desktop
To Tableau Server
People then access your content via no-client AJAX technology (like GMail)
HTTP(S) access via Apache web server, please leave web server alone if at all possible!

VizQL and the web app handle your requests; they work with a security layer to use:

  • Tableau caching
  • Data engine
  • Databases and Tableau Data Extracts

Holding this all together is a repository with user management, content management and a search engine (the Lucene project by Apache)

2) Hardware- don’t be thrifty here, this makes a big difference in the user experience!

For a modest investment in hardware, you can radically increase performance and capacity
Dell server example- $14k US buys 8 cores, 64 GB memory, 3 TB of fast SCSI (RAID 5); but $38k US triples the CPUs, 128 GB RAM, 4 TB of RAID
Assuming 100 users, this tripling of server capacity adds perhaps 10% to project costs but gives much better capacity at peak load times (e.g. – Monday mornings when everyone logs in)

Windows Server 2003 or 2008, you want 64 bit OS and hardware!

MEMORY- more is better!  I would trade some CPUs for more memory if you have to trade off.

Fast disks matter, a lot- RAID config is better
Lots of data extracts can mean need for high storage capacity

Can be hosted on a virtual machine, but physical machines typically perform better (based on anecdotal feedback).

Tableau web clients are chatty with the Tableau Server system, so a moderate amount of network bandwidth is needed between the two.  However, because there are many little requests, network latency can be an issue.  Consider co-locating smaller servers worldwide instead of one big central system. Viz’s render as lots of little tiles, with JavaScript for interaction.
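A rough back-of-envelope sketch of why latency can matter more than bandwidth when a view is fetched as many small tiles. Every number here (tile count, tile size, round-trip times, parallel connections) is an illustrative assumption, not a figure from the talk.

```python
def load_time_seconds(requests, round_trip_ms, payload_kb, bandwidth_mbps, parallel=1):
    """Estimate the time to fetch many small tiles.

    Assumes requests go out in batches of `parallel` and that each batch
    costs one network round trip. All inputs are illustrative.
    """
    batches = -(-requests // parallel)  # ceiling division
    latency_s = batches * round_trip_ms / 1000.0
    # payload in kilobits divided by bandwidth in kilobits per second
    transfer_s = (requests * payload_kb * 8) / (bandwidth_mbps * 1000.0)
    return latency_s + transfer_s


# Same tiles and same bandwidth; only the round-trip time differs.
local = load_time_seconds(60, round_trip_ms=5, payload_kb=20, bandwidth_mbps=100, parallel=6)
remote = load_time_seconds(60, round_trip_ms=150, payload_kb=20, bandwidth_mbps=100, parallel=6)
print(f"nearby server: {local:.2f}s, distant server: {remote:.2f}s")
```

With these assumed numbers the transfer time is identical in both cases, so the whole difference comes from round trips, which is the argument for placing servers close to users.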

3) Distributed components- a scalability strategy not a performance strategy (allows lots of users, but not faster viz’s!)

Primary TS machine in the cluster- it is the Load Balancer for client requests
Add worker machines, even moving data engine off to worker machines
Can even move data engine off the client worker machines, simplest way to increase performance- the data engine is the big memory user!

Caching is per process- distribution can actually diminish performance since the caching is not shared amongst machines or processes.
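A toy simulation of the per-process caching point. It assumes requests are handed out round-robin to processes that each keep a private cache; the routing rule and the request mix are hypothetical and only meant to show the effect, not how Tableau Server actually balances load.

```python
from itertools import cycle


def hit_rate(requests, n_processes):
    """Route requests round-robin to processes that each hold a private cache."""
    caches = [set() for _ in range(n_processes)]
    hits = 0
    for view, cache in zip(requests, cycle(caches)):
        if view in cache:
            hits += 1
        else:
            cache.add(view)  # first time this process sees the view: a miss
    return hits / len(requests)


# Five distinct views, each requested six times in rotation.
requests = [f"view-{i % 5}" for i in range(30)]

single = hit_rate(requests, n_processes=1)
spread = hit_rate(requests, n_processes=4)
print(f"1 process: {single:.0%} hits, 4 processes: {spread:.0%} hits")
```

Under these assumptions the single process warms its cache once, while four processes each have to warm their own, so the same traffic produces fewer cache hits.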

Keep your machines in the same subnet if possible
Firewalls and DMZ’s can slow network communication, sometimes significantly depending on how locked down your systems are…

4) Caching

Request comes in from user at web client.

  1. Fastest- has this view been created before and is it in cache? If so, just send the cached images- no queries or calcs needed!
  2. If no cache image, then do I have the SQL query in cache?  If so, use cached data to render view and send to user.
  3. If no cached query, hopefully database is fast and can quickly send results.
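The three-step lookup order above can be sketched roughly as follows. This is a minimal illustration using plain dictionaries; the function name, the cache keys and the `run_query`/`render` stand-ins are all hypothetical and not Tableau Server internals.

```python
def serve_view(view_key, sql, image_cache, query_cache, run_query, render):
    """Return (image, source), trying the cheapest option first."""
    if view_key in image_cache:           # 1. rendered view already cached
        return image_cache[view_key], "image cache"
    if sql in query_cache:                # 2. query results cached: render only
        rows, source = query_cache[sql], "query cache"
    else:                                 # 3. fall through to the database
        rows, source = run_query(sql), "database"
        query_cache[sql] = rows
    image = render(rows)
    image_cache[view_key] = image
    return image, source


image_cache, query_cache = {}, {}
sql = "SELECT region, SUM(sales) FROM orders GROUP BY region"
run_query = lambda q: [("East", 10), ("West", 7)]      # stand-in for a database call
render = lambda rows: f"<image of {len(rows)} marks>"  # stand-in for rendering

first = serve_view("sales-map", sql, image_cache, query_cache, run_query, render)
second = serve_view("sales-map", sql, image_cache, query_cache, run_query, render)
third = serve_view("sales-bars", sql, image_cache, query_cache, run_query, render)
print(first[1], "|", second[1], "|", third[1])
```

The third call shows the middle case: a different view built on the same query skips the database but still has to be rendered.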

Three cache control strategies in the Server config dialog

  • Minimize queries- hold as long as there is memory available- users hitting refresh data will force data to refresh
  • Balanced- holds no longer than your specified number of minutes
  • Most up-to-date- no caching of models or data at all, always fresh/hot data from the oven, but at a much higher cost to both the database and the Tableau Server!
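As a sketch, the three strategies differ only in how they answer "is this cached entry still usable?". The policy names and the function below are made up for illustration and are not actual Tableau Server settings.

```python
import time


def is_fresh(cached_at, policy, max_age_minutes=None, now=None):
    """Decide whether a cache entry stored at `cached_at` (epoch seconds) may be reused."""
    now = time.time() if now is None else now
    if policy == "minimize_queries":
        return True    # keep until memory runs out or a user forces a refresh
    if policy == "balanced":
        return (now - cached_at) <= max_age_minutes * 60
    if policy == "most_up_to_date":
        return False   # never reuse; always go back to the data source
    raise ValueError(f"unknown policy: {policy}")


# An entry cached at t=0, checked 20 and 60 minutes later with a 30-minute limit.
after_20 = is_fresh(0, "balanced", max_age_minutes=30, now=20 * 60)
after_60 = is_fresh(0, "balanced", max_age_minutes=30, now=60 * 60)
print(after_20, after_60)
```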

Model cache size- how many viz’s to cache (4 views in a dashboard, that is 4 caches) – 30 is default- should be much higher on server!  (100-200???)
Query cache- size in MB of query results to cache- 64 MB is default- should be much higher on server!  (2,000-4,000 MB on a 64-bit Server)

How many server processes per server core?  2 VizQL and App per core is a good start.  If caching is critical, then add more VizQLs. If your server is lower-end, reduce to one per core.

5) Database or extract?

Database- live data needed or want database security based on users.
Data extract- faster (unless you have Teradata, Vertica, Netezza, etc.); suits reports whose data should change on a predictable schedule; can handle security by user filters when published or Tableau Server view restrictions.

Note that there are many ways to optimize extracts including hide unused data items, filter based on dashboard need, data aggregation by visible dimensions and used level of date details.  Stephen has seen extracts reduced by 50-99% with these methods.
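To make the idea concrete, here is a small sketch of three of those reductions applied to a list of rows: filtering to what the dashboard needs, dropping an unused field, and aggregating to the visible dimensions at month level. The field names and data are invented, and this is plain Python rather than anything Tableau does internally.

```python
from collections import defaultdict


def roll_up(rows, dimensions, measure, keep=lambda row: True):
    """Filter rows, then sum `measure` for each combination of `dimensions`."""
    totals = defaultdict(float)
    for row in rows:
        if keep(row):
            totals[tuple(row[d] for d in dimensions)] += row[measure]
    return [dict(zip(dimensions, key), **{measure: total})
            for key, total in totals.items()]


detail = [
    {"order_date": "2011-05-02", "region": "East", "customer": "A", "sales": 10.0},
    {"order_date": "2011-05-09", "region": "East", "customer": "B", "sales": 5.0},
    {"order_date": "2011-05-09", "region": "West", "customer": "C", "sales": 7.0},
    {"order_date": "2011-06-01", "region": "East", "customer": "A", "sales": 3.0},
    {"order_date": "2011-06-03", "region": "West", "customer": "C", "sales": 4.0},
    {"order_date": "2011-06-03", "region": "West", "customer": "D", "sales": 6.0},
    {"order_date": "2011-06-04", "region": "Internal", "customer": "Z", "sales": 99.0},
]

# Dates are only shown at month level, and "customer" is never used in the view.
by_month = [dict(row, month=row["order_date"][:7]) for row in detail]
extract = roll_up(by_month, ["month", "region"], "sales",
                  keep=lambda row: row["region"] in {"East", "West"})
print(f"{len(detail)} detail rows -> {len(extract)} extract rows")
```

On real data with many rows per month and region, the same steps are where the large size reductions come from.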

6) Better server performance

Real-time virus scanners can kill system performance on the Server; consider a nightly virus check or restricted scanning instead

Server timeouts can be optimized- default session release is 4 hours per user request

To prevent runaway queries, Tableau’s default config terminates queries lasting longer than 30 seconds.  You might set this lower or higher…  This setting also impacts scheduled extract refreshes.
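The general idea of cutting off a runaway query can be illustrated with Python's built-in sqlite3 module, which lets a callback abort a running statement. This is only an analogy under stated assumptions: it says nothing about how Tableau Server enforces its own limit, and the 0.2 second limit is chosen just to keep the demo quick.

```python
import sqlite3
import time


def run_with_limit(conn, sql, limit_seconds):
    """Run `sql`, returning its rows, or None if it was cut off at the time limit."""
    deadline = time.monotonic() + limit_seconds
    # SQLite calls the handler every 1000 VM instructions; a non-zero
    # return value aborts the statement that is currently running.
    conn.set_progress_handler(lambda: 1 if time.monotonic() > deadline else 0, 1000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        # In this sketch any operational error is treated as "terminated".
        return None
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again


conn = sqlite3.connect(":memory:")
quick = "SELECT 1 + 1"
# A recursive query with no stopping condition: it would never finish on its own.
runaway = ("WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c) "
           "SELECT count(*) FROM c")

quick_result = run_with_limit(conn, quick, limit_seconds=0.2)
runaway_result = run_with_limit(conn, runaway, limit_seconds=0.2)
print(quick_result, runaway_result)
```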

7) Workbook optimization

If your workbook is slow on the desktop, it will be slow on Server!
Large workbook file size on desktop should be examined for optimization and removal of unneeded elements
Smaller workbooks are better
Custom bins can be slow
20-80 worksheets in a workbook- avoid this if possible
Tabbed views are slower to render than single views
Large crosstab views can be very slow (10-1,000 pages of crosstabs)


 

13:30 to 14:45 | Developers on Stage: The Premiere of Tableau 6.1

with Chris Stolte, Dan Jewett and Francois Ajenstat, Tableau Software

Overview

Mission of dev team- “Help people see and understand their data.”
Make data fun- anathema to many, but key to using data throughout the organization and the entire decision-making process

Four key areas of investment for 6.1

  1. Data performance
  2. Sharing via mobile optimization with iPad
  3. Localizing and globalizing the products- French and German
  4. User experience- make every step easy, fast and fun!

Data architecture

Live connection to any data source that is already working at your company
- Continue to invest in this approach, very important

Unfortunately many people have data everywhere- Excel, CSV, text, tab-delimited, Access, some data marts, etc.  We want to make it easy for these people to also use their data!

1st example- 500 million words from Google books project, examining use of various words.  Results came back on his desktop in 5-10 seconds with 500 million records over 70 years.  NO DATA WAREHOUSING, just load your data into the extract!

Realized that you might want incremental additions to Tableau Data Extract.  Dynamically loading Tweets on conference.  Chris had data through 11 AM, but it is now 1:30 PM, he told the extract to refresh just data since 11 AM using the date time field.
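Conceptually, an incremental refresh of this kind only needs the latest timestamp already in the extract. A minimal sketch, assuming rows are dictionaries with a date-time field; the function and field names are hypothetical and this is not the Tableau extract API.

```python
from datetime import datetime


def incremental_refresh(extract, source_rows, time_field):
    """Append only the source rows newer than the latest row already extracted."""
    high_water = max((row[time_field] for row in extract), default=datetime.min)
    new_rows = [row for row in source_rows if row[time_field] > high_water]
    extract.extend(new_rows)
    return len(new_rows)


tweets = [
    {"posted": datetime(2011, 5, 10, 10, 30), "text": "Keynote starting"},
    {"posted": datetime(2011, 5, 10, 11, 0), "text": "Great demo"},
    {"posted": datetime(2011, 5, 10, 12, 15), "text": "Lunch break"},
    {"posted": datetime(2011, 5, 10, 13, 20), "text": "6.1 premiere soon"},
]

extract = list(tweets[:2])  # data loaded through 11 AM
added = incremental_refresh(extract, tweets, "posted")
print(f"added {added} rows, extract now has {len(extract)}")
```

The dates and tweet texts are invented sample data; only the "everything since 11 AM" logic mirrors the demo described above.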

You might have data from your data warehouse. But it isn’t uploading as frequently as you would like.  Tableau can add data to the extract from –another– source, not just the original data source!  e.g. – Monthly files are standard from database, but I have a critical weekly addition from a comma delimited file.  Can easily add it to the extract.
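The same idea in sketch form: rows from a second, differently shaped source are converted to the extract's columns and appended. The column names, the CSV content and the helper are all invented for illustration and do not reflect how Tableau implements this feature.

```python
import csv
import io


def append_from_csv(extract, csv_text, columns):
    """Append CSV records to `extract`, converting each named column."""
    added = 0
    for record in csv.DictReader(io.StringIO(csv_text)):
        extract.append({name: convert(record[name]) for name, convert in columns.items()})
        added += 1
    return added


# Rows that originally came from the monthly database load.
extract = [
    {"period": "2011-03", "region": "East", "sales": 120.0},
    {"period": "2011-04", "region": "East", "sales": 135.0},
]

# A weekly comma-delimited file with the same columns.
weekly_file = "period,region,sales\n2011-05-W1,East,31.5\n2011-05-W2,East,28.0\n"

added = append_from_csv(extract, weekly_file, {"period": str, "region": str, "sales": float})
print(f"added {added} rows, extract now has {len(extract)}")
```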

Many other new data features in 6.1

  • Faster extract creation with text files
  • Faster queries
  • Impersonation for SQL Server
  • Teradata LDAP authentication
  • Initial SQL when connecting to Teradata

Localization and Globalization

The products are now available in German and French

Also expanded geocoding:

  • Worldwide cities, more than 15,000
  • Post codes for almost all countries
  • Fields natively supported in your language!  (e.g.- US cities in German language)
  • Even the map labels automatically adapt to your language and locale
  • Workbook locale for dates, numbers, etc.

User experience

Pin and unpin from start page, clean up start page

View Data everywhere- data connections, custom SQL, at top of data pane- a commonly requested feature by accountants, financial people

From View Data you can now pick just part of the data- for example just some people’s names instead of all the data in the columns

Next feature from web site forums- refresh all extracts in workbook, new command on Data menu instead of individually selecting them

Improved pan and zoom on maps and charts

Links for dashboard images

Author control of legend layout

Dark map style with black background

iPad dashboards and more!

  • Rich visual analytics
  • Touch optimized for gestures
  • Consistent layout
  • Author once and work anywhere
  • Focus on many platforms, not just iPad direct with Tableau application
  • BUT should also work with regular web access
  • Automatically detects when it is running on an iPad
  • Interaction is awesome and smooth like an iPad app
  • It is hard to tap the right values in a filter list, so it automatically gives you a touch optimized quick filter
  • Couldn’t easily multi-select before, but now I can hold down my finger to multi-select from a viewer
  • Actual app store application, regular web interface for login
  • Fun seeing our dashboard in the Tableau demo!
  • Pages example like the Hans Rosling TED demo
  • Authoring content for mobile platform- very easy and effortless, just like any other content!
  • Interactions subtly different, but in each platform- Windows desktop, Windows browser, iPad browser, iPad Tableau app.
  • Even on Windows browsers, new iPad features like gestures were carried over for pan/zoom/selection for end-user experience.

15:15 to 16:15 | Deep Dive into Time Series Analysis
with Meredith Dicks, Tableau Software

“Watched part of Meredith Dicks excellent talk on Time Series Data in #Tableau. No live blog since it was packed, nowhere to sit!”


 

Thanks again to Stephen for sharing his first day’s observations. See tomorrow’s post for day 2 highlights…