Tableau Server Performance fun with Slack and Logentries

The Beginning

It started as a joke. Then it became a pretty good idea.

You see, oftentimes Tableau Server admins get a lot of flak. We *constantly* get comments stating:

  • Tableau is slow
  • This workbook is slow
  • Tableau can’t do this because someone I know said as much
  • I like <insert silly dashboard tool here> more because it does <something I can’t remember right now but I’m just trying to sound smart>
  • You’re just an analyst so you don’t know how to optimize data
  • and more!

Let’s be honest, the above comments can, 99% of the time, be tied to someone who designed something incorrectly but wasn’t aware of the implications of said design. Until now 🙂

And, in the natural way that a Slack conversation allows, the comment dropped: 'It's like a dumpster fire.'

Inspiration!

It goes like this:

  • Slow TWBs trigger alerts all the time on our platform (Server admins should know these bottlenecks already)
  • We pull log data (yes, you can also pull from Postgres but logs are so rich) for those queries via Logentries
  • We parse the log data and convert the unruly string data into something usable (thank you, PowerShell and, specifically, ConvertFrom-String)
  • At an interval of our choosing, we drop the results in Slack (only our team) with a mixture of funny GIFs (because levity is a good thing)
  • We analyze and reach out to the workbook owners for learning opportunities & improvement

Details

[Image: Monitoring the 90th percentile of workbook load time]

 

This is the trigger and the cause of much confusion around what Tableau can actually do. You see, if the performance concerns aren’t addressed, every Server admin is going to get the ‘Tableau is slow’ argument. At that point, you’re left defending the platform you set up. But, the question and concerns should all be about what is *causing* Tableau to appear slow.

We address the performance concern with a solid Log Analytics strategy. The above image is one of many examples of alerts we'll get. This time, we're going to leverage the Logentries CLI to automatically pull this info out. Yes, automatically.

Here’s what we’ll use:

lecli query -n vizql -q slowwb -f $start -t $end | Out-File -FilePath $workpath\slowwb.txt -Force

The start and end variables are timestamps; we usually do a rolling 24 hours.
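
For reference, here's a minimal sketch of how those variables might be built for a rolling 24-hour window (this assumes lecli accepts Unix epoch seconds for -f/-t; check your CLI version):

# Rolling 24-hour window for the Logentries CLI (epoch seconds assumed for -f/-t)
$end = [DateTimeOffset]::UtcNow.ToUnixTimeSeconds()
$start = $end - (24 * 60 * 60)
$workpath = "C:\Users\$env:username\Documents"   # adjust to your environment

lecli query -n vizql -q slowwb -f $start -t $end | Out-File -FilePath "$workpath\slowwb.txt" -Force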

[Image: The output of the Logentries query. Not optimal for parsing/sorting, etc.]

If you haven’t explored PowerShell’s ConvertFrom-String cmdlet, you’re missing out. It’s pretty remarkable what it can do with a template file and some string data. And it’s based off of some solid (and profound) research.

[Image: Example template file for ConvertFrom-String. We're implicitly defining some structure and letting the cmdlet work its magic]
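
If you haven't seen one of these templates, here's a rough, made-up illustration (your real slow-workbook log lines will look different); you mark a couple of example values with {PropertyName:value} tokens and ConvertFrom-String generalizes from there:

# Hypothetical template for illustration only -- real VizQL log lines will differ.
# The '*' marks the property that starts each new record; [double] casts the value.
@'
{Workbook*:Sales Overview} elapsed={[double]Seconds:42.7} site={Site:Default}
{Workbook*:Ops Dashboard} elapsed={[double]Seconds:18.3} site={Site:Finance}
'@ | Set-Content "$workpath\template_slowwb.txt"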

After you have (1) pulled the log data and (2) set up your template file, run this:

ConvertFrom-String -InputObject $((gc $workpath\slowwbclean.txt)|Out-String) -TemplateFile $workpath\template_slowwb.txt | select Name,Value

Once you do that, you get a beautiful PowerShell object, and from there the possibilities are endless (well, as much as you want them to be).

So that string data from above is now easily manageable and prepared for a Slack channel drop.
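
As a taste of that drop, here's a minimal sketch using a Slack incoming webhook (the webhook URL is a placeholder, and $slowWorkbooks stands in for the parsed object from above):

# Minimal Slack drop via an incoming webhook ($webhookUrl and $slowWorkbooks are placeholders)
$webhookUrl = 'https://hooks.slack.com/services/T000/B000/XXXX'
$body = @{ text = "Today's slow workbooks:`n$($slowWorkbooks | Out-String)" } | ConvertTo-Json
Invoke-RestMethod -Uri $webhookUrl -Method Post -Body $body -ContentType 'application/json'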

[Image: A real PowerShell object]

Enter Levity

Here's what the daily GIF might look like in our Slack channel. No, no one sees this but us, and the 'movement' of it really forces us to pay attention. We're (as Server admins) responsible for maintaining the platform and teaching others how to best use it. If we ignore it, then the tool (Tableau) gets misused and decision makers may start to see if the grass is greener.

Again, and I can’t stress this enough, levity is a good thing. I can’t tell you how many times I’ve seen C-level people view workbooks that take 3+ minutes to load. Three minutes of awkward silence. That. Scares. Me.

So we monitor, and when it gets bad and we're being questioned about the slowness, we need to laugh. That's healthy and I can't be convinced otherwise. If your intentions are wrong, you're not being human.

This:

[GIF: It moves.]

Becomes this:

 

[Image: Daily performance metrics]

That’s it! Pretty easy, right? So what else can we do with this?

Well, here are a few ideas, based off of the magic of GIFs.

And remember, be responsible and kind. There are too many rude people on this planet. One more isn’t a good thing.

(1) Slow extracts become:

[GIF: wrap_it_up]

(2) Slow login times on Apache become:

[GIF]

(3) Large Server downloads become:

[GIF]

Tableau Server Log Analytics: Easy Parser

I've mentioned it before, but it's worth mentioning again: Log Analytics and Tableau Server are a wonderful combination. There's a ton of helpful information in Tableau logs (and pretty much *all* logs) which, along with the PostgreSQL data, makes for a very good data toolbox.

I’ve also mentioned Logentries a lot when digging through Tableau logs. There are many reasons I use the tool, but the one which makes it the most useful is: centralized log analysis. Essentially the workflow goes like this: Tableau –> Logs –> Logentries –> Tableau (and around and around). It’s a positive feedback loop of valuable data which can give you insight into things such as:

  • Which workbooks take the longest to load and *where* they are geographically
  • Which user downloads the most data, how much, and how long it takes
  • HTTP 404s
  • Filters used by a user/workbook
  • Data sources used by a user/workbook
  • and more!

With Tableau, you're either leveraging a Log Analytics strategy or you're not. I cannot stress enough how vital it is for Tableau Server administrators to at least have some plan in place for when they are suddenly inundated with complaints about a 'slow' server and/or site.

That said, oftentimes it helps to have a few functions and tools that make ad-hoc or automated analysis easier. Here's one: we'll wrap the Logentries REST API in a PowerShell function. This will simply allow us to pull log data from Apache or VizQL based off of a simple parameter.

What's returned is a neatly formatted CSV which you can then import into Tableau, add to a database, or simply use for some quick research. For example, if you want to ensure excessive 404s are handled, you can use this function with a filter, parse the output, and look up the offending IP addresses. If necessary, you'd add those IPs to a firewall rule.
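
For instance, the 404 triage might look something like this (a sketch; the CSV path and the 'status' / 'remote_ip' column names are made up, so substitute whatever your parsed output actually contains):

# Hypothetical follow-up on the parsed CSV; column names are placeholders.
$log = Import-Csv "C:\users\$env:username\Documents\apache_parsed.csv"
$log | Where-Object { $_.status -eq '404' } |
    Group-Object remote_ip |
    Sort-Object Count -Descending |
    Select-Object -First 10 Name, Count   # top offending IPs to review (or firewall)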

More specifically, here’s an example of how you would use the function in PowerShell:

Get-TsLog -leAcctKeyVizQL 'your VizQL key' -leFilterVizQL 'k=end-update-sheet' -workpathVizQL "C:\users\$env:username\Documents" -apikey 'your Logentries API key'
[Image: Here's where your log data (parsed) can become a great means to improve performance and react before things happen]

The added benefit of feeding this type of aggregated data into your own Tableau data model and database is that it gives the admin a historical record for planning and learning.

So, here's the module on the PowerShell Gallery. Let me know if there are questions.

 

Slack your Tableau Extract – Part II

Ever wish you could drop the name of your Tableau workbook and/or data source into a Slack channel and have it automatically refresh? What if you're developing and need to get a quick refresh completed? What if you don't have tabcmd installed on your machine? What if you want to add a step at the end of your pipeline that drops the name of the content into the Slack channel?

[Image: Example of the name of Tableau content]

I've talked about this before, but what had to happen was the extract needed to exist in the 'background_jobs' table. Well, that won't always happen, as people will be doing this for the first time. So, we needed to expand it a bit to include *all* possibilities (workbooks and data sources). Also, in this much-improved version, we Slack back to the user and let them know their extract is complete.

[Image: Process for *each* extract (all dynamic)]

That's the beauty of the 'Tableau-Slack-Logentries' integration. When you have a decent number of parts, the whole becomes a fascinating thing.

Here are the steps:

  • Get the data from the Logentries webhook
  • Process the data for each extract
  • Get the current Slack users (you don't need to do this often, unless you want to)
  • Get the valid list of workbooks and data sources
  • Process the list of extracts (rolling 24 hours)
  • Get the valid list of workbook / data source owners
  • Create the Slack content object: every check must pass for the refresh to run (for example, if the person who dropped the name in the channel isn't the owner, it won't succeed); see the sketch after this list
  • Log it!
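
Here's a rough sketch of that ownership gate; $slackUser and $content are hypothetical stand-ins for the objects built in the earlier steps:

# Hypothetical gate: only proceed when the Slack poster owns the content.
# $slackUser = the user who dropped the name in the channel (matched by email via users.list)
# $content   = the matching workbook/data source record from the repository
if ($slackUser.profile_email -eq $content.owner_email) {
    Write-Output "Refreshing '$($content.name)' for $($slackUser.slack_username)"
    # ...trigger the refresh and reply in-channel...
}
else {
    Write-Output "Skipping '$($content.name)': poster is not the owner"
}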

 

[Image: Tableau Server example of a completed extract]
[Image: Corresponding message back from Tableau Server]

If anyone is interested in the code and / or a demo, please let me know and I’ll be happy to show it.

Automatically remove (and archive) extracts that fail more than ‘n’ times

Keep it clean

Every now and then, Tableau extracts aren’t refreshed properly. That, in and of itself, isn’t a bad thing. What we’re worried about are the extracts that continue to fail day after day. They consume resources and, most importantly, don’t give a true picture of the data.

[Image: Continual extract failures]

Here's a simple script that queries Postgres and grabs the extracts which have failed n times (you can choose your threshold). At a very high level, it does the following:

  • Get list of failed extracts (most recent fail date should be the current date)
  • Limit to those only above your desired threshold
  • Use REST API to pull extracts from Tableau Server
  • Do a diff on the list and the downloaded files and remove only those which are equal
  • Archive twb/twbx/tds/tdsx so users can refer to this later
  • Delete content from Tableau Server
[Image: Here is how we dynamically switch from site to site *and* from content resource (data source or workbook) with the REST API]
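
The idea looks roughly like this; a sketch, not the production script, where $server, $apiVer, $siteLuid, $contentType ('workbooks' or 'datasources'), $contentLuid, $contentName, $archive and $token are all placeholders and the endpoint pattern follows Tableau's documented REST API:

# Sketch: build the REST endpoint dynamically per site and content type.
$base = "https://$server/api/$apiVer/sites/$siteLuid/$contentType/$contentLuid"
$headers = @{ 'X-Tableau-Auth' = $token }

# Pull a copy before removing it (the /content endpoint returns the twbx/tdsx)
Invoke-RestMethod -Uri "$base/content" -Headers $headers -OutFile "$archive\$contentName"

# ...diff against the archive, then remove the failing content from the server
Invoke-RestMethod -Uri $base -Method Delete -Headers $headers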

Taking it a step further

If you have a Log Analytics strategy in place, you can send the output of the script to a file and have the log agent follow it. This will give you and the rest of the Tableau admin team some insight into what is failing (other than what I've talked about before).

You can also integrate with Slack in order to notify the user/owner that his/her workbook will be removed until the failure is fixed.

Slack your Tableau Extract

Some background

As most of you might already know, I’m a pretty big fan of making things simple and more automated in order to get data to the people. When someone has to muck around in an area that might take him/her away from their analytics ‘flow’, I see opportunity.

Enter the ‘refresh-my-tableau-datasource-via-Slack’ bits.

Why would we do this? Well, perhaps tabcmd isn't on the Server or the client. Or perhaps it's easier to make an API call and post a message at the end of a pipeline rather than write to a file and do it that way. But wait, even better: it might be easiest for an analyst to simply type a 'Refresh: <some data source or twb>' message into a Slack channel.

[Image: Sample text coming across the wire]

Before you think, 'wow, this could escalate quickly', rest assured there are checks and balances. Here are a few that we've implemented:

  • Only the owner of the twb/data source can refresh
  • Can’t refresh more than 1x per max refresh time (you should already be keeping these stats)
  • We log all of these in addition to our regular Tableau logging

Anyway, this is pretty easy to set up because of the earlier work with the 'Reset my Tableau password via Slack' bits.

 

 

The Tools

  • PowerShell
  • Logentries
  • Slack
  • Tableau Server, of course

 

Implementation

Create a Slack channel where all of this chatter will go and then, with Logentries, set this up. After you've done that, run the code to make all the magic happen.

[Image: Post in a Slack channel with the name of the TWB or Data Source]

NOTE: For one of the checks and balances mentioned above, you will want to limit overuse of the extract refresh. For example, if someone gets excited and drops the TWB/Data source name in there 15 times in an hour, this code will prevent that from happening:

SELECT
date_trunc('hour',created_at) - INTERVAL '7 hour' as "StartHour"
,COUNT(*) as "Total"
,AVG(EXTRACT(EPOCH FROM (completed_at-created_at))) as "TotalTime_Avg_Sec"
,AVG(EXTRACT(EPOCH FROM (completed_at-started_at))) as "RunTime_Avg_Sec"
FROM background_jobs
WHERE job_name LIKE 'Refresh Extracts' and title = '<your extract variable>'
GROUP BY
date_trunc('hour',created_at) - INTERVAL '7 hour'
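
Once those rows are back in PowerShell (however you query the repository: psql, ODBC, etc.), the throttle itself is simple. A sketch, with $jobStats standing in for the query results and a made-up threshold:

# Hypothetical throttle: $jobStats holds the hourly rows returned by the query above.
$maxPerHour  = 1
$currentHour = (Get-Date).Date.AddHours((Get-Date).Hour)
$thisHour    = $jobStats | Where-Object { [datetime]$_.StartHour -eq $currentHour }

if ($thisHour -and [int]$thisHour.Total -ge $maxPerHour) {
    Write-Output 'Refresh already ran this hour -- skipping.'
}
else {
    # safe to trigger the extract refresh
}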

After you’ve successfully triggered the extract, you can certainly send a message back to the user and/or notify them in the event of a potential extract refresh failure. Did I mention we have code for that?

For those interested in the nuts and bolts of the implementation, please don’t hesitate to let me know.

Phoenix TUG – Workbook Performance

Here is a copy of the presentation deck from the Phoenix Tableau User Group – April 2016 meeting. It was a pleasure to be able to speak to such an enthusiastic crowd of Tableau Users.

Other bits: 

  • You can find the code samples here.
  • Those who weren’t able to receive a free trial card, please let me know and I’ll get you one asap.
  • It’s all about the Tableau M.M.A 🙂

Tableau Garbage Collector and Metadata Discovery

In the beginning, there were TWBs

Ok, so it's not really 'garbage', because people put effort into designing and creating both their workbooks and Tableau Server data sources. What I'm talking about here is an outcome of a Self Service analytics environment: stuff just happens to land in projects/sites and it grows and grows. While the work may have been used at one point, these workbooks/data sources now simply consume valuable resources.

[Image: Our sample code snippet]

 

In order to keep the environment swift and snappy, we'll want to make sure we remove and archive the old materials. The beauty of this is that one query will cover two major avenues of management (Tableau Garbage Collection and Tableau Metadata Discovery). In doing so, we'll follow these steps:

  • Export old stuff from Tableau (twb & tds)
  • Store on Amazon S3
  • Remove locally
  • Update channel/users via Slack

Tools needed:

  • Tableau Server admin (for Postgres access and REST API)
  • Slack (not necessary, but helpful for alerts/notification/collaboration)
  • Amazon AWS account (not necessary, but also helpful for storage)
  • PowerShell > v3

One Query to Rule Them All

This query will allow the admin to control/remove everything they need from Tableau Server. Here are the 11 fields:

  • Id: Id for the twb(x) or tds(x)
  • Last_view_time: max time the object was viewed
  • Luid: luid for twb(x) or tds(x)
  • Site_id: site id
  • Site_name: site name
  • Site_name_proper: site name for REST API
  • Site_luid: site luid
  • Export_name: name for export
  • HasExtract: whether or not there is a TDE
  • ExportType: datasources/workbooks endpoint for REST API
  • OID: ID for object in Postgres (lo_export)

Yes, that’s it. Now we’ll write a daily/weekly/monthly collection script that will clean up everything from Tableau Server.
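
A minimal sketch of what that collection pass might look like, assuming the AWS Tools for PowerShell module for the S3 upload; $oldContent (rows from the query above), $workpath, $bucket, $server, $apiVer and $token are all placeholders, not the production script:

# Sketch of the collection pass. $oldContent = rows from the query above.
Import-Module AWSPowerShell   # provides Write-S3Object / Get-S3Object

foreach ($item in $oldContent) {
    $local = "$workpath\$($item.Export_name)"
    # 1. Export from Tableau Server (REST API download, or lo_export via the OID) -- omitted here

    # 2. Archive to S3
    Write-S3Object -BucketName $bucket -File $local -Key "tableau-archive/$($item.Export_name)"

    # 3. Remove the local copy once it's confirmed on S3
    if (Get-S3Object -BucketName $bucket -Key "tableau-archive/$($item.Export_name)") {
        Remove-Item $local
    }

    # 4. Delete from Tableau Server via the REST API (workbooks or datasources endpoint)
    Invoke-RestMethod -Method Delete -Headers @{ 'X-Tableau-Auth' = $token } -Uri "https://$server/api/$apiVer/sites/$($item.Site_luid)/$($item.ExportType)/$($item.Luid)"
}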

[Image: Main fields for GC]

Oh wait, it’s gone, now I suddenly need it

Ok, so perhaps we were a bit overzealous and removed too much. Have no fear, because this process also archives everything on Amazon S3. If it’s not on Tableau Server, it’s on Amazon and easy to recover and restore. Once we take a diff between what we have locally and what we have on S3, we’ll remove everything locally. Now, older, er, retro workbooks will forever be saved.

Send a Summary to the Admins: It’s full of TWBs

Since this is automated, we do keep a history of what was removed and, after this script runs, we send a summary to our Slack admin channel. This is the last step in the processing and gives info such as total disk space recovered, total TWBs, etc.

[Image: Sample data sent to Slack]

The other side of the coin

You may have noticed the 'OID' field in the query. This is the easiest way to export metadata info from Tableau (ever try to download a 750MB workbook?). And combining this with what I've discussed here and here will make your life so much more fabulous. An example workflow would be:

  • Export TWB / TWBX / TDS / TDSX
  • Parse metadata info (columns, rows, filters, and data sources); a sketch follows this list
  • Use Tableau class to export custom sql code
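
Since a .twb is just XML, the metadata parse can be as simple as this sketch (the element and attribute names reflect workbook XML as commonly seen; verify against your own files, and the filename is made up):

# Sketch: pull datasource and column metadata out of an exported .twb (it's XML).
[xml]$twb = Get-Content "$workpath\Sales Overview.twb"

$twb.workbook.datasources.datasource | ForEach-Object {
    $ds = $_
    $ds.column | Select-Object @{n='datasource';e={$ds.caption}},
                               @{n='column';e={$_.caption}},
                               @{n='name';e={$_.name}},
                               @{n='datatype';e={$_.datatype}}
}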


Tableau Server: Alert Gracefully

Some background

A while back, I wrote about how we make use of the background_jobs table in Tableau Server. You can see the code here. Although it did a great job of making our people aware of what failed (and when), it still arrived too frequently and was also sent via email. If you see something in email, it's typically more resource-intensive to take in all the bits of the message (who, when, where, what, etc.). Then you have to switch context to figure out where (in all your potential Tableau sites) the workbook/data source could be located. With the beauty of Slack and its well-documented API, we can now send the alerts via Slack messages.

Tools needed

  • Slack
  • Tableau (PostgreSQL access)
  • PowerShell

Why Alert Less?

Put simply, it avoids 'alert fatigue' and also gives just enough information for action; enough to glance at it and immediately know where to go and how to fix it. Rather than firing at all hours throughout the day, this method just issues a direct message via Slack to each user. Further, it doesn't embed an image or CSV into an email (which would actually require more work). Bonus: we also remove the 'com.tableausoftware.nativeapi.dll.DataSourceException:' portion of the error message. You would be surprised at how much people ignore when they are unsure about the message.
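
That cleanup is a one-liner; a quick sketch, with $err standing in for the raw error text pulled from the repository:

# Strip the unfriendly exception prefix before sending the message ($err is the raw error text)
$friendly = $err -replace 'com\.tableausoftware\.nativeapi\.dll\.DataSourceException:\s*', ''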

It's a private message with four (and only four) bits of information:

  • workbook or data source name
  • site
  • time of failure
  • friendly error message (we actually keep the most common and reply with those messages instead)
[Image: Sample Slack message to a user]

The Core Query

This is it, seriously.

[Image: The text enclosed in yellow is configurable to your environment]

Getting the Slack Username/ID

This was the challenging part. The goal was to add an additional column to the query output above so when we did our notification loop in the code, we could dynamically send to the owner. This little bit made it pretty easy to just add their Slack ID to the Tableau query:

$t = curl -F token="$($Slack)" https://slack.com/api/users.list -k --silent

($t | ConvertFrom-Json).members | ForEach-Object -Process { $sUser = $_.id ; $_ } | select @{n='slack_username';e={$sUser}},@{n='profile_email';e={$_.profile.email}}
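
With the Slack ID joined onto the query output, sending the direct message is one more API call. A sketch using chat.postMessage, in the same curl style as above ($row and its columns are placeholders for one record from the alert query):

# Hypothetical notification step: $row is one failed-extract record with a slack_username column.
$msg = "Extract failed: $($row.title) (site: $($row.site_name)) at $($row.completed_at) -- $($row.friendly_error)"
curl -F token="$($Slack)" -F channel="$($row.slack_username)" -F text="$msg" https://slack.com/api/chat.postMessage -k --silent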

Here’s the process for each day:

  • Split the day up into, wait for it, AM and PM.
  • Run the AM/PM code at some desired time
  • Notify each user via Slack, individually.

You may be wondering: what happens if you miss an extract during some gap? For example, what if someone’s extract failed at 11pm and we didn’t run the script until the next day? Pretty simple: if it’s run at 11pm and it is mission critical, it would be fixed at 11ish pm. We’re talking about daily (read: now) alerts. If I’m alerted today about something that happened yesterday, I’m less likely to do something about it. But there’s all sorts of psychology about that 🙂