We’ve all been there: parsing the workbook, changing extracts, or copying formulas, only to be stuck with the dreaded ‘[Calculation_3400904132723583]’ (or its many numeric variations).
Well, stress no more: with Tableau M.M.A (Monitor/Maintain/Alert), we’re able to swap that out and replace it with the actual name of the calculation. Now, when you re-open the workbook, these won’t error out.
Here’s what we do. Remember a while back when I talked about file alerting and triggers? No? Well, go read it here; I’ll wait.
Essentially, once someone drops a file (any file: twb, twbx, tds, or tdsx) in the desired directory, the code will immediately parse and spit out the following:
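The parse-and-rename step can be sketched in Python. A .twb is plain XML, so we can build a map from the generated calculation ids to their captions and swap them wherever they’re referenced. (The element/attribute layout below is a simplification of the real TWB schema, so treat this as a sketch rather than the exact code.)

```python
import re
import xml.etree.ElementTree as ET

def friendly_calc_names(twb_xml: str) -> str:
    """Map auto-generated [Calculation_...] names to their captions.

    Calculated fields appear as <column> elements whose name attribute
    is the generated id and whose caption is the user-facing name.
    (Simplified view of the real TWB schema.)
    """
    root = ET.fromstring(twb_xml)
    mapping = {}
    for col in root.iter("column"):
        name = col.get("name", "")
        caption = col.get("caption")
        if caption and "Calculation_" in name:
            # name looks like '[Calculation_3400904132723583]'
            mapping[name.strip("[]")] = caption

    # Replace each generated id, wherever it is referenced, with its caption
    def swap(match):
        return "[" + mapping.get(match.group(1), match.group(1)) + "]"

    return re.sub(r"\[(Calculation_\d+)\]", swap, twb_xml)
```

Running this over a dropped .twb gives you a copy where formulas read like ‘[Profit Ratio]’ instead of ‘[Calculation_123]’.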
If you’re like us and have mission-critical workbooks, then backing them up and auditing them is always a good idea. Storing them in a database is also easy and, what’s more, this ties into the performance metrics I’ve mentioned here and here.
A few weeks back, I did a webinar with Tableau and the Server Admin User Group and I chatted about our TWB Internals database. In that talk, I mentioned a couple of tables:
Well, I’m happy to announce a third and final piece to the data model: Geo data. This is often a missing piece to the general ‘performance’ discussion on Tableau Server. What’s more, if your organization serves users both internally and externally, then performance becomes a big first impression. For example, you want to know why some users in various locations might mention a view is slow to load even though it’s been tested at, say, less than two seconds. Adding the geography (and ISP) of the people requesting the view makes the data (and workbooks) a bit easier to diagnose. BONUS, you can also understand the most requested views, http status codes (500s, etc.) and more.
Geo data: steps to implement
Pull the Apache logs (I used Logentries and its open API to pre-filter before exporting).
Parse the logs for basic columns: IP, datetime, request, port, etc.
Get a distinct list of IPs and look each one up
Add to database (choose your flavor)
Connect the dots
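Step 2 (parsing the logs for basic columns) can be sketched with a regex over the standard Apache combined log format. Real Tableau Apache logs carry extra fields (such as the request id), so the pattern below is a simplified starting point:

```python
import re

# Combined-log-format pattern (simplified; Tableau's Apache logs
# include extra fields such as the request id).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<datetime>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_line(line: str):
    """Return a dict of the basic columns, or None if the line doesn't match."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None
```

Each parsed dict maps straight onto the columns of the tableau_apache_perf table.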
The final table in the model had a few steps to it. It’s an IP helper table that stores the information for users who request views (city, state, latitude, longitude, etc.). Rather than look up each IP, I check this table first and skip any IP we already have stored. Once the basic script was complete, we automated it and then joined the previous two tables together in order to understand the performance of a view as it relates to geography. So, your SQL to join them all together might look something like this:
SELECT g.city
,g.state
,sum(w.elapsed_sec) as load_time
FROM tableau_workbook_perf as w
JOIN tableau_apache_perf as a on w.req = a.requestid
JOIN tableau_apache_geo as g on g.ip = a.ip
GROUP BY g.city, g.state
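The "check the helper table first" logic boils down to a set difference: only geocode IPs we haven't already stored. A minimal sketch (table and function names are illustrative):

```python
def ips_to_lookup(request_ips, known_ips):
    """Return only the distinct IPs we have not already geocoded.

    known_ips would come from the helper table (e.g.
    SELECT ip FROM tableau_apache_geo); request_ips from today's logs.
    """
    return sorted(set(request_ips) - set(known_ips))
```

This keeps the (rate-limited, slow) geo-lookup calls down to only the genuinely new addresses.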
Now that we have all the pieces together, it’s time to visualize it back in Tableau (ah, yes, the circle of life, or data). We can now take the highly viewed workbooks and understand whether performance is a local/geographical issue and/or a time issue. When we combine this with the TWB Internals metadata, well, we have a very complete picture of a workbook’s value.
Ok, so it’s not really ‘garbage’ because people put effort into designing and creating both their workbooks and Tableau Server data sources. What I’m talking about here is an outcome of a self-service analytics environment: stuff just happens to land in projects/sites, and it grows and grows. While the work may have been used at one point, those old workbooks/data sources still consume valuable resources.
In order to keep the environment swift and snappy, we’ll want to make sure we remove and archive the old materials. The beauty of this is that one query will accomplish two major avenues of management (Tableau Garbage Collection and Tableau Metadata Discovery). In doing so, we’ll follow these steps:
Export old stuff from Tableau (twb & tds)
Store on Amazon S3
Update channel/users via Slack
Prerequisites:
Tableau Server admin (for Postgres access and REST API)
Slack (not necessary, but helpful for alerts/notification/collaboration)
Amazon AWS account (not necessary, but also helpful for storage)
PowerShell > v3
One Query to Rule Them All
This query will allow the admin to control/remove everything they need from Tableau Server. Here are the 11 fields:
Id: Id for the twb(x) or tds(x)
Last_view_time: max time the object was viewed
Luid: luid for twb(x) or tds(x)
Site_id: site id
Site_name: site name
Site_name_proper: site name for REST API
Site_luid: site luid
Export_name: name for export
HasExtract: whether or not there is a TDE
ExportType: datasources/workbooks endpoint for REST API
OID: ID for object in Postgres (lo_export)
Yes, that’s it. Now we’ll write a daily/weekly/monthly collection script that will clean up everything from Tableau Server.
Oh wait, it’s gone, now I suddenly need it
Ok, so perhaps we were a bit overzealous and removed too much. Have no fear, because this process also archives everything on Amazon S3. If it’s not on Tableau Server, it’s on Amazon and easy to recover and restore. Once we take a diff between what we have locally and what we have on S3, we’ll remove everything locally. Now, older, er, retro workbooks will forever be saved.
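The safety check before deleting anything locally is just a set comparison between local filenames and S3 keys: only remove what is confirmed archived. A sketch (in practice you’d pull the key list via the S3 API):

```python
def safe_to_delete(local_files, s3_keys):
    """Split local files into (archived, missing) relative to S3.

    Only files confirmed present on S3 may be removed locally;
    anything missing should be uploaded first.
    """
    local = set(local_files)
    archived = local & set(s3_keys)   # safe to delete locally
    missing = local - archived        # still needs to be archived
    return sorted(archived), sorted(missing)
```

Run the upload step on `missing`, then delete `archived`, and nothing is ever lost.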
Send a Summary to the Admins: It’s full of TWBs
Since this is automated, we keep a history of what was removed and, after the script runs, we send a summary to our Slack admin channel. This is the last step in the process and gives info such as total disk space recovered, total TWBs removed, etc.
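Building that summary message is just a small payload in Slack’s incoming-webhook shape. The channel name and stats below are illustrative:

```python
import json

def build_summary_payload(channel, removed, bytes_recovered):
    """Build a Slack webhook-style payload summarizing the cleanup run.

    removed: list of archived object names; bytes_recovered: disk freed.
    """
    gb = bytes_recovered / 1024 ** 3
    text = (
        f"Tableau garbage collection finished: "
        f"{len(removed)} objects archived, {gb:.2f} GB recovered."
    )
    return json.dumps({"channel": channel, "text": text})
```

POST that JSON to your webhook URL and the admin channel gets its one-line report.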
The other side of the coin
You may have noticed the ‘OID’ field in the query. This is the easiest way to export metadata info from Tableau (ever try to download a 750MB workbook?). Combining this with what I’ve discussed here and here will make your life so much more fabulous. An example workflow would be:
Export TWB / TWBX / TDS / TDSX
Parse metadata info (columns, rows, filters, and data sources).
A while back, I wrote about how we make use of the background_jobs table in Tableau Server. You can see the code here. Although it did a great job of making our people aware of what failed (and when), the alerts still arrived too frequently and were sent via email. If you see something in email, it typically takes more effort to absorb all the bits of the message (who, when, where, what, etc.). Then you have to switch contexts to figure out where (in all your potential Tableau sites) the workbook/data source might be located. With the beauty of Slack and its well-documented API, we can now send the alerts as Slack messages.
Tableau (PostgreSQL access)
Why Alert Less?
Put simply, it avoids ‘alert fatigue’ and gives just enough information for action: enough to glance at it and immediately know where to go and how to fix it. Rather than emailing at all hours of the day, this method issues a single direct message via Slack to each user. Further, it doesn’t embed an image or CSV into an email (which would actually require more work). Bonus: we also remove the ‘com.tableausoftware.nativeapi.dll.DataSourceException:‘ portion of the error message. You would be surprised at how much people ignore when they are unsure about a message.
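The error-cleanup step is straightforward string work: strip the noisy Java-style prefix and, where we recognize the raw error, substitute the friendlier canned reply. The mapping below is a hypothetical example of the "most common errors" table we keep:

```python
# Noisy prefix we strip before messaging the owner
NOISY_PREFIX = "com.tableausoftware.nativeapi.dll.DataSourceException: "

# Hypothetical mapping of common raw errors to friendlier replies
FRIENDLY = {
    "Invalid username or password.":
        "Your database password changed - update the saved credentials.",
}

def friendly_error(raw: str) -> str:
    """Strip the driver-exception prefix and swap in a friendly message."""
    msg = raw.removeprefix(NOISY_PREFIX).strip()
    return FRIENDLY.get(msg, msg)
```

Unrecognized errors pass through untouched, minus the prefix, so nothing is ever hidden.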
It’s a private message with four (and only four) bits of information:
workbook or data source name
time of failure
friendly error message (we actually keep the most common and reply with those messages instead)
This was the challenging part. The goal was to add an additional column to the query output above so when we did our notification loop in the code, we could dynamically send to the owner. This little bit made it pretty easy to just add their Slack ID to the Tableau query:
You may be wondering: what happens if you miss an extract during some gap? For example, what if someone’s extract failed at 11pm and we didn’t run the script until the next day? Pretty simple: if the script runs at 11pm and the extract is mission-critical, it gets fixed at 11-ish pm. We’re talking about immediate (read: now) alerts. If I’m alerted today about something that happened yesterday, I’m less likely to do something about it. But there’s all sorts of psychology about that 🙂
While I had originally joked about the ‘Vapor Rub’ title, it has really, er, stuck to me at this point. It sort of describes the wonder of getting the right data into Tableau: once it’s there and properly formatted, everything just feels better.
Part I: Getting the data from all the places
The goal, then, is to get all the data we need for analysis into some sort of table (or tables). Ideally, it would be one table, but when that’s not possible, we’ll create our own data model. If your organization has data in a multitude of places, it becomes challenging for Tableau developers to acquire it and shape it into useful analysis without sacrificing performance. Adding additional data sources can, depending on your approach, slow your workbook. Further, when we’re talking about massive data, performance will suffer if one doesn’t approach it efficiently.
Part II: Processing the data by distributing the load (PowerShell Workflows) across computers
Here’s an example: pivoting data, a lot of data. We needed to pivot pipe delimited fields into their own columns, add to a database and then create a Tableau Server data source. A few caveats:
Needs to happen in sequence (part A needs to happen before part B)
It could be a long-running task, so if a part fails, I want the code to pick up where it left off
I need to distribute the load across computers because this is a lot of data
Now, grabbing the data via code isn’t that difficult of a task. We iron out our approach, sketch out some table or tables we might need to create and, hopefully, the data speaks. To do all this, we’ll use PowerShell Workflows.
Workflows are built around activities. These allow us to implement logic that wouldn’t necessarily fit within the function paradigm (though, they are related to some extent). Activities exist on their own, so ‘bucketing’ each piece of your code will assist a lot.
The difficult part is making sure we add the right checkpoint and split the load accordingly; we have a ton of files that need data pivoted and doing that work on one machine isn’t ideal. What’s more, if we do all the hard work of parsing and it doesn’t work or the computer crashes, we would need to start over. Having a checkpoint (via Checkpoint-Workflow) makes life a lot easier. At a very high level, our process will look like this:
Find files/data (S)
Do the parsing/splitting, etc (S)
Add to database (P)
Create Tableau data source (P)
Alert users (via Slack) (P/S)
NOTE: S = Sequence; P = Parallel
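The original implementation uses PowerShell Workflows; as a language-neutral illustration of the same sequence/parallel split, here’s an analogous Python sketch with `concurrent.futures`. Note that PowerShell’s `Checkpoint-Workflow` resume behavior has no direct stdlib equivalent, so checkpointing is omitted here:

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(files, parse, load, publish):
    """Sequence the parse step, then fan out the load step in parallel.

    parse/load/publish are stand-ins for the S/P steps above;
    Checkpoint-Workflow-style resumability is not shown.
    """
    # Sequence (S): find + parse must finish before anything touches the DB
    parsed = [parse(f) for f in files]
    # Parallel (P): each parsed chunk can be loaded independently
    with ThreadPoolExecutor(max_workers=4) as pool:
        loaded = list(pool.map(load, parsed))
    # Parallel/Sequence (P/S): publish/alert once loads are complete
    return [publish(x) for x in loaded]
```

The workflow version adds what this sketch can’t: checkpoints after each stage and fan-out across machines rather than threads.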
A sample of what we’d need to parse
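The pivot itself is small once isolated. Since the original sample isn’t reproduced here, the row shape below is hypothetical: a record with one pipe-delimited field that needs to become separate named columns:

```python
def pivot_fields(row: dict, field: str, names: list, sep: str = "|"):
    """Split one pipe-delimited field into its own named columns.

    row, field, and names are illustrative; the real input sample
    isn't shown in the post.
    """
    values = row.pop(field).split(sep)
    row.update(zip(names, values))
    return row
```

Applied across millions of rows, this is the per-record work we distribute with the workflow above.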
Part III: Adding consolidated data to a database
After doing the work of parsing and pivoting our data, we now need to add it to the database. This is also pretty straightforward if you’ve done it once. And the right connection string can make all the difference. As mentioned above, this will be done in parallel.
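As a sketch of the load step, here’s a bulk insert using `sqlite3` as a stand-in database (swap in your own driver and connection string; that choice, plus batching with `executemany`, is where the load-speed difference comes from):

```python
import sqlite3

def bulk_insert(rows, db_path=":memory:"):
    """Bulk-load pivoted rows; sqlite3 stands in for your real database.

    For a production server you'd use the appropriate driver/connection
    string (ODBC, psycopg2, etc.) - the table shape here is illustrative.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS pivoted (id INTEGER, views INTEGER)")
    # executemany batches the inserts rather than one statement per row
    conn.executemany("INSERT INTO pivoted VALUES (?, ?)", rows)
    conn.commit()
    return conn

conn = bulk_insert([(1, 5), (2, 9)])
count = conn.execute("SELECT count(*) FROM pivoted").fetchone()[0]
```

Each parallel workflow branch runs its own insert, so the batches land concurrently.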
Part IV: Consume in Tableau
Having done the work mentioned above, making a Tableau Server data source is the easy part. We can dynamically create the extract or simply leverage our typical workflow (connect to data and publish to Server).
In doing this, you might be wondering: what other ways can we leverage PowerShell Workflows along with Tableau Server? Here are a few that work very well:
Tableau Extract Mover/Creator
Backups / Restores to various environments
Csv to TDE creator on a large scale
Creating custom reports/data files for Internal & External Users