While I had originally joked about the ‘Vapor Rub’ title, it has really, er, stuck to me at this point. It sort of describes the wonder of getting the right data into Tableau: once it’s there and properly formatted, everything just feels better.
Part I: Getting the data from all the places
The goal, then, is to get all the data we need for analysis into some sort of table (or tables). Ideally it would be one table, but when that’s not possible, we’ll create our own data model. If your organization has data in a multitude of places, it becomes challenging for Tableau developers to acquire it and shape it into useful analyses without sacrificing performance. Adding data sources can, depending on your approach, slow your workbook. And when we’re talking about massive data, performance will suffer if one doesn’t approach it efficiently.
Part II: Processing the data by distributing the load (PowerShell Workflows) across computers
Here’s an example: pivoting data, a lot of data. We needed to pivot pipe-delimited fields into their own columns, load the results into a database, and then create a Tableau Server data source. A few caveats:
- The steps need to happen in sequence (part A needs to happen before part B)
- It could be a long-running task, so if a part fails, I want the code to pick up where it left off
- I need to distribute the load across computers because this is a lot of data
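To make the pivot itself concrete, here is a minimal sketch of splitting a pipe-delimited field into its own columns. The file path, the `Tags` field, and the column names are all assumptions for illustration, not the actual schema:

```powershell
# Sketch: pivot a pipe-delimited 'Tags' field into separate columns.
# File layout and column names are assumed for illustration.
$rows = Import-Csv -Path 'C:\Data\Raw\sales.csv'
$pivoted = foreach ($row in $rows) {
    $parts = $row.Tags -split '\|'
    [pscustomobject]@{
        Id   = $row.Id
        Tag1 = $parts[0]
        Tag2 = if ($parts.Count -gt 1) { $parts[1] } else { $null }
        Tag3 = if ($parts.Count -gt 2) { $parts[2] } else { $null }
    }
}
$pivoted | Export-Csv -Path 'C:\Data\Clean\sales_pivoted.csv' -NoTypeInformation
```

One file is trivial; the challenge in our case was doing this across a ton of files, which is where distributing the work comes in.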
Now, grabbing the data via code isn’t that difficult a task. We iron out our approach, sketch out the table or tables we might need to create, and, hopefully, the data speaks. To do all this, we’ll use PowerShell Workflows.
Workflows are built around activities, which let us implement logic that wouldn’t necessarily fit within the function paradigm (though the two are related to some extent). Activities exist on their own, so ‘bucketing’ each piece of your code will help a lot.
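A bare-bones workflow looks a lot like a function, except each statement inside it is an activity. This sketch (the workflow name and path are placeholders) shows the basic shape:

```powershell
# A minimal workflow; each statement inside is an activity and runs
# in order by default. Requires Windows PowerShell (workflows were
# removed in PowerShell 7+).
workflow Invoke-DataPrep {
    $files = Get-ChildItem -Path 'C:\Data\Raw' -Filter '*.txt'
    foreach ($file in $files) {
        Write-Output "Found $($file.Name)"
    }
}

Invoke-DataPrep
```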
The difficult part is making sure we add the right checkpoint and split the load accordingly; we have a ton of files that need data pivoted, and doing that work on one machine isn’t ideal. What’s more, if we do all the hard work of parsing and it fails or the computer crashes, we would need to start over. Having a checkpoint (via Checkpoint-Workflow) makes life a lot easier. At a very high level, our process will look like this:
- Find files/data (S)
- Do the parsing/splitting, etc (S)
- Add to database (P)
- Create Tableau data source (P)
- Alert users (via Slack) (P/S)
NOTE: S = Sequence; P = Parallel
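The steps above can be sketched as a single workflow. The sequential steps run first, a checkpoint is taken once the expensive parsing is done, and the independent steps run in a parallel block. All of the function names here are placeholders for your own activities:

```powershell
# High-level sketch of the process above; the Find-/Split-/Add-/Publish-/
# Send- functions are hypothetical placeholders for your own code.
workflow Invoke-PivotPipeline {
    # Sequential steps (S)
    $files = Find-SourceFiles
    $clean = Split-AndPivot -Files $files

    # If anything after this point fails, Resume-Job picks up here
    # instead of redoing the parsing.
    Checkpoint-Workflow

    # Independent steps (P) run at the same time
    parallel {
        Add-ToDatabase -Data $clean
        Publish-TableauDataSource -Data $clean
        Send-SlackAlert -Message 'Pivot pipeline finished'
    }
}
```

Because workflows run as jobs, a suspended or failed run can be resumed from that checkpoint rather than from scratch, which is exactly what the long-running-task caveat called for.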
Part III: Adding consolidated data to a database
After doing the work of parsing and pivoting our data, we now need to add it to the database. This is also pretty straightforward if you’ve done it once. And the right connection string can make all the difference. As mentioned above, this will be done in parallel.
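One way to do the load from PowerShell is `System.Data.SqlClient.SqlBulkCopy`, which is much faster than row-by-row inserts for large data. The server, database, table, and file names below are assumptions for illustration:

```powershell
# Sketch: bulk-load a cleaned CSV into SQL Server with SqlBulkCopy.
# Connection string, table, and file path are placeholders.
$connectionString = 'Server=sqlhost;Database=Analytics;Integrated Security=True'
$bulk = New-Object System.Data.SqlClient.SqlBulkCopy($connectionString)
$bulk.DestinationTableName = 'dbo.SalesPivoted'

# Build a DataTable from the pivoted CSV
$table = New-Object System.Data.DataTable
$rows  = Import-Csv -Path 'C:\Data\Clean\sales_pivoted.csv'
$rows[0].PSObject.Properties.Name | ForEach-Object { [void]$table.Columns.Add($_) }
foreach ($row in $rows) {
    [void]$table.Rows.Add($row.PSObject.Properties.Value)
}

$bulk.WriteToServer($table)
$bulk.Close()
```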
Part IV: Consume in Tableau
Having done the work mentioned above, making a Tableau Server data source is the easy part. We can dynamically create the extract or simply leverage our typical workflow (connect to data and publish to Server).
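For the publish step, tabcmd works well from a script. This sketch assumes a default tabcmd install path, and the server URL, account, and data source name are placeholders:

```powershell
# Sketch: publish a data source with tabcmd. The install path, server,
# user, and file name are all placeholders for your environment.
$tabcmd = 'C:\Program Files\Tableau\Tableau Server\extras\Command Line Utility\tabcmd.exe'

& $tabcmd login -s 'https://tableau.example.com' -u 'svc_tableau' -p $env:TABLEAU_PW
& $tabcmd publish 'C:\Data\Clean\sales_pivoted.tdsx' -n 'Sales Pivoted' --overwrite
```

Reading the password from an environment variable keeps credentials out of the script itself.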
In doing this, you might be wondering what other ways we can leverage PowerShell Workflows along with Tableau Server. Here are a few that work very well:
- Tableau Extract Mover/Creator
- Backups / Restores to various environments
- CSV to TDE creator on a large scale
- Creating custom reports/data files for Internal & External Users
- A Vizable file delivery platform