There are plenty of ways to make a good backup of your analytics content, and the options available on Tableau Server are numerous. But here’s the better question: are they efficient and redundant enough?
Yes, current Server versions can retain n days of content revisions, but this steadily grows your backup and storage footprint (and puts too many eggs in one basket). There’s also no effective way to turn this option off if you have multiple sites (at least none that I’m aware of). Besides, you can get by perfectly well with a solid daily backup strategy and a subsequent (automatic) restore to your development machine.
To add to the complexity, what if you don’t want to use the CLI (tabcmd) to download massive data sources (>1GB)? What about users who, through no fault of their own, just click ‘download’ from the GUI? Do you, as a Server admin, know the impact this has on the Server? Hint: you should, and it’s bad.
Have users drop the name of their desired content in a shared file (or dedicated Slack channel) and then have daily backups done without using tabcmd or selecting ‘download’ from the GUI. Bonus: ship to AWS S3 and recover that space on your machine! Bigger bonus: logging.
Here’s what you’re going to do at a very high level:
- Write super SQL that can dynamically get all the info you need (twb/twbx/tds/tdsx)
- Use psql and the lo_export function to get this ^
- This ^ won’t get you the TDE (if there is one) so you need to find it on the filesystem
- Query the extracts table to get the UUID for where this ^ is stored on the filesystem
- Parse the XML to update the location of the TDE (Soapbox: for those of you who think it’s ‘hacking’ XML, please make sure you RTFM).
- Zip it up, send it to AWS S3, and get it off your Server machine
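The export-and-ship half of those steps can be sketched as a dry run that prints the commands it would execute instead of running them. Every detail here is a placeholder assumption, not something pulled from a real Server: the host/port/user, the OID, the export name, the working directory, and the bucket. Substitute the "OID" and "export_name" values returned by the repository SQL before turning the print into a real subprocess call.

```python
import shlex

# Hypothetical values -- in practice these come from the repository query below.
oid = 123456                    # "OID" column (large object id)
export_name = "superstore.tdsx" # "export_name" column
workdir = "/tmp/tableau-backup" # scratch space on the machine running psql

commands = [
    # psql's \lo_export meta-command pulls the large object (the twb/tds
    # payload) out of the workgroup repository and onto disk.
    ["psql", "-h", "localhost", "-p", "8060", "-U", "readonly", "-d", "workgroup",
     "-c", rf"\lo_export {oid} '{workdir}/{export_name}'"],
    # Zip it, ship it to S3, then reclaim the local space.
    ["zip", "-j", f"{workdir}/{export_name}.zip", f"{workdir}/{export_name}"],
    ["aws", "s3", "cp", f"{workdir}/{export_name}.zip",
     "s3://my-backup-bucket/tableau/"],
    ["rm", f"{workdir}/{export_name}", f"{workdir}/{export_name}.zip"],
]

for cmd in commands:
    # Dry run: print each command. Swap this for
    # subprocess.run(cmd, check=True) once the placeholders are real.
    print(shlex.join(cmd))
```

Keeping the pipeline as a list of argv-style commands (rather than one shell string) makes it trivial to log each step before executing it, which is where the "bonus: logging" comes from.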
Is that a lot of steps? Maybe. But this whole process is automated. Do a little work up front and you save yourself a lot of time down the line, not to mention a lot of space on your machine. Plus, your Server infra keeps humming along without the added load of multiple versions of your content. You also don’t need to worry about (1) versioning and (2) installing tabcmd.
Here’s a sample of the SQL you’d write, which scales to whatever content you need to back up and version.
```sql
select
    ds.id                   as "id"
  , ds.luid                 as "luid"
  , ds.site_id              as "site_id"
  , s.name                  as "site_name"
  , s.luid                  as "site_luid"
  , case
      when ds.data_engine_extracts = TRUE then lower(ds.repository_url) || '.tdsx'
      else lower(ds.repository_url) || '.tds'
    end                     as "export_name"
  , ds.data_engine_extracts as "hasExtract"
  /*, ds.repository_data_id
    , ds.repository_extract_data_id */
  , ed.descriptor           as "tde_path"
  , rd.content              as "OID"
from datasources ds
left join sites s
       on s.id = ds.site_id
left join extracts ed
       on ed.datasource_id = ds.id
left join repository_data rd
       on (ds.repository_data_id = rd.id
        or ds.repository_extract_data_id = rd.id)
```
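Once that query hands you a `tde_path`, the XML-patching step from the list above — re-pointing the exported .tds at the TDE you recovered from the filesystem — might look like this. The element and attribute names (`connection`, `class='dataengine'`, `dbname`) reflect what extract connections in a .tds typically look like, but treat the structure as an assumption and inspect your own files first; the sample document and paths are made up for illustration.

```python
import xml.etree.ElementTree as ET

def repoint_extract(tds_xml: str, new_tde_path: str) -> str:
    """Rewrite the extract connection's dbname so the .tds points at the
    TDE we copied off the Server filesystem. The 'dataengine' class and
    'dbname' attribute are assumptions about the .tds schema."""
    root = ET.fromstring(tds_xml)
    for conn in root.iter("connection"):
        if conn.get("class") == "dataengine":
            conn.set("dbname", new_tde_path)
    return ET.tostring(root, encoding="unicode")

# Hypothetical minimal .tds body, just enough to show the rewrite.
sample = (
    "<datasource>"
    "<connection class='dataengine' dbname='Data/Extracts/old-uuid.tde'/>"
    "</datasource>"
)
patched = repoint_extract(sample, "/backups/superstore.tde")
```

This isn’t ‘hacking’ XML — it’s a plain attribute update with the standard library parser, which keeps the rest of the document intact.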