Forget about perfection
Your data (information) is a set of free-flowing, dynamic instructions about how business (or whatever) is understood. It will never be perfect. In fact, you don’t want it to be; if it’s perfect (which, again, is impossible) there’s no room for improvement or self-reflection. What’s more, it will lack creative impulse: you don’t think freely if something is ‘perfect’ and done.
Data is fluid
The goal should be to bring the certification/governance process *to* the data. If you must wait, collect, meet, agree and on and on, there is a critical piece missing: data should be certified from what it produces (or its many derivations). How does this work? How can you certify a csv file? Simple: Alert, Integrate and Monitor your Analytics Infrastructure.
Essentially, if nothing is created from the data, why would there be a need to certify it? Once something is created, then you relentlessly certify, in flight, what is being produced. It’s a sort of fact-checking, data-alerting mechanism that’s completely possible with the right framework. Which leads to the next point…
Collect data about patterns of usage
If you’re not analyzing usage patterns, you’re missing valuable data. With all analytics, there are reasons for why (1) a specific set of data is selected and (2) what the user is attempting to do with the data. You can easily keep this metadata in AWS S3 (with a good lifecycle policy) or store for potential later use somewhere else. The point is that if you aren’t understanding *why* then you are only seeing one side of the coin.
Keep everything and then figure out what to do with it and then how to certify it.