The goal of documenting data is to help users find it, understand it, and use it with confidence.
But you don’t need to document everything. Well-modeled data is often self-explanatory.
“Good code documents itself” contains a good amount of truth. You should aim to have expressive and consistent names that are self-descriptive. A database called ‘dbo’ is not helpful, ‘sales_prod’ is better, especially if there is also a ‘sales_dev’ and a ‘finance_prod’ database.
So documenting starts with naming things, but it does not stop there.
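A consistent naming scheme can also be checked automatically. The following is a minimal sketch in Python, assuming a `<domain>_<environment>` convention inferred from the examples above; the pattern and the function name `nonconforming_names` are illustrative assumptions, not a standard.

```python
import re

# Assumed convention: <domain>_<environment>, e.g. sales_prod or finance_prod.
# Adapt the allowed environments to your own landscape.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*_(prod|dev)")


def nonconforming_names(names):
    """Return the names that do not follow the assumed convention."""
    return [name for name in names if not NAME_PATTERN.fullmatch(name)]


databases = ["dbo", "sales_prod", "sales_dev", "finance_prod"]
print(nonconforming_names(databases))  # prints ['dbo']
```

A check like this could run whenever a new database or schema is created, so that unclear names are caught before anyone depends on them.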
Use the explicit hierarchy of the database system to build top-down documentation
With top-down documentation, users start to understand the big picture and can navigate your data landscape themselves.
What to focus on when documenting your data
Document the top 3 levels (system, database, schema) completely. Then focus on the 10% most used tables. Establish a process in which every new table, view, or model is created with at least minimal documentation; documenting is easiest at creation time.
Documenting all columns is usually only worth it for data products or widely used reporting tables. But for these, you should be rigorous. If a column is not worth documenting, it should not be part of the table.
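To decide where to start, you can rank tables by usage and list the most used ones that still lack a description. This is only a sketch: the usage counts, table names, and the function `undocumented_top_tables` are hypothetical, and in practice the numbers would come from your database's query logs or catalog.

```python
def undocumented_top_tables(usage, descriptions, percent=10):
    """Return the most used tables that have no description yet.

    usage:        dict of table name -> query count (hypothetical metric)
    descriptions: dict of table name -> description text
    percent:      share of tables to look at, by usage
    """
    if not usage:
        return []
    # Integer ceiling division; always inspect at least one table.
    top_n = max(1, (len(usage) * percent + 99) // 100)
    # Most used first; ties are broken by name for a stable order.
    ranked = sorted(usage, key=lambda name: (-usage[name], name))
    return [
        name
        for name in ranked[:top_n]
        if not (descriptions.get(name) or "").strip()
    ]


# Hypothetical example data.
usage = {
    "orders": 950, "customers": 700, "invoices": 500, "tmp_export": 40,
    "audit_log": 35, "regions": 30, "currencies": 20, "holidays": 10,
    "test_copy": 5, "old_orders": 1,
}
descriptions = {"orders": "One row per confirmed order.", "invoices": "   "}

print(undocumented_top_tables(usage, descriptions, percent=30))
# prints ['customers', 'invoices']
```

A blank description counts as missing here, which is a design choice: a placeholder string does not help anyone understand the table.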
In practice, it can be hard to choose the right words. Should I refer to customer or account or company or user or site? Does everybody understand the acronyms we use in our team?
To tackle challenges like these, your documentation system should ideally provide a glossary where you can define important terms once and reference them throughout the documentation.
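If your documentation tool has no built-in glossary, the idea can be approximated with a small script. The sketch below assumes that terms are referenced as `#term` inside descriptions; the glossary entries, the reference syntax, and the function `undefined_terms` are illustrative assumptions, not the behavior of any particular tool.

```python
import re

# Hypothetical glossary: each term is defined once, in one place.
GLOSSARY = {
    "customer": "A company that has signed at least one contract.",
    "account": "A login identity that belongs to a customer.",
}

# Assumed reference syntax: a '#' followed by an identifier-like term.
TERM_REF = re.compile(r"#([A-Za-z_]\w*)")


def undefined_terms(text, glossary):
    """Return referenced #terms that the glossary does not define."""
    missing = []
    for term in TERM_REF.findall(text):
        key = term.lower()
        if key not in glossary and key not in missing:
            missing.append(key)
    return missing


description = "One row per #customer and #site; owned by an #account."
print(undefined_terms(description, GLOSSARY))  # prints ['site']
```

Running such a check over all descriptions shows which terms people use without ever having agreed on a definition.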
Tips
- Use expressive and consistent names;
- Document top-down and start with the most used tables;
- One sentence is usually enough;
- Make documentation part of the development process;
- Use #definitions in a business glossary.