Problem: Information about the tables we have in production is currently either scattered (e.g. the list of private tables) or non-existent (there is nowhere to answer questions like "what is the pif_edits table and what has it been doing in our production?"). This also hinders our ability to automate more work (between-replica comparison checks, overarching drift tracking, etc.).
Proposal: Introduce a file in puppet containing information on tables. Something like this:
  version: 1
  tables:
    - name: abuse_filter
      source: abusefilter
      canonicality: canonical
      visibility: partially public
    - name: urlshortcodes
      sections:
        - x1
      dbs:
        - wikishared
      source: urlshortener
      canonicality: canonical
      visibility: public
    - name: flaggedtemplates
      dblist: flaggedrevs.dblist
      status: dropped
      canonicality: partially canonical
      visibility: public
  sources:
    abusefilter:
      - path: extensions/AbuseFilter/db_patches/tables.json
        gerrit: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AbuseFilter/+/refs/heads/master/db_patches/tables.json
This could then become the foundation for many use cases and replace the ad-hoc table listings.
It could also easily be pulled into the doc.wikimedia.org hosts and served in a more user-friendly way (search, filtering, all that jazz).
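As a rough illustration of replacing an ad-hoc listing, here is a minimal sketch that derives the set of tables that are not fully public from the catalogue. It assumes the file has already been parsed into plain Python data (the dict below mirrors the example above); the function name `tables_needing_filtering` is hypothetical, and treating a missing `visibility` as "not public" is an assumption made here for safety, not part of the proposal.

```python
# Parsed form of the example catalogue, i.e. roughly what a YAML loader
# would return for the sample file above (sources omitted for brevity).
CATALOGUE = {
    "version": 1,
    "tables": [
        {
            "name": "abuse_filter",
            "source": "abusefilter",
            "canonicality": "canonical",
            "visibility": "partially public",
        },
        {
            "name": "urlshortcodes",
            "sections": ["x1"],
            "dbs": ["wikishared"],
            "source": "urlshortener",
            "canonicality": "canonical",
            "visibility": "public",
        },
        {
            "name": "flaggedtemplates",
            "dblist": "flaggedrevs.dblist",
            "status": "dropped",
            "canonicality": "partially canonical",
            "visibility": "public",
        },
    ],
}


def tables_needing_filtering(catalogue):
    """Return the sorted names of tables that are not fully public.

    Assumption: an entry without a `visibility` field is treated as
    not public, so it is never exposed by accident.
    """
    return sorted(
        table["name"]
        for table in catalogue["tables"]
        if table.get("visibility") != "public"
    )


print(tables_needing_filtering(CATALOGUE))
```

The same pattern (iterate over `tables`, filter on one field) would cover other listings, such as all tables on a given section.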
Notes:
- This will grow way too large. To avoid that:
  - MediaWiki and non-MW tables should have their own catalogues. That way we can start with MediaWiki and then move on to non-MW tables in a separate, dedicated file.
  - We need a lot of defaults: an omitted field means "the default", e.g. dblist: all, sections: s1-s8, db: the wiki's own db, status: active
- Part of reviewing and approving new tables should be adding the new table to the catalogue
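
The "omitted means default" rule from the notes could be applied when reading the catalogue, roughly as sketched below. This is only an illustration: the `DEFAULTS` mapping and `with_defaults` helper are hypothetical names, the key `dbs` follows the example file rather than the `db` spelling used in the note, and using `None` to stand for "the wiki's own database" is an assumption.

```python
# Defaults taken from the note above. `None` for "dbs" is a placeholder
# meaning "each wiki's own database" (an assumption of this sketch).
DEFAULTS = {
    "dblist": "all",
    "sections": [f"s{i}" for i in range(1, 9)],  # s1 through s8
    "dbs": None,
    "status": "active",
}


def with_defaults(entry):
    """Return a copy of a table entry with omitted fields filled in."""
    resolved = {**DEFAULTS, **entry}
    # Copy the list so callers cannot mutate the shared default.
    resolved["sections"] = list(resolved["sections"])
    return resolved


# An entry that relies entirely on defaults, and one that overrides some.
print(with_defaults({"name": "abuse_filter"}))
print(with_defaults({"name": "urlshortcodes", "sections": ["x1"], "dbs": ["wikishared"]}))
```

Resolving defaults in one place keeps the catalogue entries short while every consumer still sees fully populated records.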
Comments?