Problem: Information about the tables we have in production is currently scattered (e.g. the list of private tables) or non-existent (e.g. there is nowhere to answer questions like "what is the pif_edits table and what has it been doing in our production?"). This also hinders our ability to automate more work (between-replica comparison checks, overarching drift tracking, etc.).
Proposal: Introduce a file in puppet containing information on tables. Something like this:
version: 1
tables:
  - name: abuse_filter
    pk: af_id
    source: abusefilter
    canonicality: canonical
    size: small
    visibility: partially public
  - name: urlshortcodes
    pk: usc_id
    sections:
      - x1
    dbs:
      - wikishared
    source: urlshortener
    canonicality: canonical
    size: small
    visibility: public
  - name: flaggedtemplates
    dblist: flaggedrevs.dblist
    status: dropped
    canonicality: partially canonical
    size: large
    visibility: public
sources:
  abusefilter:
    - path: extensions/AbuseFilter/db_patches/tables.json
      gerrit: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/AbuseFilter/+/refs/heads/master/db_patches/tables.json
This could then become the foundation for many use cases and replace ad-hoc table listings.
It could also easily be pulled into the doc.wikimedia.org hosts and served in a more user-friendly way (search, filtering, all that jazz).
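As a rough illustration of how one catalogue could replace ad-hoc listings, here is a minimal sketch of filtering tables by field. It assumes the file has already been parsed into plain Python dicts (which is what a YAML parser would return); `CATALOGUE` and `tables_where` are hypothetical names, not existing tooling.

```python
# Hypothetical in-memory form of the catalogue sample above,
# i.e. what a YAML parser would hand back for the "tables" part.
CATALOGUE = {
    "version": 1,
    "tables": [
        {"name": "abuse_filter", "pk": "af_id", "source": "abusefilter",
         "canonicality": "canonical", "size": "small",
         "visibility": "partially public"},
        {"name": "urlshortcodes", "pk": "usc_id", "sections": ["x1"],
         "dbs": ["wikishared"], "source": "urlshortener",
         "canonicality": "canonical", "size": "small",
         "visibility": "public"},
        {"name": "flaggedtemplates", "dblist": "flaggedrevs.dblist",
         "status": "dropped", "canonicality": "partially canonical",
         "size": "large", "visibility": "public"},
    ],
}


def tables_where(catalogue, **criteria):
    """Return the names of tables whose fields equal every given criterion."""
    return [
        table["name"]
        for table in catalogue["tables"]
        if all(table.get(key) == value for key, value in criteria.items())
    ]


# Fully public tables, in catalogue order.
print(tables_where(CATALOGUE, visibility="public"))
# Tables that are not fully public (what a "private tables" list would need).
print([t["name"] for t in CATALOGUE["tables"] if t["visibility"] != "public"])
```

The same helper would serve a search or filtering UI, since every query is just a set of field/value pairs against one file.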
Notes:
- This will grow way too large. To avoid that:
  - MediaWiki and non-MW tables should have their own catalogues. That way we can also start with MediaWiki and then move on to non-MW tables.
  - We need a lot of defaults: an omitted field means "the default", e.g. dblist: all, sections: s1-s8, db: wiki's db, status: active
  - We could probably drop some options like size (it's also subjective)
- Part of reviewing and approving new tables should be adding the new table to the catalogue
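To make the defaults idea concrete, here is a minimal sketch of how a consumer might fill in omitted fields when reading an entry. The default values are the ones suggested in the notes above; the `DEFAULTS` and `with_defaults` names are hypothetical, and the "db: wiki's db" default is left out because it depends on which wiki is being looked at.

```python
# Defaults taken from the notes above; an omitted field means "the default".
# Assumption: "s1-s8" is expanded into an explicit list of section names.
DEFAULTS = {
    "dblist": "all",
    "sections": [f"s{i}" for i in range(1, 9)],
    "status": "active",
}


def with_defaults(entry):
    """Return a copy of a catalogue entry with omitted fields filled in."""
    resolved = {**DEFAULTS, **entry}
    # Copy the list so callers cannot mutate the shared default by accident.
    resolved["sections"] = list(resolved["sections"])
    return resolved


# The abuse_filter entry omits dblist, sections and status entirely.
entry = with_defaults({"name": "abuse_filter", "pk": "af_id"})
print(entry["dblist"], entry["status"], entry["sections"])

# Explicit values always win over defaults.
dropped = with_defaults({"name": "flaggedtemplates", "status": "dropped"})
print(dropped["status"])
```

Keeping the defaults in one place like this means the catalogue entries only need to spell out what is unusual about a table, which is what keeps the file small.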
Comments?