Now that the new replica servers are available (T172704) and userdb creation is disabled, we need a solution to make common datasets available.
Straw dog proposals
Proposal A: single database, many table owners
- Single database named datasets_p created hosting N tables of curated data.
- Single authoritative primary server (tools.db.svc.eqiad.wmflabs?) where datasets are loaded/updated into the datasets_p database.
- Each table must have an 'owner' in the form of one or more individuals who are responsible for support and maintenance of the data. Complete details of this responsibility to be determined.
- Each table subject to the replication criteria described below.
Proposal B: many databases
- Public *_p user-owned databases created on tools.db.svc.eqiad.wmflabs following normal processes.
- Database owner files a Phabricator task to request replication to the Wiki Replicas hosts.
- Each table subject to the replication criteria described below.
Replication criteria
- Table schema, appropriateness of content for public distribution, and dataset size must be approved by cloud-services-team, DBA, and Security-Team/WMF-Legal before initial load/replication.
- Data stored as InnoDB compressed tables.
- Each table must have a primary key that uniquely identifies each row. This primary key may be an auto-increment integer.
- Authoritative database replicated to labsdb100{9,10,11} using:
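As an illustration of the storage and primary-key criteria only (not the replication mechanism), a table definition that satisfies them might be generated as in the sketch below. The helper name dataset_table_ddl, the surrogate column name id, and the example table and column names are hypothetical, and the sketch assumes a MySQL/MariaDB server where ROW_FORMAT=COMPRESSED is available for InnoDB tables.

```python
from typing import Mapping


def dataset_table_ddl(table: str, columns: Mapping[str, str]) -> str:
    """Build a CREATE TABLE statement for a table in datasets_p.

    Hypothetical helper: it adds an auto-increment integer primary key
    and requests InnoDB compressed storage, matching the criteria above.
    """
    # Reject names that are not plain identifiers so they can be
    # quoted safely with backticks.
    for name in (table, *columns):
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")

    # Surrogate key: one unique auto-increment integer per row.
    lines = ["  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT"]
    for name, sql_type in columns.items():
        lines.append(f"  `{name}` {sql_type}")
    lines.append("  PRIMARY KEY (id)")

    body = ",\n".join(lines)
    return (
        f"CREATE TABLE datasets_p.`{table}` (\n{body}\n)"
        " ENGINE=InnoDB ROW_FORMAT=COMPRESSED;"
    )


if __name__ == "__main__":
    # Example table and columns are made up for illustration.
    print(dataset_table_ddl(
        "example_scores",
        {"page_id": "INT UNSIGNED NOT NULL", "score": "FLOAT NOT NULL"},
    ))
```

Generating the statement from a column mapping keeps the primary key and storage options uniform across tables, which could simplify review of each schema before initial load.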
Example datasets: