Create a database on the wikireplica servers called "datasets_p"
Closed, Duplicate | Public

Description

Create a database called "datasets_p" on all relevant wikireplica hosts. It will be used for importing managed Dataset(TM) tables via a process built around the soon-to-be-created Phabricator tag "wikireplica-datasets" (T173512).

This database will be used to host a subset of the tables that are currently hosted in userDBs on the old labsDB replicas.

Event Timeline

The ask here, as I understand it, is to provide a database that is co-located with the wiki replicas available via the wikireplica-web.eqiad.wmnet and wikireplica-analytics.eqiad.wmnet service names, and that can contain N tables of curated data produced by ORES, some analytics project, or other ETL/aggregation methods. These are similar to ad hoc tool databases in that they are not canonical MediaWiki metadata, but different in that each table would have a clear owner and a well-documented process for how the data it contains is produced and replicated across the pool of hosts that make up the logical cluster.

I can see the value in this, but I want the DBA team (@jcrespo, @Marostegui) to help design a system that they see as scalable and maintainable. I don't want to see the hard work of T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts accidentally undone by adding in new complexity without careful consideration of how it will affect:

  • adding/removing hosts from the pool of servers backing the cluster
  • replication lag
  • replication drift
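
To make the drift concern concrete, here is a minimal sketch of one way it could be detected, assuming each host in the pool can return the rows of a given datasets_p table. The host names, the sample rows, and the helpers `table_checksum` and `find_drifted_hosts` are all hypothetical and not part of any existing tooling; a real check would need to run against the actual replicas and would have to be designed with the DBA team.

```python
import hashlib
from collections import Counter


def table_checksum(rows):
    """Order-independent checksum of a table's rows (hypothetical helper).

    Each row is hashed on its own and the hashes are summed modulo 2**64,
    so row order does not matter but duplicate rows still count.
    """
    total = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        total = (total + int.from_bytes(digest[:8], "big")) % (2 ** 64)
    return total


def find_drifted_hosts(checksums_by_host):
    """Return the hosts whose checksum differs from the most common one."""
    if not checksums_by_host:
        return []
    majority, _ = Counter(checksums_by_host.values()).most_common(1)[0]
    return sorted(
        host for host, checksum in checksums_by_host.items()
        if checksum != majority
    )


# Hypothetical sample data: the same table as seen on three hosts.
rows_by_host = {
    "replica-a": [(1, "enwiki", 0.91), (2, "dewiki", 0.42)],
    "replica-b": [(2, "dewiki", 0.42), (1, "enwiki", 0.91)],  # same rows, other order
    "replica-c": [(1, "enwiki", 0.91), (2, "dewiki", 0.40)],  # one value differs
}
checksums = {host: table_checksum(rows) for host, rows in rows_by_host.items()}
print(find_drifted_hosts(checksums))  # prints ['replica-c']
```

A majority vote only says which hosts disagree with most of the pool, not which copy is correct, so it is a starting point for investigation rather than a repair mechanism.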
bd808 changed the task status from Open to Stalled. Sep 5 2017, 4:55 PM
bd808 triaged this task as Medium priority.
bd808 added a project: Data-Services.
bd808 moved this task from Backlog to Datasets on the Data-Services board.