Create a database on the wikireplica servers called "datasets_p"
Open, Stalled, Medium, Public

Description

Create a database called "datasets_p" on all relevant wikireplica hosts. It will be used for importing managed Dataset(TM) tables via a process built around the soon-to-be-created Phabricator tag "wikireplica-datasets" (T173512).

This database will be used to host a subset of the tables that are currently hosted in userDBs on the old labsDB replicas.

Event Timeline

Ottomata moved this task from Incoming to Radar on the Analytics board. Aug 21 2017, 3:33 PM

The ask here, as I understand it, is to provide a database that is co-located with the wiki replicas available via the wikireplica-web.eqiad.wmnet and wikireplica-analytics.eqiad.wmnet service names and that can contain N tables of curated data produced by ORES, some analytics project, or other ETL/aggregation methods. These are similar to ad hoc tool databases in that they are not canonical MediaWiki metadata, but different in that each table would have a clear owner and a well-documented process for how the data it contains is produced and replicated across the pool of hosts that comprise the logical cluster.

I can see the value in this, but I want the DBA team (@jcrespo, @Marostegui) to help design a system that they see as scalable and maintainable. I don't want to see the hard work of T153743: Add and sanitize s2, s4, s5, s6 and s7 to sanitarium2 and new labsdb hosts accidentally undone by adding in new complexity without careful consideration of how it will affect:

  • adding/removing hosts from the pool of servers backing the cluster
  • replication lag
  • replication drift
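
As a rough illustration of the drift concern, the sketch below compares a per-table checksum reported by each host in the pool and flags the hosts that disagree with the majority. The host names and checksum values are hypothetical, and how the checksums would be collected (for example, a `CHECKSUM TABLE` query run on each host) is left out; this is an assumption-laden sketch, not an existing tool or an agreed design.

```python
from collections import Counter


def find_drifted_hosts(checksums):
    """Return the sorted host names whose checksum differs from the most common one.

    `checksums` maps a host name to the checksum that host reports for one table.
    """
    if not checksums:
        return []
    # Treat the most frequently reported checksum as the reference value.
    majority, _ = Counter(checksums.values()).most_common(1)[0]
    return sorted(host for host, value in checksums.items() if value != majority)


# Hypothetical checksums for a single datasets_p table on three made-up hosts.
example = {
    "replica-a": 1234,
    "replica-b": 1234,
    "replica-c": 9999,
}
print(find_drifted_hosts(example))
```

A majority vote only identifies the odd host out; it cannot say which copy is correct, so a real process would still need a defined source of truth for each table.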
bd808 changed the task status from Open to Stalled. Sep 5 2017, 4:55 PM
bd808 triaged this task as Medium priority.
bd808 added a project: Data-Services.
bd808 moved this task from Backlog to Datasets on the Data-Services board.