<Tech Initiative> Improving Databases in MediaWiki
Open, Needs Triage, Public

Description

Request Status: New Request
Request Type: project support request
Related OKRs: TBD

Request Title: Improving Databases in MediaWiki

  • Request Description: In the past year, the majority of general outages have been caused by databases, and if MediaWiki stays as it is, such outages will become more and more frequent. Besides that, any major increase in the amount of Commons data is currently not possible and would cause major issues.
  • Indicate Priority Level: High
  • Main Requestors: @Ladsgroup @LSobanski
  • Ideal Delivery Date: Next quarter
  • Stakeholders: @WDoranWMF @Marostegui

Request Documentation

Document Type | Required? | Document/Link
Related PHAB Tickets | Yes | Inside the ticket
Product One Pager | Yes | <add link here>
Product Requirements Document (PRD) | Yes | <add link here>
Product Roadmap | Yes | <add link here>
Product Planning/Business Case | No | Document
Product Brief | No | <add link here>
Other Links | No | <add links here>

Related Objects

Mentioned In
T362574: Create a Query Builder for IDatabase::deleteJoin or remove the function
T362571: Create a Query Builder for IDatabase::insertSelect or remove the function
T330640: Migrate Database::update usages to UpdateQueryBuilder
T253462: Allow aspects of edits or logs to be revdeled while other aspects are suppressed
T222224: RFC: Normalize MediaWiki link tables
Mentioned Here
T255493: Consider phasing out ILoadBalancer::getLazyConnectionRef in favour of ILoadBalancer::getConnectionRef
T286694: Drop legacy cruft arising from introduction of ResultWrapper
T296960: Remove unused or barely used functions of IDatabase
T299392: Remove unused or barely used functions of ILoadBalancer and ILBFactory
T307616: Move SQL building code from Database class to SQLPlatform
T311866: Migrate Database::select usages to SelectQueryBuilder (in WMF-deployed extensions)
T20493: RFC: Unify the various deletion systems
T28741: Migrate file tables to a modern layout (image/oldimage; file/file_revision; add primary keys)
T63111: Convert primary key integers and references thereto from int to bigint (unsigned)
T183490: MCR schema migration stage 4: Migrate External Store URLs (wmf production)
T233004: Update CheckUser for actor and comment table
T241053: Normalize globalimagelinks table
T243051: A query builder for MediaWiki core
T275246: Populate rev_actor and rev_comment_id
T296380: flaggedtemplates table is still too big
T299417: Normalize templatelinks table
T299691: Break down monster class: Database
T299947: Normalize pagelinks table
T299951: Normalize categorylinks table
T299953: Normalize imagelinks table
T299954: Write code for handing write and read of rev_comment_id

Event Timeline

Based on an initial review done, this is likely to be over a year of effort.

Discussion Topics for Tech Steering Committee:

  • We will need to review to see where this falls in priority for this fiscal year, given that we are already oversubscribed.
  • If we do prioritize this, should we first finish the items on the list that have already been started, before proceeding in the recommended priority order? That way we would have less WIP and fewer open items.

January 19, 2022 Tech Steering Committee:

  • Kate C.: team reviewed this request and Tim and Timo felt that this has performance implications so it is reasonable to support it, but estimated at least 1 year of effort.
  • Ideal to prioritize list by quarter and start with the currently open items: https://docs.google.com/document/d/1oc0xbZ5L7xgK9o1jEg1SsAWO3lK4Z176Mz3VfeoNwxI/edit
  • Mark B.: open to working this way and breaking the work down into chunks
  • Kate C.: some migrations can take months and months to run, while others need active work

Action Items:

  • Mark B.: will look at breaking the work down by quarter and closing open items, and will also triage by risk
  • Ask Tim and Amir to break the work down
DAbad renamed this task from Improving Databases in MediaWiki to <Tech Initiative> Improving Databases in MediaWiki. Jan 20 2022, 6:51 PM

I was asked to break this down into work that can be done in a quarter. It depends on the number of people assigned to this, but I'll give it a try here:

Q3 (including some ongoing work)

Q4:

Q1:

Q2:

Q3:

Q4:

This is very rough and will of course change as new work comes up.

Another way of looking at it is through the lens of three value mini-streams: revision metadata, links normalization, and RDBMS code quality.