Page MenuHomePhabricator

MediaWiki database schemas should have a way to indicate the DB/cluster
Closed, ResolvedPublic

Description

Some extensions want to install some of their tables to a different database or cluster, to share them between multiple wikis (e.g. CentralAuth with $wgCentralAuthDatabase or Flow with $wgFlowCluster / $wgFlowDefaultWikiDb). There isn't currently any way to indicate this in the schema or the installer, so we end up with creative solutions which differ from extension to extension; automated schema updates for these extensions usually don't work at all, or only work if the wiki is configured to store all tables in the main database.

Also, these tables must remain functional during PHPUnit integration tests (even if they aren't tested directly, they might be invoked from hooks that the tests for core or other extensions trigger). The common solution is to use the UnitTestsAfterDatabaseSetup / UnitTestsBeforeDatabaseTeardown hooks to manually create / delete the tables. The process is extremely hacky (for example they don't use the testdb prefix because UnitTestsBeforeDatabaseTeardown would not have access to it, so these tables are created / deleted at their production location) and a source of fragility (see T310255, T314515, T251967 for some examples of things going wrong).

It would be nice to have a proper mechanism to declare the DB and cluster (and be able to configure the installer to ignore it, e.g. for CI). Conversely, there should be a way for a LoadExtensionSchemaUpdates hook handler to tell DatabaseUpdater which cluster / DB to update the schema in.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

A strawman proposal:

  • Introduce the concept of symbolic DB names; these would be things like centralauth_shared or flow_shared.
  • Add a "symbolic DB" field to the JSON DB schema (this won't do anything but is useful or documentation purposes).
  • Add a "symbolic DB" parameter to the methods exposed to LoadExtensionSchemaUpdates.
  • Add a configuration variable that maps symbolic DB names to actual DB and cluster names; omitted symbolic names will be mapped to the default cluster and DB. Over time this might replace $wgCentralAuthDatabase and similar settings.
  • Add an installer option to ignore the map and put everything in the main DB (makes life simpler for CI). Alternatively, make the installer create the extra databases automatically.
  • Update MediaWikiIntegrationTestCase::setupTestDB() to clone all tables defined in the mapping.
  • Add a configuration variable that maps […]; omitted symbolic names will be mapped to the default cluster and DB.
  • […]
  • Add an installer option to ignore the map and put everything in the main DB (makes life simpler for CI).

It looks like this added option wouldn't be needed given how the map is described (empty array by default, all requests use the default/main DB). What am I missing?

If I understand correctly, this task is mainly for the "external database" use case involving a singular central database for the extension's schema (like CentralAuth and Flow). This in constract to the other use case of external DBs in MediaWiki, where the per-wiki schema is logically in the main database but hosted on an "external cluster" in otherwise main/per-wiki databases (like core's ExternalStore text, GrowthExperiments, and parts of Echo).

The latter seems actually more straight forward to take care of during installation and upgrading, because it's unclear which wiki's CLI context should claim ownership over the central database.

Ideally, it should be possible to add new wikis to a wiki farm by just running install.php to create the database. We'll need to set up wiki farms for CI, and we should make site setup easy for people who run small wikifarms. I can see a couple of ways to do that:

  • The extension sets a default value in VirtualDomainsMapping (e.g. CentralAuth sets $wgVirtualDomainsMapping['centralauth'] = [ 'db' => 'centralauth' ];), refers to that DB domain in the schema update hook, and documents that that DB needs to be created. Users will need to create the DB manually (or override $wgVirtualDomainsMapping) otherwise the installer will error out. CI will need extra instructions to create the DB.
  • Same but the installer automatically creates the database when run with the --installdbuser / --installdbpass.
  • We add a prefix option to VirtualDomainsMapping (alongside cluster / db). The default configuration uses a prefix instead of a separate DB; small-scale MediaWiki users are encouraged to use prefixes instead of separate DBs for their wiki farms. Not sure if this would cause an issue with PHPUnit tests which use a custom DB prefix for the test database (but then PHPUnit can't be used to test cross-wiki features anyway).

It looks like this added option wouldn't be needed given how the map is described (empty array by default, all requests use the default/main DB). What am I missing?

I think what I had in mind (but forgot to say out loud) was that extensions which use shared tables do set defaults for the mapping.

If I understand correctly, this task is mainly for the "external database" use case involving a singular central database for the extension's schema (like CentralAuth and Flow). This in constract to the other use case of external DBs in MediaWiki, where the per-wiki schema is logically in the main database but hosted on an "external cluster" in otherwise main/per-wiki databases (like core's ExternalStore text, GrowthExperiments, and parts of Echo).

Yes, the external cluster use case is only relevant to a tiny group of large-scale MediaWiki reusers so I don't think it makes any sense to support that out of the box. Without manual reconfiguration, such tables should just go in the default wiki DB.

it's unclear which wiki's CLI context should claim ownership over the central database.

True but I don't think it's a problem most of the time - the schema update hook handlers already need to idempotent.

Since the MW Platform team owns several extensions which need cross-wiki integration tests, I propose picking this up.

  • We add a prefix option to VirtualDomainsMapping (alongside cluster / db). The default configuration uses a prefix instead of a separate DB; small-scale MediaWiki users are encouraged to use prefixes instead of separate DBs for their wiki farms. Not sure if this would cause an issue with PHPUnit tests which use a custom DB prefix for the test database (but then PHPUnit can't be used to test cross-wiki features anyway).

Isn't "db" in VirtualDomainsMapping already a database domain, which includes "tablePrefix"? Or is "db" meant to only be a DB domain with no slashes (e.g. just "dbName")? I don't see that being documented.

Isn't "db" in VirtualDomainsMapping already a database domain, which includes "tablePrefix"?

What we'd need here is a specified prefix but unspecified database. I guess that can be expressed as a DB domain string, but it isn't very readable.

As I understand it, Amir's work on T330590 has concluded and this is seen as out of scope of that they needed to implement. That leaves this as incremental improvement for primary Rdbms maintainers to pick up as they see fit.

@aaron Gergo is interested in helping with this, but I want to give you and the team a chance first. It might be a useful onboarding task if you're short on some easier ones for Rdbms. If not, Gergo is interested in implementing this as per your description. Would you be able to review in that case?

I don't think API Platform will take this up anytime soon, but I wouldn't mind helping with review.

Change 979400 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] installer: Add support for adding schema updates on virtual domains

https://gerrit.wikimedia.org/r/979400

Ladsgroup added a project: DBA.
Ladsgroup subscribed.

I'm doing this as part of WE3.2.1 hypo work.

It's a bit different than original proposal and request but those are already covered in T330590: External LBs should not be exposed to developers

Change 979403 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/UrlShortener@master] Switch schema updates to use virtual domains

https://gerrit.wikimedia.org/r/979403

Change 979400 merged by jenkins-bot:

[mediawiki/core@master] installer: Add support for adding schema updates on virtual domains

https://gerrit.wikimedia.org/r/979400

Change 985049 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/extensions/OATHAuth@master] Schema: Use virtual domain updater

https://gerrit.wikimedia.org/r/985049

Change 985049 merged by jenkins-bot:

[mediawiki/extensions/OATHAuth@master] Schema: Use virtual domain updater

https://gerrit.wikimedia.org/r/985049

This patch is a showcase of what has happened so far and I think it's good to call this done.

Change 979403 merged by jenkins-bot:

[mediawiki/extensions/UrlShortener@master] Switch schema updates to use virtual domains

https://gerrit.wikimedia.org/r/979403