
Define a way to get a database connection based on a logical wiki ID.
Closed, Declined; Public

Description

The idea of a "wiki ID" for (at least) sites in the local cluster is used in MediaWiki in several places, such as

  • wfWikiID()
  • WikiMap::getWiki()
  • SiteConfiguration::getSetting()
  • Site::getGlobalId()

LoadBalancer::getConnection used to have a $wiki parameter, which has recently been renamed to $domain. This parameter assumes that the ID provided encodes a database name (and possibly a schema name and a suffix). However, the relationship between the symbolic name used by SiteConfiguration and friends and the database name is not specified anywhere, and the two do not quite work the same in some edge cases (such as the database name containing a dash).

In order to allow application logic to reliably connect to the database of another wiki in the local cluster, there needs to be a way to obtain a database connection given a logical wiki ID. This is currently only possible if that symbolic name is the same as the database name, and does not contain any special characters. This relationship should at least be documented, but ideally, a proper mapping should be provided.
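The dash edge case can be illustrated with a small sketch. It is written in Python purely for illustration and is not MediaWiki code. It assumes that a DB domain ID joins its components with "-" and escapes a literal "-" inside a component as "?h" and a literal "?" as "??"; the function names are hypothetical, and the schema component is left out for brevity.

```python
from typing import Tuple


def encode_component(part: str) -> str:
    """Escape one domain component (assumed scheme: '?' -> '??', '-' -> '?h')."""
    # '?' must be escaped first, otherwise the '?' produced by '?h' would be doubled.
    return part.replace("?", "??").replace("-", "?h")


def domain_id(dbname: str, prefix: str = "") -> str:
    """Join the escaped components with '-' to form a domain ID string."""
    parts = [encode_component(dbname)]
    if prefix:
        parts.append(encode_component(prefix))
    return "-".join(parts)


def naive_split(wiki_id: str) -> Tuple[str, str]:
    """Treat a wiki ID as 'dbname-prefix' without any escaping."""
    dbname, _, prefix = wiki_id.partition("-")
    return dbname, prefix


# A plain name is identical in both forms, so the mismatch stays hidden.
print(domain_id("enwiki"))
# A database name containing a dash is escaped in the domain ID...
print(domain_id("mysite-newswiki", "en"))
# ...while a naive reading of the unescaped name splits it in the wrong place.
print(naive_split("mysite-newswiki"))
```

Under these assumptions the symbolic name and the domain ID only coincide when the database name contains no "-" or "?", which is the undocumented relationship described above.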

Event Timeline

I see wiki IDs as a type of "domain ID" that just uses two ASCII components, (dbname,prefix), neither using slashes to avoid the ugliness of using things like "mysite?hnewswiki-en" have to appear on config or in "table_wiki" DB fields. For B/C, the non-slash rule can't be a hard rule that throws errors. Given that, the getWiki() functions should use known-to-be-encoded wiki ID values or use DatabaseDomain to derive them. There could be a stricter WikiDatabaseDomain subclass. Changing those methods would probably both fix and break things for the slash scenario; maybe the "doesn't use domain hierarchy delimiter character" restriction could then be enforced by default, behind a flag that could be disabled for legacy mode.

I see wiki IDs as a type of "domain ID" that just uses two ASCII components, (dbname,prefix), neither using slashes to avoid the ugliness of using things like "mysite?hnewswiki-en" have to appear on config or in "table_wiki" DB fields.

I think using strings for this at all is a big problem. And encoding any "real" information into these strings makes it even worse. Wiki IDs should be totally opaque identifiers, and we should have a class to model them. @Tgr and I discussed this at the hackathon, but it seems we didn't write it down. I made a ticket now, see T227305: Define a WikiID class for uniquely identifying wikis.

I don't mind things like UUID strings that could add global-wiki ID features. LBFactory::setDomainAliases() could easily inject those opaque strings and map them to DB domain IDs.
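As a rough sketch of that idea (Python for illustration only; the class and method names below are hypothetical and do not reproduce the actual behavior of LBFactory::setDomainAliases(), and the UUID is a made-up placeholder), an alias table could map opaque identifiers to DB domain IDs and pass anything unknown through unchanged:

```python
from typing import Dict


class DomainAliasResolver:
    """Hypothetical model of an alias table: opaque wiki ID -> DB domain ID."""

    def __init__(self) -> None:
        self._aliases: Dict[str, str] = {}

    def set_domain_aliases(self, aliases: Dict[str, str]) -> None:
        # Copy the mapping so later changes by the caller do not leak in.
        self._aliases = dict(aliases)

    def resolve(self, domain_or_alias: str) -> str:
        # Known aliases are translated; anything else is assumed to
        # already be a DB domain ID and is returned as-is.
        return self._aliases.get(domain_or_alias, domain_or_alias)


resolver = DomainAliasResolver()
resolver.set_domain_aliases({
    # Placeholder UUID standing in for an opaque global wiki ID.
    "0b8f6a2e-1111-4222-8333-444455556666": "enwiki",
})

print(resolver.resolve("0b8f6a2e-1111-4222-8333-444455556666"))
print(resolver.resolve("dewiki"))
```

The pass-through default keeps existing callers working, since a plain DB domain ID resolves to itself.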

@Tgr and I discussed this at the hackathon, but it seems we didn't write it down.

You did create T224020: Create a class to represent the identity of wikis on the same wiki farm based on that.

I think using strings for this at all is a big problem. And encoding any "real" information into these strings makes it even worse. Wiki IDs should be totally opaque identifiers, and we should have a class to model them.

You can't easily use a class in configuration files, and UUIDs tend not to mean anything to humans writing the config files or looking at them to try to figure out what's going on.

See T224020#5317911 for further reply. Having things split between this, T224020, T221535, T113034, and possibly other tasks is just going to confuse things.

Krinkle triaged this task as Medium priority. Jul 23 2019, 5:11 PM
Krinkle subscribed.

In practice, $dbDomain seems to be working well for Rdbms. For non-MySQL wikis, or hypothetical/future needs, we have WikiMap to translate between wiki IDs (as used by SiteConfiguration, $wgLocalDatabases, and SiteLookup) and DB domains.

I agree "wiki ID" is not formally specified, but that's imho outside the scope of this Rdbms feature request to address. If we were to formally specify it, I think we should consider doing so without introducing or breaking APIs, on the assumption that we likely have what we need already, but are lacking a good way to explain how they work, with perhaps a few bits of polishing to do to make the TBD narrative more consistent. Remember that we already have various systems of wiki identification (DB domain, wiki ID, Site, Interwiki, ...).

The Rdbms feature request to implement yet another getConnection-like method that would take a Wiki ID is one I'd like to decline. In cases where this need exists today, I believe the WikiMap class fulfills this need by translating it to a dbDomain value.

See also various efforts to remove complexity and simplify the Rdbms API:

I agree "wiki ID" is not formally specified

FWIW there is some documentation at https://www.mediawiki.org/wiki/Manual:Wiki_ID

The Rdbms feature request to implement yet another getConnection-like method that would take a Wiki ID is one I'd like to decline.

I think it would be nice to have a WikiIdentity class, and it would be consistent with the rest of the architecture. Not so much because of the wiki ID / domain mismatch (domains are not supposed to be used outside the DB layer, so I don't think that's a problem); objects are just nicer to use than string-represented identities: they are self-documenting and more type-safe. Probably not worth the migration burden, though.
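To make the "self-documenting and more type-safe" point concrete, here is a minimal sketch of such a value object. It is Python for illustration, not a proposal for the actual MediaWiki class; the field name, the validation rule, and the describe_target() helper are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WikiIdentity:
    """Hypothetical immutable value object wrapping an opaque wiki ID."""

    wiki_id: str

    def __post_init__(self) -> None:
        # Assumed rule: an identity must wrap a non-empty string.
        if not isinstance(self.wiki_id, str) or not self.wiki_id:
            raise ValueError("wiki ID must be a non-empty string")

    def __str__(self) -> str:
        return self.wiki_id


def describe_target(wiki: WikiIdentity) -> str:
    """Accept only WikiIdentity, so a bare string cannot be passed by mistake."""
    if not isinstance(wiki, WikiIdentity):
        raise TypeError("expected a WikiIdentity, got " + type(wiki).__name__)
    return "target wiki: " + str(wiki)


enwiki = WikiIdentity("enwiki")
print(describe_target(enwiki))
# Two identities wrapping the same ID compare equal and hash alike.
print(enwiki == WikiIdentity("enwiki"))
```

Because the object is frozen and compares by value, it can be used as a dictionary key wherever a string ID is used today, which is part of why the migration cost, rather than the design, is the main objection.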