Page MenuHomePhabricator

Create a mechanism for loading configuration for sister site
Open, LowPublic

Description

We have an established pattern for creating MW service objects tied to a foreign wiki - we create a factory, create a correct LoadBalancer, and eventually create instances of a service tied to the database on a foreign wiki.

However, the config provided into the services is instantiated from MWServices::getMainConfig - thus it belongs to a local wiki. The config is just one of the dependencies we supply in, technically when creating a service for a foreign wiki, ALL it's dependencies must belong to that foreign wiki. However, I'm singling out the config as the most obviously wrong one.

I am not aware of any real issues this disparity creates, but I could imagine a whole cluster of hard-to-find bugs related to the mismatch. However, it seems like the factory pattern applied to foreign wikis is getting more and more adoption, thus we need to discuss the implications of it.

Ideally, when obtaining a service for a foreign wiki, we should obtain a whole service container belonging to a foreign wiki. The service wiring code then would get the dbDomain from the service container, and apply it to all the services it creates. That would guarantee that if we are creating a service for the foreign wiki, the config is correct, and all the dependencies are also looking into a correct database.

One important piece of the puzzle we do not have right now is the ability to quickly load the config for another domain.

Event Timeline

My take on this is:

The "right" way to do this is to have a separate instance if MediaWikiServices instance for the other wiki we want to access. It would have the config for that wiki loaded. You could get a MediaWikiServices instance for a target wiki from a WikiFarmManager (name?) service. That service would itself be part of the "main" MediaWikiServices instance - and should perhaps even be shared by all MediaWikiServices instances.

This would Just Work (tm) if all service objects were perfectly isolated from global state. Since this is not the case, we could have a marker interface that services could use to indicate that they are safe to use for accessing other wikis. A MediaWikiServices instance configured for a wiki other than the local one would refuse the return service instances that lack this marker interface.

A related question is how we would identify wikis, see T184529.

Pinging @Tgr, since we have discussed this before.

This sounds like the kind of thing that doesn't matter very much, until it does. Then it matters a lot, and is difficult to fix quickly.

There is a whole stack of problems here:

  1. We have started transitioning (five years ago; it was one those "do the first 20% then drop it and start working on something else" efforts) towards configuration objects, but plenty of things still use global variables for configuration (including most of the logic in operations/mediawiki-config). The Config object you get from the service container is just a facade for querying globals. Obviously multiple instances of configuration cannot live alongside if the configuration is stored in globals, so this transition would have to be wrapped up first. @DannyS712 has done a great job of keeping that project rolling recently, maybe he has a sense of how much work is left.
  2. There is no interface for accessing the configuration; MediaWiki simply does not have a concept of "site configuration". There is Config which is just magically assumed to exist (and is, in practice, using globals, with no care of how they got there). There is SiteConfiguration, but it's optional, a given wiki might or might not use it, how to use it is largely unspecified, and most configuration settings (exactly which ones is left as an implementation detail) are only available as a callback, which is assumed to be slow. (For Wikimedia sites, that runs a maintenance script which loads configuration for another site as a standalone request, and dumps it as JSON to be parsed back by the caller.)
  3. Wikimedia configuration is built around the concept of settings as global variables. A few things which are commonly used cross-wiki (like URLs and database names) are loaded into SiteConfiguration in a cross-wiki-available manner, but most settings are just globals. It would have to be rewritten significantly to interact with a config object instead of interacting with globals.

The next version of code sniffer should flag globals that are used when a context is available (T257710: Add sniff to disallow new global variable imports in functions (using global keyword)) - that may help with ensuring that configuration values are retrieved from an available context rather than from global variables. Now we just need to ensure that the context returns the right config :)

daniel renamed this task from MW service factories supply local wiki config into services created for foreign wikis to Create a mechanism for loading configuration for sister site.Jan 28 2021, 9:38 AM

It should be relatively simple to create an implementation of the Config interface that just wraps a SiteConfiguration instance. Config::get() would map to SiteConfiguration::get. The ID of the target wiki would be given in the constructor. That should work, right?

It should not, for the reasons given above. SiteConfiguration::get just takes the <setting> => <wiki/tag> => <value> hash in SiteConfiguration::$settings, which in the WMF case is the output of InitializeSettings.php, and looks up the given wiki in it. That works sometimes; it doesn't work for anything set in the any number of other configuration files we are using, like CommonSettings.php. And it won't affect globals; plenty of configuration is made available as globals and is not exposed to SiteConfiguration at all.

SiteConfiguration::get just takes the <setting> => <wiki/tag> => <value> hash in SiteConfiguration::$settings, which in the WMF case is the output of InitializeSettings.php, and looks up the given wiki in it. That works sometimes; it doesn't work for anything set in the any number of other configuration files we are using, like CommonSettings.php.

Ok, so if SiteConfiguration only has the special stuff, is it safe (enough) to assume that the other settings would be the same in the local wiki? I'm thinking of something like this:

$config = new MultiConfig( [ $mainConfig ], new CrossWikiConfig( $wgConf, 'foowiki' ) );

We are a long way from having this work e.g. enabling different extensions, but for looking up extra namespaces and content langauge, would this be sufficient?

And it won't affect globals; plenty of configuration is made available as globals and is not exposed to SiteConfiguration at all.

That's a feature here. This kind of Config would only be used by service objects that don't use globals directly. And $mainConfig would be looking at these globals, so they would work as a fallback.

Ok, so if SiteConfiguration only has the special stuff, is it safe (enough) to assume that the other settings would be the same in the local wiki?

Depends on what is enough. It will work exactly for the settings which are set in InitialiseSettings and not modified in any other way. E.g. this wouldn't work - even though all the wmg variables are properly split by site, $wgLogos is only set once and SiteConfiguration has no knowledge of it. This would not work either, even if you'd somehow put the result back into SiteConfiguration (which is not done currently but would be doable) - which branch is taken is based on the current wiki, so SiteConfiguration would always include the adjustments for the current wiki, no matter what wiki parameter you give it.

SiteConfiguration::getConfig() can get the accurate values of settings but 1) it is slow (makes a cross-wiki call) and 2) it needs to know the names of the settings since it is reading the corresponding globals and 3) does not handle configuration settings which are not globals, although that should be fixable.

We are a long way from having this work e.g. enabling different extensions, but for looking up extra namespaces and content langauge, would this be sufficient?

Language code is defined in InitialiseSettings and not manipulated elsewhere so that would work.
Namespace gets modified in CommonSettings (e.g.) so that wouldn't.

This is worse than I thought. Ok, so how about this:

Could we just generate the effective settings for each wiki in a separate file during deployment, and then just load the right file for a given wiki at runtime? That would be faster, would make it easy to load config for another wiki, and would allow us to keep the current convoluted logic for the most part.

Could we just generate the effective settings for each wiki in a separate file during deployment, and then just load the right file for a given wiki at runtime?

AIUI we are doing that already (see the MWConfigCacheGenerator calls in CommonSettings). @Jdforrester-WMF knows a lot more about this. It is certainly better performance-wise than shelling out to getConfiguration.php (which is what SiteConfiguration::getConfig() does) but it has the same limitations (maybe a bit more due to the timing of when it's called): how do you know what config variables are there? AIUI MWConfigCacheGenerator basically takes the array keys of InitializeSettings. So while it can deal with the values being manipulated outside that file, it it cannot detect settings that are defined elsewhere. getConfiguration.php also filters globals with a regex (starts with wg or wmg) so it covers more, but still not everything.

As long as you only want it to work for some predefined, enumerable set of variables, I think the MWConfigCacheGenerator approach could be made to work. The longer-term approach, I think, would be to convert CommonSettings & co into a closure with no side effects which takes the dbname and operates on some config object instead of $GLOBALS, which would be a big migration but not unmanageable. And then it could be properly invoked cross-wiki / written out to a file for speed.