Page MenuHomePhabricator

Instantiate MaintenanceScripts after extensions have registered with the autoloader
Closed, ResolvedPublic

Description

In order to create runner for maintenance scripts that also works for scripts defined by extensions, extensions need to be able to register classes with the autoloader before we try to instantiate the Maintenance script object. This would also allow extensions to register alias names for scripts.

This is however tricky because of the sequence of initialization in Setup.php, and how maintenance scripts and extensions interact with it.

The requirements for the sequence are:

  • ExtensionRegistry::loadFromQueue() must be called after $wgSettings->finalize(), because extensions need to back-fill defaults, and add stuff to config variables. This is expected to happen last.
  • MaintenanceRunner::init needs to happen after ExtensionRegistry::loadFromQueue(), because only at this point will the extensions have registered their classes with the autoloader.
  • MaintenanceRunner::overrideConfig has to be called before $wgSettings->finalize(), because it changes config settings. It must be called after MaintenanceRunner::init, because otherwise there is no Maintenance object.

So we have MaintenanceRunner::init < overrideConfig < $wgSettings->finalize < loadFromQueue(), but also loadFromQueue() < MaintenanceRunner::init, which is a contradiction. Something's got to give.

Proposed solution: move the config update logic out of ExtensionRegistry::loadFromQueue(), but keep the autoloader logic in it. Create a new method ExtensionRegistry::applyConfig or some such that applies changes to configuration (and also runs the extension callbacks).

Then we can do:

  1. ExtensionRegistry::loadFromQueue()
  2. MaintenanceRunner::init and MaintenanceRunner::overrideConfig (via MW_SETUP_CALLBACK)
  3. $wgSettings->finalize
  4. ExtensionRegistry::applyConfig

Basically, we call loadFromQueue() earlier, before MW_SETUP_CALLBACK; but at that time we only apply the extensions autoloader integration. Other effects of loading extensions are applied later, after $wgSettings->finalize, at the point we used to call loadFromQueue().

Event Timeline

Change 818574 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] ExtensionRegistry: split exportExtractedData

https://gerrit.wikimedia.org/r/818574

Proposed solution: move the config update logic out of ExtensionRegistry::loadFromQueue(), but keep the autoloader logic in it

Uhhh...this just seems like it would make T240535: Clean up ExtensionRegistry autoloading mess worse.

Can we load all config before the Maintenance class is even created? Having the constructor run before config is set has actually been a huge source of confusion, leading to at least one significant data loss: rESPOdf4053b492be: Don't try to read globals from PurgePrivateVoteData::__construct() (can't find the task for it at the moment, the gist was the global was unset at __construct, so null was cast to 0, so the script deleted all data once it was 0 days old and deleted all the votes before the election could be tabulated, oops).

@daniel wrote in task description:
  • MaintenanceRunner::overrideConfig has to be called before $wgSettings->finalize(), because it changes config settings. It must be called after MaintenanceRunner::init, because otherwise there is no Maintenance object.

It does not say here why MaintenanceRunner implies a Maintenance objects. These are different things.

For web entry points we have e.g. index.php > WebStart > Setup > handle route via Action or SpecialPage class.
For maintenance scripts we had script > doMaintenance > Setup > Maintenance subclass.

As I understand it, MaintenanceRunner here is a refactoring of doMaintenance, which should be able to inject any config overrides via MW_SETUP_CALLBACK. As a general direction, I'd like to reduce the amount of code in overrideConfig and certainly not invest in or encourage more use. (Some of it could even be inlined into Setup.php based on more generic CLI-mode detection, if not specific to Maintenance classes as such but more the general lack of web context.)

The reason I'm raising this as an issue and suggest a change, is because it would remove the rather significant splitting and compounding of logic proposed in change 818574. I have no particular objection to that complex patch, other than that as a general rule it'd good to avoid complexity so I'm trying to to test whether it meets a necessity threshold.

It's hard to find anything that really needs to be in overrideConfig or finalSetup. In the PHPUnit tests, config adjustment is done after setup completes, which mostly seems to work as long as you do a service reset. But a lot of instances of config adjustment could be replaced by setting a property in a service. For example, dumpIterator.php has

global $wgHooks;
$wgHooks['InterwikiLoadPrefix'][] = 'DumpIterator::disableInterwikis';

That pattern is meant to be deprecated in favour of HookContainer::register().

$settingsBuilder->putConfigValues( [
	MainConfigNames::UseDatabaseMessages => false,
	MainConfigNames::LocalisationCacheConf => [ 'storeClass' => LCStoreNull::class ],
] );

That's equivalent to MessageCache::disable() followed by LocalisationCache::disableBackend().

That is to say, I agree with the comments above, and I'd be happier if we had a plan for getting rid of Maintenance weirdness, rather than finding ever more complex ways to support it.

The problem I'm trying to solve here is that MaintenanceRunner can't instantiate a MaintenanceObject defined by an extension before config has been loaded and extensions have been initialized. If we try to do that before instantiating the Maintenance object, then the call to finalSetup() on the maintenance script would only happen after config loading is complete, and config can no longer safely be changed.

While I understand the concern about added complexity, I still think that it conceptually makes sense to split extension loading into two phases: one that allowes extensiuon code to run (that is, registering autoloading), and one that then actually hooks up the extension code. This allows all extensions to be "ready" and information about hooks and config variables before starting to "hook up" code.

To respond to some of the comments above:

Uhhh...this just seems like it would make T240535: Clean up ExtensionRegistry autoloading mess worse.

I think that is mostly fixed now by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/773514

Can we load all config before the Maintenance class is even created? Having the constructor run before config is set has actually been a huge source of confusion

Yes, the idea is that the MaintenanceRunner causes the config to be loaded, and then later instantiates the actual Maintenance object, once MediaWiki has been fully initialized. The issue is that we need to call finalSetup() on the Maintenance object before we finalized config and allow services to be created.

As I understand it, MaintenanceRunner here is a refactoring of doMaintenance, which should be able to inject any config overrides via MW_SETUP_CALLBACK.

Yes, a refectoring, plus an attempt to fix some issues with the current sequence of initialization. In particular, we want the ability to load classes defined by extensions, by name, from an entry point defined by core; And we need these classes to then interact with SettingsBuilder before initialization of MediaWiki is complete.

As a general direction, I'd like to reduce the amount of code in overrideConfig and certainly not invest in or encourage more use.

I agree, but looking at all the maintenance classe that override finalSetup() it seems like we will need some way for them to manipulate config settings, and do so before all extensions become "active" and the service container becomes available.

It's hard to find anything that really needs to be in overrideConfig or finalSetup. In the PHPUnit tests, config adjustment is done after setup completes, which mostly seems to work as long as you do a service reset. But a lot of instances of config adjustment could be replaced by setting a property in a service.

Allowing services to be re-configured after initialization is complex makes the system very hard to reason about.

You are correctly pointing to one of the key ideas that drive this change: configuration should be final before services get instantiated, and the configuration of services doesn't change after they have been returned from the ServiceContainer. Methods on service objects that manipulate properties (config) of these services, if they exist at all, should only be called in wiring.

Following along with the idea that finalSetup() should be called later, so it can manipulate services, I think what we'd need to a phase of initialization where services are not yet usable, but available for tweaking. That makes me uneasy...

Change 818574 abandoned by Daniel Kinzler:

[mediawiki/core@master] ExtensionRegistry: split exportExtractedData

Reason:

Going with I6f8f9f3f7252f0024282d7b005671f28a5b3acc3 instead

https://gerrit.wikimedia.org/r/818574

Ladsgroup assigned this task to daniel.