Page MenuHomePhabricator

Move dynamic configuration defaults out of Setup.php
Open, Needs TriagePublic

Description

Currently, Setup.php implements dynmiac default values for various config settings by manipulating global variables (see the part of the file that starts with the comment "Expand dynamic defaults and shortcuts"). Since we want to move away from having configuration in global variables, this should be changed. There are several cases to consider:

  1. Some config variables are used only in a single class. For these, the computation of the default value can be moved into the code that uses the configuration (in service wiring or actual application logic). E.g. seems to be used only in ServiceWiring, in the instantiator function for MainWANObjectCache. And ForeignFileRepos is only used in FileBackendGroup.
  2. Config variables that are used in multiple places should be initialized by a separate class, called something like DynamicConfigDefaults. It would have an applyDefaults() method that takes a SettingsBuilder. On the SettingsBuilder it calls getConfig() to get the configuration to inspect. It then computes an array that contains all the defaults to set. Finally, it calls overrideConfigValues() on the SettingsBuilder.
  3. Some Config variables are currently set late in Setup.php, after MediaWikiServices::allowGlobalInstance() has been called. These need special consideration, since the pose a paradox: they need access to a service object in order to determine their value. But configuration should not be changed after the MediaWikiServices has been initialized, since we don't know if aservice that used that config setting has already been created. Also, setting the variables after the service container has been created will stop working once we move away from using globals. The solution may be to move the determination of the default into the service wiring code that constructs the respective service.

Prior art:

Event Timeline

Aklapper renamed this task from Move dynmaic configuration defaults out of Setup.php to Move dynamic configuration defaults out of Setup.php.Mar 30 2022, 6:44 PM

Change 776362 had a related patch set uploaded (by Simetrical; author: Simetrical):

[mediawiki/core@master] Don't make dynamic config defaults globals-only

https://gerrit.wikimedia.org/r/776362

Change 776951 had a related patch set uploaded (by Simetrical; author: Simetrical):

[mediawiki/core@master] Remove some dynamic config from Setup

https://gerrit.wikimedia.org/r/776951

Change 776991 had a related patch set uploaded (by Simetrical; author: Simetrical):

[mediawiki/core@master] Refactor URL-parsing global functions to service

https://gerrit.wikimedia.org/r/776991

Change 776951 merged by jenkins-bot:

[mediawiki/core@master] Remove some dynamic config from Setup

https://gerrit.wikimedia.org/r/776951

So it occurred to me -- what's the purpose of having a separate class to compute dynamic defaults? Wouldn't it make more sense to put this information in includes/MainConfigSchema.php by allowing 'default' to be a callable that will be executed and return the default value based on the other values that have so far been configured? Of course you'd have to make sure to call the callbacks in the right order, but it seems like a much better solution than pushing this logic all off to a separate class. Dynamic defaults would then be set when SettingsBuilder::finalize() was called, or something like that.

So it occurred to me -- what's the purpose of having a separate class to compute dynamic defaults? Wouldn't it make more sense to put this information in includes/MainConfigSchema.php by allowing 'default' to be a callable that will be executed and return the default value based on the other values that have so far been configured?

I get the idea, but there are a number of complications. One is the order of execution, as you mentioned. Another is the fact that constants cannot contain closures. The callable would have to be the name of a static function, which we would then have to distignuish from a simple string.

We could have an additional key in the schema, something like "initializer" or "interpolation" or something, that points to the name of a static function that also lives in MainConfigSchema. That would also avoid the issue of filtering out callbacks from the defaults when generating the JSON Schema in docs/config-schema.yaml.

For somple cases, I find this idea attractive. But as soon as the initializer function has to call out to other code outside the class, I'd be against it. The schema should be a static, self-contained thing.

Another thing to worry about is how the config-set-so-far is represented. You encountered this complication in your code as well, with the need to check the base defaults and the overrides set so far. We'd still have this problem here. As long as we need to support config values backed by globals, I don't see a good way to dimplify this.

Of course you'd have to make sure to call the callbacks in the right order, but it seems like a much better solution than pushing this logic all off to a separate class. Dynamic defaults would then be set when SettingsBuilder::finalize() was called, or something like that.

In the end, it well me methods in some class. Perhaps it can be static methods in MainConfigSchema. I'm not totally against that, but I'm not clear on the signature of these emthods, and the issue of execution order and representing the intermediate state remains.

The following are set by default to the value of another setting:

  • ResourceBasePath
  • SharedPrefix
  • SharedSchema
  • FileBlacklist
  • MimeTypeBlacklist
  • EnableUserEmailBlacklist
  • ShortPagesNamespacesBlacklist
  • DBerrorLogTZ
  • SlaveLagWarning
  • DatabaseReplicaLagWarning
  • SlaveLagCritical
  • DatabaseReplicaLagCritical

The following are set by default to the value of another setting concatenated with a constant string:

  • Script
  • LoadScript
  • RestPath
  • StylePath
  • LocalStylePath
  • ExtensionAssetsPath
  • UploadPath
  • UploadDirectory
  • ReadOnlyFile
  • FileCacheDirectory
  • DeletedDirectory

Defaults that depend on other settings plus expansion of URLs:

  • CanonicalServer
  • EmergencyContact
  • PasswordSender
  • NoReplyAddress

Others:

  • UsePathInfo: An expression involving constants, strpos, and logical/comparison operators
  • ArticlePath: String concatenated to another setting's value, but the string depends on a third setting.
  • Logo: An expression involving other settings, operators, and isset.
  • MetaNamespace: str_replace() on another setting
  • MainWANCache/WANObjectCache: A little weird, I'm not sure what the intent of the code is.
  • ShowEXIF: function_exists()
  • GalleryOptions: Constant default, not sure why this can't be initialized statically. Maybe because people will use the + operator incorrectly to initialize it and their settings won't work.
  • LocalFileRepo: Array based on simple manipulation of lots of other settings.
  • CookiePrefix: Simple manipulation of a few other settings.
  • LocalTimezone: date_default_timezone_get()
  • LocalTZoffset: Arithmetic based on date()
  • CookieSecure: Based on ForceHTTPS and WebRequest::detectProtocol() (which is just analysis of $_SERVER)
  • TmpDirectory: wfTempDir(), which boils down to only built-in functions.

Manipulations that are done other than setting defaults:

  • ProhibitedFileExtensions, MimeTypeExclusions: array_merge() with a legacy name for the setting if that was defined (thus preserving the entries of both).
  • EnableUserEmailMuteList, ShortPagesNamespaceExclusions: Overwritten with the value of the legacy name for the setting it that was defined (so the legacy name takes precedence if both are defined).
  • FileExtensions: array_values( array_diff() ) with another setting.
  • RightsIcon: str_replace() using the values of other settings.
  • FooterIcons: Conditional additions and modifications to the array based on the value of other settings.
  • NamespaceProtection: Overwrite a fixed entry with a fixed value.
  • LockManagers: Add two entries, one constant and one based on another setting.
  • ForeignFileRepos: Adjusted based on simple manipulation of lots of other settings.
  • DefaultUserOptions: Adjusted based on simple manipulation of other settings.
  • CookiePrefix: strtr()
  • Lots of e-mail settings: Set based on the value of EnableEmail (overwriting any specified value).
  • CanonicalNamespaceNames: Set unconditionally based on the value of another setting.
  • DummyLanguageCodes: A bunch of stuff based on other settings and LanguageCode methods.
  • CacheEpoch: AtEase::suppress/restoreWarnings, gmdate(), filemtime(), another setting
  • Various log-related settings: Adjusted based on other log settings.
  • PasswordPolicy: Adjusted based on legacy settings.
  • PHPSessionHandling: Unrecognized values overridden, sometimes recognized ones as well.
  • Server: Throws exception if left at default false
  • ServerName: Set based on other settings, not customizable.
  • VirtualRestConfig: Altered based on other settings.
  • SecureLogin: Validated based on other settings.

The large majority of these could be implemented with a simple self-contained callback. I suggest something like the following:

public const ResourceBasePath = [
    'default' => null,
    // Default value is that of the setting ScriptPath
    'dynamicDefault' => 'ScriptPath',
];

public const Script = [
    'default' => false,
    // Default value is concatenation of a setting with a constant prefix and/or suffix. Format negotiable.
    'dynamicDefault' => [ 'setting' => 'ScriptPath', 'suffix' => '/index.php' ],
];

public const ShowEXIF = [
    'default' => null,
    // Default value is returned by a callback method
    'dynamicDefault' => [ 'callback' => true ],
];

public static function callbackShowEXIF(): bool {
    return function_exists( 'exif_read_data' );
}

public const MetaNamespace = [
    'default' => false,
    // Callback takes another setting as an argument
    'dynamicDefault' => [ 'callbackArgs' => [ 'Sitename' ] ],
}

public static function callbackMetaNamespace( string $sitename ): string {
    return str_replace( ' ', '_', $sitename );
}

public const CookiePrefix = [
    'default' => false,
    'dynamicDefault' => ...,
    // This is run unconditionally on the value
    'initializer' => true,
];

public static function initializeCookiePrefix( string $oldValue ): string {
    return strtr( $oldValue, '=,; +."\'\\[', '__________' );
}

public const FileExtensions = [
    'default' => [ ... ],
    // Other settings can be passed as extra arguments
    'initializerArgs' => [ 'ProhibitedFileExtensions' ],
];

public static function initializeFileExtensions( array $oldValue, array $prohibitedFileExtensions ): array {
    return array_values( array_diff( $oldValue, $prohibitedFileExtensions ) );
}

This would nicely cover all or practically all of the cases listed. We could decide to leave out the ones that called into code from other classes as a matter of judgment, if we wanted. Initialization order is handled automatically by the explicit specification of which settings get passed to the callbacks. The code to run all this would iterate through the settings and initialize them. A list would be kept of settings that were already initialized. Initialization would look something like:

  1. If the setting is already initialized, return.
  2. If the setting's value is equal to its default value, and it has a dynamicDefault specified:
    1. For each setting this one's dynamicDefault depends on, if that setting is not fully initialized yet, initialize it.
    2. Apply any dynamicDefault.
  3. If the setting has an initializer specified:
    1. For each setting the initializer depends on, if that setting is not fully initialized yet, initialize it.
    2. Apply the initializer.
  4. Add this setting to the list of settings that have already been initialized.

Initializers would be allowed to throw exceptions, and should also have some way of passing back warnings or debug info.

This setup handles dependencies automatically and ensures that order of execution is irrelevant. The intermediate state representation should be simple -- just mutate whatever preexisting object or array contains the settings that we're working with. Or keep a separate list of the values of already-initialized settings if we don't want to mutate the existing object.

My description here is rather lengthy, but the result would make dynamic defaults and initialization self-contained and self-explanatory rather than hiding off in some other class elsewhere, and I don't think the code would be so complicated.

I like the idea of defining the dynamic defaults in the schema, as methods on the schema class. But I am worried that all the functions calls will be a performance problem (this will run on every request, always). I also worry that the logic with different ways of defining the dynamic default is hard to follow and may have surprising effects, especially with the automatic resolution of parameters.

I'd prefer to try somethign simpler, less declarative. Something like this:

class MainConfigSchema {
    public static class initialize( ConfigBuilder $config ) {
        if ( $config->get( MainConfigNames::ArticlePath ) === false ) {
          $this->initializeArticlePath( $config );
        }
    }

    public static initializeArticlePath( ConfigBuilder $config ) {
        if ( $config->get( MainConfigNames::UsePathInfo ) ) {
		  $config->set( MainConfigNames::ArticlePath, $config->get( MainConfigNames::Script ) . '/$1' );
	    } else {
		  $config->set( MainConfigNames::ArticlePath, $config->get( MainConfigNames::Script ) . '?title=$1' );
	    }
        }
    }
}

Note that ConfigBuilder currently doesn't have a get() method, but that could easily be added. Also, SettingsBuilder currently doesn't expose a ConfigBuilder. But perhaps it should.

Using ConfigBuilder removes the problem of having to check whether we already have a "new" value for a config var when we need it in a default.

We can still declare the initializer methods in the schema arrays if we want to, but basing initialize() on these declarations needs too much iterating over everything and recursing into dependencies.

What's the problem with a one-pass iteration to automatically collect all the settings with dynamic defaults? If it's really measured to be a performance problem, it could be done very easily with code generation to avoid repeating the work on each page load.

We could have dynamic defaults all just be defined as callbacks to keep it simpler.

What's the problem with a one-pass iteration to automatically collect all the settings with dynamic defaults?

The main reason to me is that the order of entries matters. I don't really want to get into declaring dependencies between config variables, and determining the order on the fly. I have learned to be wary of "magic" like that. We could simply say that the order of evaluation is the order in which they are given in MainConfigSchema, but MainConfigSchema is organized by thematic sections. So that doesn't work.

We could define the order as an array in a function or constant. But when we iterate over it in a method on MainConfigSchema, we'd have to access the constants programmatically, via reflection. That's a lot of overhead. Alternatively, we could operate on the ConfigSchema object in SettingsBuilder - but then the code can't live in MainConfigSchema.

If we want the order to be encoded in MainConfigSchema, then the easiest solution is having a static method that is just a long list of calles to initializers (perahaps conditional calls, to avoid some unnecessary calls).

If it's really measured to be a performance problem, it could be done very easily with code generation to avoid repeating the work on each page load.

For the performance aspect it would be good to have a benchmark...
By the way, I have a patch lined up which would give us the code generation for free: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/765249. But this doesn't solve the issue of ordering.

As a baseline for both the benchmark and testing, I think the first step should be to pull the current code out into a separate file, without otherwise changing it. I think that should be the fiorst step, no matter which solution we end up using.

We could have dynamic defaults all just be defined as callbacks to keep it simpler.

Yes, that would work, if we can figure out the correct order. However, I'd want to be restrictive about where that callback can live. I would propose to make it a string, which is interpreted as the name of a static function on MainConfigSchema.

Why not just make it a fixed name based on the name of the setting? Then we wouldn't even have to array_filter(), we could just iterate through the methods with reflection.

Good point again about putting the code in a separate file first for testing, though.

Why not just make it a fixed name based on the name of the setting? Then we wouldn't even have to array_filter(), we could just iterate through the methods with reflection.

Reflection is slow. We can't do that on every request. Code generation might work, but I'd like to keep that restricted to generating arrays (data), not actual logic, if possible.

Change 779067 had a related patch set uploaded (by Simetrical; author: Simetrical):

[mediawiki/core@master] WIP: Test dynamic configuration logic

https://gerrit.wikimedia.org/r/779067

Change 779067 merged by jenkins-bot:

[mediawiki/core@master] Test dynamic configuration logic

https://gerrit.wikimedia.org/r/779067

Change 776991 merged by jenkins-bot:

[mediawiki/core@master] Refactor URL-parsing global functions to class

https://gerrit.wikimedia.org/r/776991

Change 785385 had a related patch set uploaded (by Simetrical; author: Simetrical):

[mediawiki/core@master] Finish testing dynamic config

https://gerrit.wikimedia.org/r/785385

Change 785385 merged by jenkins-bot:

[mediawiki/core@master] Finish testing dynamic config

https://gerrit.wikimedia.org/r/785385

Change 785387 had a related patch set uploaded (by Simetrical; author: Simetrical):

[mediawiki/core@master] POC: move dynamic defaults into MainConfigSchema

https://gerrit.wikimedia.org/r/785387

Change 776362 abandoned by Simetrical:

[mediawiki/core@master] Don't make dynamic config defaults globals-only

Reason:

In favor of https://gerrit.wikimedia.org/r/785387

https://gerrit.wikimedia.org/r/776362

Change 787797 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/core@master] config: Cleanup dynamic short-cut config $wgUseEnotif

https://gerrit.wikimedia.org/r/787797

Change 787932 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Add benchmark for dynamic settings loading.

https://gerrit.wikimedia.org/r/787932

Change 787932 merged by jenkins-bot:

[mediawiki/core@master] Add benchmarks for applying dynamic config.

https://gerrit.wikimedia.org/r/787932

It seems some of the refactoring here for dynamic Setup has broken globals in tests.

This is now being worked around, but it's starting to get messy and impossible to reason about.

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/802491

@Krinkle wrote:
+ $GLOBALS['wgRCEngines'] = [];

Afaik we do load at least the default settings in unit test context as well, and the default is an array. null is not an expected value type for this config.

@kostajh wrote:

vendor/bin/phpunit tests/phpunit/unit/includes/rcfeed/RCFeedTest.php

[…] something to do with our bootstrapping process for MW that is setting this global to null.

Note that [the non-integration variant] works fine.

Change 785387 merged by jenkins-bot:

[mediawiki/core@master] Move dynamic defaults into MainConfigSchema

https://gerrit.wikimedia.org/r/785387

I would note that the latest change was likely the cause of a PHP error. Stack trace:

Fatal error: Uncaught Exception: DateTimeZone::__construct(): Unknown or bad timezone () in /var/www/html/w/includes/MainConfigSchema.php:4899 Stack trace: #0 /var/www/html/w/includes/MainConfigSchema.php(4899): DateTimeZone->__construct('') #1 /var/www/html/w/includes/Settings/DynamicDefaultValues.php(95): MediaWiki\MainConfigSchema::getDefaultLocalTZoffset('') #2 /var/www/html/w/includes/Settings/DynamicDefaultValues.php(40): MediaWiki\Settings\DynamicDefaultValues->computeDefaultFor('LocalTZoffset', Object(MediaWiki\Settings\Config\GlobalConfigBuilder), Array) #3 /var/www/html/w/includes/Setup.php(250): MediaWiki\Settings\DynamicDefaultValues->applyDynamicDefaults(Object(MediaWiki\Settings\Config\GlobalConfigBuilder)) #4 /var/www/html/w/includes/WebStart.php(86): require_once('/var/www/html/w...') #5 /var/www/html/w/index.php(44): require('/var/www/html/w...') #6 {main} thrown in /var/www/html/w/includes/MainConfigSchema.php on line 4899

This was fixed by changing $wgLocaltimezone to null. By default this config var is set to the empty string when creating the wiki as it's added automatically to LocalSettings.php. It being the empty string was fine in previous versions. I'm not sure how many people would have set this value, but this will probably create a lot of PHP errors unless the automatically generated LocalSettings.php config var is fixed. One thing I would note is that I could not run update.php due to the issue.

Change 812431 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] filerepo: Move default 'backend' from Setup to to service class

https://gerrit.wikimedia.org/r/812431

Change 812454 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Dynamic defaults: fix $wgLocaltimezone being empty

https://gerrit.wikimedia.org/r/812454

I would note that the latest change was likely the cause of a PHP error. Stack trace:

Thank you for reporting, I supoaded a patch and added you as a reviewer.

Change 812458 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] RELEASE-NOTES: dynamic defaults applied before extension registration.

https://gerrit.wikimedia.org/r/812458

Change 812454 merged by jenkins-bot:

[mediawiki/core@master] Dynamic defaults: fix $wgLocaltimezone being empty

https://gerrit.wikimedia.org/r/812454

Change 812458 merged by jenkins-bot:

[mediawiki/core@master] RELEASE-NOTES: dynamic defaults applied before extension registration.

https://gerrit.wikimedia.org/r/812458

Change 820575 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] Setup: Remove redundant SetupDynamicConfig testss for FileBackend

https://gerrit.wikimedia.org/r/820575

Change 787797 merged by jenkins-bot:

[mediawiki/core@master] Setup: Remove internal short-cut config $wgUseEnotif

https://gerrit.wikimedia.org/r/787797

Change 824872 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] watchlist: Re-centralise former wgUseEnotif condition

https://gerrit.wikimedia.org/r/824872

Change 878198 had a related patch set uploaded (by D3r1ck01; author: Derick Alangi):

[mediawiki/extensions/MobileFrontend@master] tests: Move away from using $GLOBALS for configs

https://gerrit.wikimedia.org/r/878198

Change 878198 merged by jenkins-bot:

[mediawiki/extensions/MobileFrontend@master] tests: Move away from using $GLOBALS for configs

https://gerrit.wikimedia.org/r/878198

Change 824872 merged by jenkins-bot:

[mediawiki/core@master] watchlist: Re-centralise former wgUseEnotif condition

https://gerrit.wikimedia.org/r/824872

Change 889244 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] objectcache: Remove $wgMainWANCache and $wgWANObjectCaches

https://gerrit.wikimedia.org/r/889244

Change 889244 merged by jenkins-bot:

[mediawiki/core@master] objectcache: Remove $wgMainWANCache and $wgWANObjectCaches

https://gerrit.wikimedia.org/r/889244