Define variant Wikimedia production config in compiled, static files
Open, Stalled, Medium, Public

Description

Arising from James's previous musing, and discussions at the 2019 Hackathon.

What

  • InitialiseSettings.php (and much of CommonSettings.php) is replaced with per-wiki inheritable YAML files (to allow comments).
  • Actually-variant config goes into a much slimmer CommonSettings.php (or re-worked to not vary).
  • On merge, the YAML files are converted into one JSON file per wiki, for the currently-deployed version(s) of MW, which are stored in git.
  • This replaces the opportunistic cache in /tmp that we currently have.

Inheritance tree:

allwikis.yaml
| Default values for all wikis (e.g. wgNamespacesWithSubpages which is over-ridden, or wgEnableCanonicalServerLink which isn't)
|
+- wikipedias.yaml
   | Standard values for Wikipedias, where they differ from defaults (e.g. wgSitename or the fallback logo) and special inheritances
   |
   +- dewiki.yaml 
        Bespoke values for the German Wikipedia (e.g. the logo, or FlaggedRevisions configuration) and other special inheritances
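The inheritance tree above can be sketched as a chain of documents that are merged root-first. This is only an illustration: the dictionaries stand in for parsed YAML files, and the `inherits` key, the `resolve` and `build_json` helpers, and all values are hypothetical rather than the syntax the task settled on. A child here replaces the parent's whole value for a setting (a shallow merge).

```python
import json

# Hypothetical stand-ins for parsed YAML documents; the "inherits" key and
# the values are illustrative assumptions, not the real file format.
DOCUMENTS = {
    "allwikis": {
        "inherits": None,
        "settings": {
            "wgNamespacesWithSubpages": {"2": True},
            "wgEnableCanonicalServerLink": True,
        },
    },
    "wikipedias": {
        "inherits": "allwikis",
        "settings": {"wgSitename": "Wikipedia"},
    },
    "dewiki": {
        "inherits": "wikipedias",
        "settings": {"wgNamespacesWithSubpages": {"2": True, "100": True}},
    },
}


def resolve(name, documents):
    """Return the flattened settings for `name`, ancestors applied first."""
    chain = []
    seen = set()
    while name is not None:
        if name in seen:
            raise ValueError(f"inheritance cycle at {name!r}")
        seen.add(name)
        chain.append(documents[name])
        name = documents[name]["inherits"]
    merged = {}
    for doc in reversed(chain):  # root first, most specific document last
        merged.update(doc["settings"])
    return merged


def build_json(name, documents):
    """Serialise one wiki's flattened config as a static JSON document."""
    return json.dumps(resolve(name, documents), indent=2, sort_keys=True)


dewiki = resolve("dewiki", DOCUMENTS)
print(build_json("dewiki", DOCUMENTS))
```

Whether nested values should be merged key by key instead of replaced wholesale is a design choice this sketch does not settle.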

Comparison:

Task               | Current situation           | Future state
Config authored in | InitialiseSettings.php      | wikipedias.yaml etc.
Config build step  | Runtime cache, in /tmp/     | Build time static file, in /srv/mediawiki/
mw-config merge    | Trivial rebase              | Full production build of one JSON static file per wiki
Config read step   | From cache or computed live | Always read from built static file

Pros

  • Variant configuration will be static, making it more plausible to inject into docker images.
  • It will be much clearer exactly which wikis' config is changing, so deployers have more confidence.
  • YAML configuration files explicitly set the inheritance pattern.
  • Easier to compare one wiki's config with another's (e.g. "how different is dewiki from frwiki?").
  • Clear when the rump of CommonSettings refers to undefined variables; variant config forced to be merged first.
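To illustrate the "how different is dewiki from frwiki?" comparison: once each wiki has one flat JSON file, the difference is a key-by-key comparison of two dictionaries. A minimal sketch, assuming the files have already been loaded; the `diff_config` helper and the sample values are hypothetical.

```python
_MISSING = object()  # sentinel: distinguishes "not set" from None or False


def diff_config(a, b):
    """Return {setting: (value_in_a, value_in_b)} for every differing setting."""
    changes = {}
    for key in sorted(set(a) | set(b)):
        left = a.get(key, _MISSING)
        right = b.get(key, _MISSING)
        if left is _MISSING or right is _MISSING or left != right:
            changes[key] = (
                "<unset>" if left is _MISSING else left,
                "<unset>" if right is _MISSING else right,
            )
    return changes


# Illustrative values only, not the real production configuration.
dewiki = {"wgLanguageCode": "de", "wgMiserMode": True, "wmgUseFlaggedRevs": True}
frwiki = {"wgLanguageCode": "fr", "wgMiserMode": True}

for setting, (de_value, fr_value) in diff_config(dewiki, frwiki).items():
    print(f"{setting}: dewiki={de_value!r} frwiki={fr_value!r}")
```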

Cons

  • Merging is harder (and slower?).
  • Harder to audit all wikis' config for settings that "shouldn't" be over-ridden, or see how values vary.
  • Production branch pruning, currently just a disc operation and a sync, now needs a commit to mw-config as well as a deploy to delete.
  • First time we're reading YAML files in PHP prod. (Withdrawn: we're not reading them in prod, only in CI.)

Former questions

  • Deterministic sort of output files to avoid noise.
    • Assuming that an alphasort of the array by keys (ksort) is sufficient.
  • How do we do splicing in private settings at run time?
    • Private settings are already spliced in in CommonSettings; no change.
  • Syntax for specifying config, and that a document inherits from another.
    • Roughly worked out; to be documented.
  • Syntax for specifying that descendent config can't over-ride (e.g. wgMiserMode)
    • For now, this is just a simple all.yaml file that is re-applied at the end and so can't be over-ridden.
  • Do we need to vary on the PHP run-time still? (once HHVM is un-deployed can this go away, or are there reasons beyond PHP serialisation format that we think this might vary?)
    • No. Nothing has been variant between HHVM and Zend for a while. No reason to continue.
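Two of the answers above, key-sorted output and a final layer that descendants cannot over-ride, can be sketched together. This is an assumption-laden illustration: the `build` helper is hypothetical, and `sort_keys=True` sorts keys at every nesting level, which goes slightly further than a top-level ksort.

```python
import json


def build(inherited, locked):
    """Serialise one wiki's config with the locked layer applied last."""
    final = dict(inherited)
    final.update(locked)  # re-applied at the end, so it always wins
    # sort_keys makes the bytes independent of dict insertion order
    return json.dumps(final, indent=2, sort_keys=True) + "\n"


# Same settings in a different order, with an attempted over-ride of wgMiserMode.
first = build({"wgMiserMode": False, "wgSitename": "Example"}, {"wgMiserMode": True})
second = build({"wgSitename": "Example", "wgMiserMode": False}, {"wgMiserMode": True})
print(first == second)
```

Because the output is byte-identical for equal inputs, a rebuilt file only shows up in a diff when a value actually changed.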

Open questions

  • How does the CI work for this?
  • Do we need to check on build time that a vanilla MediaWiki install (i.e., DefaultSettings) doesn't set any config that isn't represented in all.yaml?
  • What do we do about variant non-static config?
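The build-time check asked about above amounts to a set difference between the settings MediaWiki defines by default and those represented in all.yaml. A rough sketch, assuming both have already been loaded into dictionaries; the `unrepresented` helper and the sample setting names are hypothetical.

```python
def unrepresented(default_settings, all_yaml):
    """List settings that have a default but no entry in all.yaml."""
    return sorted(set(default_settings) - set(all_yaml))


# Illustrative input: wgNewSetting stands for a newly-added default.
defaults = {"wgMiserMode": False, "wgNewSetting": 1}
all_yaml = {"wgMiserMode": True}

missing = unrepresented(defaults, all_yaml)
if missing:
    print("Not represented in all.yaml:", ", ".join(missing))
```

A CI job could fail the build, or merely warn, when the list is non-empty; which of the two is appropriate is part of the open question.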

Planned steps

Details

Project | Branch | Lines +/-
operations/mediawiki-config | master | +24 -6
operations/mediawiki-config | master | +0 -2
operations/mediawiki-config | master | +4 -8
integration/config | master | +6 -0
integration/config | master | +1 -0
operations/mediawiki-config | master | +123 -71
operations/mediawiki-config | master | +3 -22
operations/mediawiki-config | master | +406 -1 K
operations/mediawiki-config | master | +9 K -64
operations/puppet | production | +20 -2
operations/mediawiki-config | master | +28 -28
operations/mediawiki-config | master | +0 -1
operations/mediawiki-config | master | +3 -46
operations/mediawiki-config | master | +3 -9
operations/mediawiki-config | master | +14 -14
operations/mediawiki-config | master | +39 -3
operations/mediawiki-config | master | +12 -12
operations/mediawiki-config | master | +4 -7
operations/mediawiki-config | master | +40 -1
operations/mediawiki-config | master | +92 -77

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change 535963 merged by jenkins-bot:
[operations/mediawiki-config@master] Variant configuration: Read JSON config for all wikis

https://gerrit.wikimedia.org/r/535963

Mentioned in SAL (#wikimedia-operations) [2019-09-16T18:52:56Z] <jforrester@deploy1001> Synchronized wmf-config/CommonSettings.php: T223602 Variant configuration: Read JSON config for all wikis (duration: 00m 56s)

Anomie added a subscriber: Anomie. Sep 16 2019, 7:25 PM
  • InitialiseSettings.php (and much of CommonSettings.php) is replaced with per-wiki inheritable YAML files (to allow comments).
  • On merge, the YAML files are converted into one JSON file per wiki, for the currently-deployed version(s) of MW, which are stored in git.

I note T212460: Adopt static array files for local disk storage of values (epic) recommends static PHP-array files rather than YAML or JSON.

Inheritance tree:

allwikis.yaml
| Default values for all wikis (e.g. wgNamespacesWithSubpages which is over-ridden, or wgEnableCanonicalServerLink which isn't)
|
+- wikipedias.yaml
   | Standard values for Wikipedias, where they differ from defaults (e.g. wgSitename or the fallback logo) and special inheritances
   |
   +- dewiki.yaml 
        Bespoke values for the German Wikipedia (e.g. the logo, or FlaggedRevisions configuration) and other special inheritances

How would this proposed scheme handle something like this from the existing configuration?

'wmgBotPasswordsDatabase' => [
    'default' => 'metawiki',
    'private' => false,
    'fishbowl' => false,
    'nonglobal' => false,
],

Would we have to set false individually in advisorswiki.yaml, amwikimedia.yaml, arbcom_cswiki.yaml, arbcom_dewiki.yaml, arbcom_enwiki.yaml, arbcom_fiwiki.yaml, arbcom_nlwiki.yaml, auditcomwiki.yaml, boardgovcomwiki.yaml, boardwiki.yaml, chairwiki.yaml, chapcomwiki.yaml, checkuserwiki.yaml, cnwikimedia.yaml, collabwiki.yaml, donatewiki.yaml, ecwikimedia.yaml, electcomwiki.yaml, execwiki.yaml, fdcwiki.yaml, fixcopyrightwiki.yaml, foundationwiki.yaml, grantswiki.yaml, hiwikimedia.yaml, id_internalwikimedia.yaml, idwikimedia.yaml, iegcomwiki.yaml, ilwikimedia.yaml, internalwiki.yaml, labswiki.yaml, labtestwiki.yaml, legalteamwiki.yaml, maiwikimedia.yaml, movementroleswiki.yaml, noboard_chapterswikimedia.yaml, nostalgiawiki.yaml, officewiki.yaml, ombudsmenwiki.yaml, otrs_wikiwiki.yaml, projectcomwiki.yaml, punjabiwikimedia.yaml, romdwikimedia.yaml, rswikimedia.yaml, searchcomwiki.yaml, spcomwiki.yaml, stewardwiki.yaml, techconductwiki.yaml, transitionteamwiki.yaml, votewiki.yaml, wbwikimedia.yaml, wg_enwiki.yaml, and wikimaniateamwiki.yaml?

Something like 0f953f257 would be even worse.

  • It will be much clearer exactly which wikis' config is changing, so deployers have more confidence.

Is this currently a problem? I can't say I've ever encountered it.

  • Easier to compare one wiki's config with another's (e.g. "how different is dewiki from frwiki?").

Is this something we often want to do? I can't say I've ever wanted to do that.

What I usually want to see is "which wikis have which values for this specific setting?" Currently I can just look at the array for wmgBotPasswordsDatabase and see "everything has it set to 'metawiki', except private, fishbowl, and nonglobal wikis". Now I'd have to grep every file and collate them somehow. I'd put that change into "Cons".

  • Need to check on build time that a vanilla MediaWiki install (i.e., DefaultSettings) doesn't set any config that isn't represented in allwikis.yaml.

So we're going to start setting every single setting now, nothing at all using the value from DefaultSettings.php or extension.json? Shouldn't that be explicitly mentioned rather than implied? Are we sure that's something we want to do, that every train might have to update the configurations just to copy defaults for newly-added settings.

  • Do we need to vary on the PHP run-time still? (once HHVM is undeployed can this go away, or are there reasons beyond PHP serialisation format that we think this might vary?)

Wouldn't surprise me if we run into it when we migrate from php 7.2 to php 8.0 (or whatever) someday.

  • InitialiseSettings.php (and much of CommonSettings.php) is replaced with per-wiki inheritable YAML files (to allow comments).
  • On merge, the YAML files are converted into one JSON file per wiki, for the currently-deployed version(s) of MW, which are stored in git.

I note T212460: Adopt static array files for local disk storage of values (epic) recommends static PHP-array files rather than YAML or JSON.

Indeed, but I don't feel the extra cost in terms of usability/maintainability

Inheritance tree:

allwikis.yaml
| Default values for all wikis (e.g. wgNamespacesWithSubpages which is over-ridden, or wgEnableCanonicalServerLink which isn't)
|
+- wikipedias.yaml
   | Standard values for Wikipedias, where they differ from defaults (e.g. wgSitename or the fallback logo) and special inheritances
   |
   +- dewiki.yaml 
        Bespoke values for the German Wikipedia (e.g. the logo, or FlaggedRevisions configuration) and other special inheritances

How would this proposed scheme handle something like this from the existing configuration?

'wmgBotPasswordsDatabase' => [
    'default' => 'metawiki',
    'private' => false,
    'fishbowl' => false,
    'nonglobal' => false,
],

Would we have to set false individually in advisorswiki.yaml, amwikimedia.yaml, arbcom_cswiki.yaml, arbcom_dewiki.yaml, arbcom_enwiki.yaml, arbcom_fiwiki.yaml, arbcom_nlwiki.yaml, auditcomwiki.yaml, boardgovcomwiki.yaml, boardwiki.yaml, chairwiki.yaml, chapcomwiki.yaml, checkuserwiki.yaml, cnwikimedia.yaml, collabwiki.yaml, donatewiki.yaml, ecwikimedia.yaml, electcomwiki.yaml, execwiki.yaml, fdcwiki.yaml, fixcopyrightwiki.yaml, foundationwiki.yaml, grantswiki.yaml, hiwikimedia.yaml, id_internalwikimedia.yaml, idwikimedia.yaml, iegcomwiki.yaml, ilwikimedia.yaml, internalwiki.yaml, labswiki.yaml, labtestwiki.yaml, legalteamwiki.yaml, maiwikimedia.yaml, movementroleswiki.yaml, noboard_chapterswikimedia.yaml, nostalgiawiki.yaml, officewiki.yaml, ombudsmenwiki.yaml, otrs_wikiwiki.yaml, projectcomwiki.yaml, punjabiwikimedia.yaml, romdwikimedia.yaml, rswikimedia.yaml, searchcomwiki.yaml, spcomwiki.yaml, stewardwiki.yaml, techconductwiki.yaml, transitionteamwiki.yaml, votewiki.yaml, wbwikimedia.yaml, wg_enwiki.yaml, and wikimaniateamwiki.yaml?

Obviously not. It'd be set in private.yaml etc.
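As a rough illustration of that answer, group-level files such as private.yaml could sit between the all-wikis defaults and the per-wiki file, so the value is written once per group rather than once per wiki. The file contents, the group membership table and the `config_for` helper below are all hypothetical.

```python
# Hypothetical contents of group-level files, shown as parsed dictionaries.
GROUP_FILES = {
    "allwikis": {"wmgBotPasswordsDatabase": "metawiki"},
    "private": {"wmgBotPasswordsDatabase": False},
    "fishbowl": {"wmgBotPasswordsDatabase": False},
    "nonglobal": {"wmgBotPasswordsDatabase": False},
}

# Illustrative membership only; real membership would come from the dblists.
WIKI_GROUPS = {"officewiki": ["private"], "dewiki": []}


def config_for(wiki):
    """Apply all-wikis defaults, then each group the wiki belongs to."""
    merged = dict(GROUP_FILES["allwikis"])
    for group in WIKI_GROUPS.get(wiki, []):
        merged.update(GROUP_FILES[group])
    return merged


print(config_for("officewiki"))
print(config_for("dewiki"))
```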

Something like 0f953f257 would be even worse.

  • It will be much clearer exactly which wikis' config is changing, so deployers have more confidence.

Is this currently a problem? I can't say I've ever encountered it.

Yes.

  • Easier to compare one wiki's config with another's (e.g. "how different is dewiki from frwiki?").

Is this something we often want to do?

Yes. Converging wikis' configuration is a long-term goal.

I can't say I've ever wanted to do that.

What I usually want to see is "which wikis have which values for this specific setting?" Currently I can just look at the array for wmgBotPasswordsDatabase and see "everything has it set to 'metawiki', except private, fishbowl, and nonglobal wikis". Now I'd have to grep every file and collate them somehow. I'd put that change into "Cons".

Sure, will tweak existing wording.

  • Need to check on build time that a vanilla MediaWiki install (i.e., DefaultSettings) doesn't set any config that isn't represented in allwikis.yaml.

So we're going to start setting every single setting now, nothing at all using the value from DefaultSettings.php or extension.json? Shouldn't that be explicitly mentioned rather than implied? Are we sure that's something we want to do, that every train might have to update the configurations just to copy defaults for newly-added settings.

This is preparatory work for the post-train world. Surprise is the enemy of automated deployment.

  • Do we need to vary on the PHP run-time still? (once HHVM is undeployed can this go away, or are there reasons beyond PHP serialisation format that we think this might vary?)

Wouldn't surprise me if we run into it when we migrate from php 7.2 to php 8.0 (or whatever) someday.

Yeah, we can certainly stash the runtime into the header like we currently do with the mtime and re-gen based on that, too.
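A minimal sketch of that idea, assuming the generated file carries a small header next to the config: the reader treats the file as stale when either the source mtime or the runtime recorded in the header differs from the current one. The file layout, the `write_cache` and `read_cache` helpers and the runtime strings are assumptions for illustration.

```python
import json
import os
import tempfile


def write_cache(path, source_mtime, runtime, config):
    """Store the config together with a header describing how it was built."""
    payload = {"header": {"mtime": source_mtime, "runtime": runtime}, "config": config}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, sort_keys=True)


def read_cache(path, source_mtime, runtime):
    """Return the cached config, or None if it is missing, unreadable or stale."""
    try:
        with open(path, encoding="utf-8") as handle:
            payload = json.load(handle)
    except (OSError, ValueError):
        return None
    header = payload.get("header", {})
    if header.get("mtime") != source_mtime or header.get("runtime") != runtime:
        return None  # caller should re-generate
    return payload["config"]


cache_path = os.path.join(tempfile.mkdtemp(), "dewiki.json")
write_cache(cache_path, 1568660000, "php-7.2", {"wgSitename": "Wikipedia"})
same_runtime = read_cache(cache_path, 1568660000, "php-7.2")
new_runtime = read_cache(cache_path, 1568660000, "php-8.0")
print(same_runtime, new_runtime)
```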

Change 533594 merged by jenkins-bot:
[operations/mediawiki-config@master] Variant configuration: Never write to serialised PHP, drop support

https://gerrit.wikimedia.org/r/533594

Mentioned in SAL (#wikimedia-operations) [2019-09-18T20:15:38Z] <jforrester@deploy1001> Synchronized wmf-config/CommonSettings.php: Variant configuration: Never write to serialised PHP T223602 (duration: 01m 04s)

Jdforrester-WMF updated the task description.
Krinkle added a comment. Edited Sep 19 2019, 8:12 PM

Change 507729 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[operations/mediawiki-config@master] [WIP] Variant configuration: Pre-calculate config for each wiki and store it in config.git

https://gerrit.wikimedia.org/r/507729

I'm not sure this is worth optimising. It would mean having to run this pre-commit, keeping all versions of this in Git history indefinitely in their expanded form, and the rebase inconvenience as a result. If the overhead of reading InitialiseSettings (or its N JSON file equivalent in the future) is too slow to do on-demand for the occasional cache miss after a deployment, then perhaps we can let Scap pre-populate the directory as-needed?

From a quick napkin calculation though, I think we'd be fine doing that on-the-fly as before. We've confirmed that now with the use of JSON for the tmp file that it's not as expensive as I thought. Worth a try?

Basically YAML as proposed, possibly with a JSON conversion as well still, but without expansion. Also, disallowing use of dynamic merges and substituting defaults for anything that has at least 1 variant makes sense to me, and I do think it's worth pursuing that still. That would also keep this logic much simpler and means we won't need the duplication of SiteConfiguration logic in the long term, nor during the transition.

Change 507729 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[operations/mediawiki-config@master] [WIP] Variant configuration: Pre-calculate config for each wiki and store it in config.git

https://gerrit.wikimedia.org/r/507729

I'm not sure this is worth optimising. It would mean having to run this pre-commit, keeping all versions of this in Git history indefinitely in their expanded form, and the rebase inconvenience as a result. If the overhead of reading InitialiseSettings (or its N JSON file equivalent in the future) is too slow to do on-demand for the occasional cache miss after a deployment, then perhaps we can let Scap pre-populate the directory as-needed?

We could, but the advantage of having the files in-repo is that diffs are visible to users as to the impact of their changes (see above).

From a quick napkin calculation though, I think we'd be fine doing that on-the-fly as before. We've confirmed that now with the use of JSON for the tmp file that it's not as expensive as I thought. Worth a try?

For the current use-case I imagine it'd be fine to do it live, but I don't think it would be for the next step of actual compilation ( https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/538129/ ); the complexity and cost of dblist calculations is one of the things we're saving by "compiling" at merge time rather than ad hoc live in production when a request hits a wiki.

Basically YAML as proposed, possibly with a JSON conversion as well still, but without expansion. Also, disallowing use of dynamic merges and substituting defaults for anything that has at least 1 variant makes sense to me, and I do think it's worth pursuing that still. That would also keep this logic much simpler and means we won't need the duplication of SiteConfiguration logic in the long term, nor during the transition.

It's not duplication if we delete SiteConfiguration from MediaWiki.

Change 537220 had a related patch set uploaded (by Krinkle; owner: Jforrester):
[operations/mediawiki-config@master] Move VariantSettings back to InitialiseSettings now that the migration is done

https://gerrit.wikimedia.org/r/537220

Change 537220 merged by jenkins-bot:
[operations/mediawiki-config@master] Move VariantSettings back to InitialiseSettings now that the migration is done

https://gerrit.wikimedia.org/r/537220

Change 539007 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] noc: Refresh conf symlinks following 3373247e123b538

https://gerrit.wikimedia.org/r/539007

Change 539007 merged by jenkins-bot:
[operations/mediawiki-config@master] noc: Refresh conf symlinks following 3373247e123b538

https://gerrit.wikimedia.org/r/539007

Review of the overall approach to using YAML, including its perf restrictions/flexibilities (e.g. what should definitely be cached, and what would be fine to do at run-time), is now pencilled in for Q3, maybe Q2. Not expecting to involve TechCom or CPT right now. Depending on how ambitious we want to be, it might make sense to involve one or both at some point, but for now we're hoping to keep it isolated enough not to be cross-cutting.

jbond added a subscriber: jbond. Oct 22 2019, 9:04 AM

Change 538129 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[operations/mediawiki-config@master] Variant configuration: Allow for YAML-based inheritance of configuration

https://gerrit.wikimedia.org/r/538129

Change 545411 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[operations/mediawiki-config@master] Variant configuration: Generate dblists from YAML

https://gerrit.wikimedia.org/r/545411

Hey @Jdforrester-WMF I'm looking around at the last patchset and not really understanding where the db lists will be and how the yaml will be used to generate the *dblist files. When the dust has all settled, where would a script go to look for the list of, say, closed dbs? (Assuming not a MediaWiki script and not even in php.) I ask because I'll need to update the dump scripts and other related tools if this is changing. Thanks!

Change 547283 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] check_private_data: ignore comments on private.dblist

https://gerrit.wikimedia.org/r/547283

Change 547283 merged by Jcrespo:
[operations/puppet@production] check_private_data: ignore comments on private.dblist

https://gerrit.wikimedia.org/r/547283

Change 538129 merged by jenkins-bot:
[operations/mediawiki-config@master] Variant configuration: Allow for YAML-based inheritance of configuration

https://gerrit.wikimedia.org/r/538129

Change 545411 merged by jenkins-bot:
[operations/mediawiki-config@master] Variant configuration: Generate dblists from YAML

https://gerrit.wikimedia.org/r/545411

Mentioned in SAL (#wikimedia-operations) [2019-11-26T20:33:13Z] <jforrester@deploy1001> Synchronized dblists/: Update dblists, now autogenerated (no-op, just comment changes) T223602 (duration: 01m 01s)

Change 553220 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] tests: Remove obsolete logic for "-computed" dblists

https://gerrit.wikimedia.org/r/553220

Change 553220 merged by jenkins-bot:
[operations/mediawiki-config@master] tests: Remove obsolete logic for "-computed" dblists

https://gerrit.wikimedia.org/r/553220

Change 554941 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] jjb: Provide operations-mw-config-php72-composer-diffConfig-docker

https://gerrit.wikimedia.org/r/554941

Change 507729 merged by jenkins-bot:
[operations/mediawiki-config@master] Variant configuration: Pre-calculate config for each wiki on demand

https://gerrit.wikimedia.org/r/507729

Change 554951 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] layout: [mediawiki-operations] Provide a non-voting config diff job

https://gerrit.wikimedia.org/r/554951

Change 554941 merged by jenkins-bot:
[integration/config@master] jjb: Provide operations-mw-config-php72-composer-diffConfig-docker

https://gerrit.wikimedia.org/r/554941

Change 554951 merged by jenkins-bot:
[integration/config@master] layout: [mediawiki-operations] Provide a non-voting config diff job

https://gerrit.wikimedia.org/r/554951

Jdforrester-WMF changed the task status from Open to Stalled. Jan 10 2020, 6:34 PM

Stalled for the next month or two.

Change 576060 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] MWConfigCacheGenerator: Remove unused 'docRoot' wgConf placeholder variable

https://gerrit.wikimedia.org/r/576060

Change 576060 merged by jenkins-bot:
[operations/mediawiki-config@master] MWConfigCacheGenerator: Remove unused 'docRoot' wgConf placeholder variable

https://gerrit.wikimedia.org/r/576060

Change 577037 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] multiversion: Make buildDBLists.php both create and delete dblist files

https://gerrit.wikimedia.org/r/577037

Change 577040 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] dblists: Remove 'labtestwiki' dblist containing 'labtestwiki'

https://gerrit.wikimedia.org/r/577040

Change 577040 merged by jenkins-bot:
[operations/mediawiki-config@master] dblists: Remove 'labtestwiki' dblist containing 'labtestwiki'

https://gerrit.wikimedia.org/r/577040

Change 577037 merged by jenkins-bot:
[operations/mediawiki-config@master] multiversion: Make buildDBLists.php both create and delete dblist files

https://gerrit.wikimedia.org/r/577037

jeena added a subscriber: jeena. Jul 6 2020, 11:45 PM
jijiki moved this task from Incoming 🐫 to Unsorted on the serviceops board. Aug 17 2020, 11:48 PM
Aklapper removed a subscriber: Anomie. Fri, Oct 16, 5:02 PM