Iteratively clean up wmf-config to be less dynamic and with smaller settings files (2022)
Open, Medium, Public

Description

This is a subtask of T223602 with a less ambitious, more iterative approach, meant to get us into better shape and give us a better sense of what lies ahead. It should leave us with what is essentially a cleaned-up, easier to maintain and comprehend version of the status quo. The parent task can then focus only on data format and location, with the edge cases and other problems already sorted out.

Below is the result of @Ladsgroup and @Krinkle chatting at the MediaWiki Hackathon 2022:

Minimal scope
  • Keep things fast to edit.
  • Keep things fast to deploy.
  • Keep using (for now) the same underlying system for storing and loading configuration from MW core and contributor perspective.
  • Keep the same CI job and diff/preview features and confidence we get from that.
Minimal problem statement
  1. There should be only one "right" place for a given setting.

Use case: Discovery, productivity, on-boarding, debugging.

  2. Make config files more machine-editable.

Use case: Automate wmf-config patch creation when creating new wikis.

  3. Remove most dynamic code and turn most such statements into pure data arrays.

Use case: Increase the benefit of our CI diffConfig job. Currently it can't diff side effects within CommonSettings.php and the various other complex PHP files.

Use case: Let CI assert there are no accidental overwrites or conflicts between the various wmf-config files that assign the same variable.

Use case: Prepare the repo such that in the parent task we can figure out how we want to format and version this data for Kubernetes, whether by injecting it at pod launch time, or in some fast way from an image rebuild, etc.

Today
  • /wmf-config/CommonSettings.php, the entry point from MW core Setup/LocalSettings; it essentially assigns $wgConf = require IS.php.
  • /wmf-config/InitialiseSettings.php, the almost-pure-data $wgConf structured array.
  • /wmf-config/*.php, mostly configuration, but currently as executable statements. Most of these could be simple arrays in IS.php, but we keep them separate for easier browsing and editing, plus there are some wfLoadExtension() statements.
  • /wmf-config/missing.php, not configuration. This is a web entry point used by multiversion (it is also referenced by dead code in CS.php, predating multiversion).
  • /wmf-config/etcd.php, defines functions, not configuration.
  • /wmf-config/import.php, defines functions, not configuration.
  • /wmf-config/profiler.php, preload init for PHP, not configuration.
  • /wmf-config/interwiki.php, configuration data assigned in CS.php.
  • /wmf-config/ProductionServices.php, configuration for Wikimedia\MWConfig\ServiceConfig, which is read in CS.php as $wmgAllServices = ServiceConfig::getInstance()->getAllServices();
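
For readers unfamiliar with the "almost-pure-data" shape mentioned above, here is a minimal sketch of the idea, written in Python rather than PHP so it is easy to run in isolation. It assumes a structure of setting name, then override key (a database name, a tag, or "default"), then value. The lookup order shown (database name, then tags, then default) is a simplification, the real MediaWiki class has more rules that are not modeled here, and wgExampleFeature is a made-up setting name.

```python
# Hypothetical, simplified model of a $wgConf-style settings array:
# setting name -> {"default" | tag | dbname: value}.
SETTINGS = {
    "wgSitename": {"default": "Wikipedia", "wikivoyage": "Wikivoyage"},
    "wgExampleFeature": {"default": False, "eowiki": True},  # made-up setting
}


def resolve(settings, setting, dbname, tags=()):
    """Pick the most specific value: dbname first, then tags, then default."""
    overrides = settings[setting]
    if dbname in overrides:
        return overrides[dbname]
    for tag in tags:
        if tag in overrides:
            return overrides[tag]
    return overrides.get("default")
```

Because the structure is plain data, a tool can load it, diff it, or rewrite it without executing any site-specific logic, which is the property the problem statement is after.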
Proposed outcome
  • /wmf-config/CommonSettings.php, entry point assigning $wgConf = core.php + ext-FlaggedRevs.php + IS.php.
  • /wmf-config/<component>.php, pure data in StaticSiteConfiguration format, e.g. core.php, ext-FlaggedRevs.php, etc, split out from IS.php.
  • /multiversion/missing.php, new place for missing.php.
  • /src/etcd.php, new place for etcd.php.
  • /src/import.php, new place for import.php.
  • /src/profiler.php, new place for profiler.php.
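
If the "+" in the proposed entry point is read as PHP's array union operator (an assumption; the description may simply mean "combine"), then a key defined in two files silently keeps the left-most value. The sketch below mimics that behaviour in Python rather than PHP. The setting values are illustrative only.

```python
def php_array_union(*arrays):
    """Mimic PHP's `+` on arrays: for duplicate keys the left-most value wins."""
    result = {}
    for array in arrays:
        for key, value in array.items():
            # setdefault() only stores the value if the key is not present yet.
            result.setdefault(key, value)
    return result


# Stand-ins for the arrays returned by core.php, ext-FlaggedRevs.php and IS.php.
core = {"wgSitename": {"default": "Wikipedia"}}
ext_flaggedrevs = {"wmgUseFlaggedRevs": {"default": False}}
initialise_settings = {
    "wgSitename": {"default": "Silently ignored"},  # duplicate key
    "wgLanguageCode": {"default": "en"},
}

conf = php_array_union(core, ext_flaggedrevs, initialise_settings)
```

Here the second wgSitename definition is dropped without any warning, which is one reason a separate check for overlapping keys is useful.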
Steps to get there
  • Minor change to IS.php from function to pure static array, and update require in CommonSettings.php.
  • Create the first /wmf-config/<component>.php file with a handful of settings.
  • In CS.php, read and assign IS.php + <component>.php.
  • Write PHPUnit test to assert there are no overlapping keys between these files.
  • Slowly move non-config from current wmf-config/*.php files to wmf-config/CommonSettings.php (e.g. wfLoadExtension calls).
  • Slowly move config from current wmf-config/*.php files to wmf-config/<component>.php.
  • Slowly move simple config from CS.php to a relevant wmf-config/<component>.php file.
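
The "no overlapping keys" assertion from the steps above could look roughly like the following. This is a language-neutral sketch in Python, not the PHPUnit test itself; the function name and the sample file contents are hypothetical.

```python
from collections import defaultdict


def find_overlapping_keys(files):
    """Return {setting: [file names]} for settings defined in more than one file."""
    seen = defaultdict(list)
    for filename, settings in files.items():
        for setting in settings:
            seen[setting].append(filename)
    return {setting: names for setting, names in seen.items() if len(names) > 1}


# Hypothetical contents of three settings files, keyed by file name.
files = {
    "InitialiseSettings.php": {"wgSitename": {}, "wmgUseORES": {}},
    "ext-ORES.php": {"wmgUseORES": {}},
    "core.php": {"wgLanguageCode": {}},
}

overlaps = find_overlapping_keys(files)
```

A test would then assert that the result is empty and, on failure, print the offending setting together with the files that define it.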

Details

Project                       Branch   Lines +/-    Subject
operations/mediawiki-config   master   +25 -27
operations/mediawiki-config   master   +0 -18
operations/mediawiki-config   master   +11 -4
operations/mediawiki-config   master   +9 -12
operations/mediawiki-config   master   +1 K -1 K
operations/mediawiki-config   master   +3 -3
operations/mediawiki-config   master   +21 -11
operations/mediawiki-config   master   +94 -101
operations/mediawiki-config   master   +262 -15
operations/mediawiki-config   master   +13 -14
operations/mediawiki-config   master   +21 -10
operations/mediawiki-config   master   +1 -12
operations/mediawiki-config   master   +1 -1
operations/mediawiki-config   master   +307 -295
operations/mediawiki-config   master   +3 -3
operations/mediawiki-config   master   +143 -243
operations/mediawiki-config   master   +29 -23
operations/mediawiki-config   master   +2 -0

Event Timeline

Krinkle triaged this task as Medium priority.
Krinkle updated the task description.
Krinkle updated the task description.
Krinkle edited projects, added Wikimedia-Hackathon-2022; removed User-Daniel.

Change 793869 had a related patch set uploaded (by Krinkle; author: Amir Sarabadani):

[operations/mediawiki-config@master] Make IS.php return an array

https://gerrit.wikimedia.org/r/793869

Change 793871 had a related patch set uploaded (by Krinkle; author: Amir Sarabadani):

[operations/mediawiki-config@master] Update CommonSettings to use array return from IS.php

https://gerrit.wikimedia.org/r/793871

Change 793873 had a related patch set uploaded (by Krinkle; author: Amir Sarabadani):

[operations/mediawiki-config@master] Move ORES settings from InitialiseSettings.php to ext-ORES.php

https://gerrit.wikimedia.org/r/793873

Change 793869 merged by jenkins-bot:

[operations/mediawiki-config@master] Make IS.php return an array

https://gerrit.wikimedia.org/r/793869

Change 793871 merged by jenkins-bot:

[operations/mediawiki-config@master] Update CommonSettings to use array return from IS.php

https://gerrit.wikimedia.org/r/793871

Change 795713 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] profiler: Replace deprecated xhgui-collector and mongofill with local class

https://gerrit.wikimedia.org/r/795713

Change 795713 merged by jenkins-bot:

[operations/mediawiki-config@master] profiler: Replace deprecated xhgui-collector and mongofill with local class

https://gerrit.wikimedia.org/r/795713

Change 795989 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] profiler: Move from wmf-config/ to src/

https://gerrit.wikimedia.org/r/795989

Change 795989 merged by jenkins-bot:

[operations/mediawiki-config@master] profiler: Move from wmf-config/ to src/

https://gerrit.wikimedia.org/r/795989

Change 796049 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] tests: Assert that wikiversions.json is complete as per all.dblist

https://gerrit.wikimedia.org/r/796049

Change 796050 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] CommonSettings: Remove redundant array_search and missing.php ref

https://gerrit.wikimedia.org/r/796050

Change 796300 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] profiler: Turn from functions into class

https://gerrit.wikimedia.org/r/796300

Change 799272 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] Move CirrusSearch settings from IS.php to ext-CirrusSearch.php

https://gerrit.wikimedia.org/r/799272

I like this approach very much. Can you share thoughts about how we'll organize beta cluster configuration? Will this go into the "ext-<foo>.php" files, will there be a "-labs" variant, or will this stay in "InitialiseSettings-labs.php"?

I personally think it should stay as it is, on the grounds that labs overrides must be kept to a minimum so that production and beta stay similar (and to avoid implicitly encouraging otherwise). That being said, I can be convinced otherwise.

C-2

I think this split is a very bad idea. Our configuration is (mostly) atomic right now, and this would make it impossible to e.g. safely roll out a combined FR permissions+rights change, as either we'd be granting rights that don't exist or using rights that don't exist and triggering fatals. The long-time plan was to scrap all the per-area files and fold them into InitialiseSettings.json. I don't find "we keep them separate for easier browsing and editing" to be remotely compelling.

While your point is valid, I think it's about a different aspect than what this ticket is targeting. Let me explain. We have two different sets of problems:

  • Problems on the machine side: atomic deployment, performance of loading configuration, etc.
  • Problems on the human side: ease of finding configuration, painless git blame, working syntax highlighting, etc.

They pull in opposite directions, but that doesn't mean they have to block each other's improvements. My ideal future is one YAML file per extension for humans to read or change, with those files compiled into one massive PHP file (similar to the interwiki cache, I think). To get there, we first need to split the settings into PHP files and then turn those into YAML files that a build step, CI, or a maintenance script turns back into IS.php (maybe under another name).
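
The build step described here could be sketched as follows. This is only an illustration under several assumptions: JSON stands in for YAML (to stay within the Python standard library), the output is JSON rather than generated PHP, and the function name, file names and setting values are all made up.

```python
import json
import tempfile
from pathlib import Path


def compile_config(source_dir, output_path):
    """Combine per-component files into one generated file, refusing duplicates."""
    combined = {}
    for path in sorted(Path(source_dir).glob("*.json")):
        for setting, value in json.loads(path.read_text()).items():
            if setting in combined:
                raise ValueError(f"{setting} is defined twice (second time in {path.name})")
            combined[setting] = value
    # Sorted keys keep the generated file deterministic and easy to diff.
    Path(output_path).write_text(json.dumps(combined, indent=2, sort_keys=True) + "\n")
    return combined


# Demo with two hypothetical per-component source files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "ext-ORES.json").write_text(json.dumps({"wmgUseORES": {"default": False}}))
    (src / "core.json").write_text(json.dumps({"wgSitename": {"default": "Wikipedia"}}))
    compiled = compile_config(src, Path(tmp) / "compiled.json")
    compiled_text = (Path(tmp) / "compiled.json").read_text()
```

Humans edit the small per-component files, while only the single generated file is deployed, so a change that spans several components still lands as one unit.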

Well, the plan was to move from PHP files to YAML files per wiki with inheritance (compiled at merge time or whenever into the machine-readable files that are actually deployed).

This work takes us away from that plan, towards a code-centred view of the world that makes no sense to anyone who's not a super-expert.

That's the problem, if you ask me. Per-wiki files would mean having around 1000 files; finding configuration would actually be much harder, and it would be quite messy. For example, check the static/images/project-logos directory. I assume that people who change a configuration are usually maintainers of the relevant extension (e.g. CirrusSearch configuration), and it would be easier for someone on the search platform team to have an overview of all the configuration they maintain than to have it spread out over 1000 files or buried in one massive file.

I would like to see configuration compiled in several stages:

  • Compact representation of customizable settings. Organizing per-feature seems nice because generally a single change should touch one feature, but it's very common for this change to apply to many wikis. I would rather edit in one place.
  • Tooling-readable representation of per-wiki settings. Ideally this goes beyond the config-cache work, to include CommonSettings.php and -labs variants, to give a complete and static list of all settings that will apply to each wiki. This gives a deterministic and repeatable format which can be used directly in containers, for example, and can be scanned to answer questions like "on which wikis are feature A and B both enabled?".
  • Wiki-readable representation of runtime settings. Apparently this should be code-generated to PHP for maximum load efficiency?
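
As an illustration of the second stage, a fully expanded per-wiki representation makes questions like "on which wikis are feature A and B both enabled?" a simple scan. The sketch below is in Python; the data, the setting names wmgUseFeatureA and wmgUseFeatureB, and the function name are hypothetical.

```python
# Hypothetical expanded settings: wiki -> {setting: final value}.
PER_WIKI = {
    "enwiki": {"wmgUseFeatureA": True, "wmgUseFeatureB": True},
    "eowiki": {"wmgUseFeatureA": True, "wmgUseFeatureB": False},
    "eswiki": {"wmgUseFeatureA": True, "wmgUseFeatureB": True},
}


def wikis_with_all(per_wiki, *settings):
    """Return the sorted list of wikis on which every given setting is True."""
    return sorted(
        wiki
        for wiki, conf in per_wiki.items()
        if all(conf.get(setting) is True for setting in settings)
    )


both_enabled = wikis_with_all(PER_WIKI, "wmgUseFeatureA", "wmgUseFeatureB")
```

This only works if the expanded data really is complete and static, which is why the stage needs to cover CommonSettings.php and the -labs variants as well.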

There are ~100 people maintaining code extensions and ~100k people in communities asking for changes. Our focus should be on making life for the 100,000 better, not the 100.

Having a system on noc.wikimedia.org that lets users see the configuration of their wiki is actually quite easy (I'll make a POC now). Users want to read the configuration of their wiki and won't change it that much themselves (I can go through the git history of IS.php), and most of the time you still know which extension you want to change (e.g. AbuseFilter).

If you disagree, I can try to analyze IS.php's settings and their history to see who changed what, and how often.

https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/418cb8d9b313a3a8d760bf15d87f0d2557e19cd3 is the start of the work to do this on-wiki.

Change 799352 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/mediawiki-config@master] [POC] noc: Add perwiki.php to show per wiki configuration

https://gerrit.wikimedia.org/r/799352

This is the POC ^. Obviously it has the downside of not showing settings that have no override in production (and obviously the one inside MW is much better), but it works for now.

Change 801831 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] Profiler: Update wmfSetupProfiler() call

https://gerrit.wikimedia.org/r/801831

Change 801832 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] Profiler: Remove temporary back-compat for wmfSetupProfiler()

https://gerrit.wikimedia.org/r/801832

Change 796300 merged by jenkins-bot:

[operations/mediawiki-config@master] profiler: Turn from functions into class

https://gerrit.wikimedia.org/r/796300

Change 801831 merged by jenkins-bot:

[operations/mediawiki-config@master] Profiler: Update wmfSetupProfiler() call

https://gerrit.wikimedia.org/r/801831

Change 801832 merged by jenkins-bot:

[operations/mediawiki-config@master] Profiler: Remove temporary back-compat for wmfSetupProfiler()

https://gerrit.wikimedia.org/r/801832

Change 796049 merged by jenkins-bot:

[operations/mediawiki-config@master] tests: Assert that wikiversions.json is complete as per all.dblist

https://gerrit.wikimedia.org/r/796049

Change 796050 merged by jenkins-bot:

[operations/mediawiki-config@master] CommonSettings: Remove redundant array_search and missing.php ref

https://gerrit.wikimedia.org/r/796050

Change 807609 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] missing.php: Update docs and add test plan

https://gerrit.wikimedia.org/r/807609

Change 807610 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] multiversion: Move missing.php from wmf-config/ to /multiversion

https://gerrit.wikimedia.org/r/807610

Change 810147 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] multiversion: Factor out getTagsForWiki() for re-use

https://gerrit.wikimedia.org/r/810147

Change 799352 merged by jenkins-bot:

[operations/mediawiki-config@master] noc: Add wiki.php to view a given wiki configuration

https://gerrit.wikimedia.org/r/799352

I think the fact that we now have a per-wiki view of configuration should go into Tech News.

Re: Tech News - What wording would you suggest as the content? I.e. What are we hoping local editors are happy to learn from within this information resource, and what kinds of actions are we expecting/encouraging them to take with it?

E.g. We could say something like

It is now possible to easily view [all? most? some? of] the configuration settings that apply to just one wiki, and to compare settings between two wikis if those settings are different. For example: All Esperanto Wikipedia settings, or settings that are different between the Spanish and Esperanto Wikipedias. Local communities may want to discuss and propose changes to their local settings. Details about each of the named settings can be found by searching MediaWiki.org.

Corrections/clarifications/overhauls welcome!

Most configuration is visible. Private configuration (passwords, etc.) is not, and some other settings are missing as well due to our tech debt, but we will improve that.

Ok, added like so https://meta.wikimedia.org/wiki/Tech/News/2022/28 - Edits welcome within the next ~22 hours. Thanks!

Change 810147 merged by jenkins-bot:

[operations/mediawiki-config@master] multiversion: Factor out getTagsForWiki() for re-use

https://gerrit.wikimedia.org/r/810147

Change 807609 merged by jenkins-bot:

[operations/mediawiki-config@master] missing.php: Update docs and add test plan

https://gerrit.wikimedia.org/r/807609

Change 807610 merged by jenkins-bot:

[operations/mediawiki-config@master] multiversion: Move missing.php from wmf-config/ to /multiversion

https://gerrit.wikimedia.org/r/807610

Change 818648 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] multiversion: Untangle MWConfigCacheGenerator from CS.php (1/2)

https://gerrit.wikimedia.org/r/818648

Change 818649 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] multiversion: Untangle MWConfigCacheGenerator from CS.php (2/2)

https://gerrit.wikimedia.org/r/818649

Change 818651 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] multiversion: Move labs-overrides responsibility to getStaticConfig()

https://gerrit.wikimedia.org/r/818651

Change 819203 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] CirrusTest: Remove reference to 'unittest' realm

https://gerrit.wikimedia.org/r/819203

Change 819203 merged by jenkins-bot:

[operations/mediawiki-config@master] CirrusTest: Remove reference to 'unittest' realm

https://gerrit.wikimedia.org/r/819203

Change 818648 merged by jenkins-bot:

[operations/mediawiki-config@master] multiversion: Untangle MWConfigCacheGenerator from CS.php (1/2)

https://gerrit.wikimedia.org/r/818648

Change 818649 merged by jenkins-bot:

[operations/mediawiki-config@master] multiversion: Untangle MWConfigCacheGenerator from CS.php (2/2)

https://gerrit.wikimedia.org/r/818649

Change 818651 merged by jenkins-bot:

[operations/mediawiki-config@master] multiversion: Move labs-overrides responsibility to getStaticConfig()

https://gerrit.wikimedia.org/r/818651