
TDF: Review mechanism to configure individual MediaWiki installations
Open, Needs Triage, Public

Description

**//Review the Problem Statement Artifact://** https://docs.google.com/document/d/1H3YsxQ4-M9vu9DNSSXEZYtkdFjJ9h189ZFY-47b7eSU/edit?usp=sharing


Decide how (and if) we should change the mechanism used to configure individual MediaWiki installations.

Problem Statement

Currently, this is done by loading a PHP file that sets global variables, which prevents us from using standard mechanisms for managing and deploying configuration, and makes managing wiki farms an ad-hoc hack.

Re-designing the configuration mechanism presents an opportunity to align with industry standard practices, namely loading configuration from plain data files (typically using the JSON or YAML format). This would simplify the maintenance, review, and deployment of configuration. In particular, it would allow us to make use of the ConfigMaps mechanism built into Kubernetes.
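To illustrate one practical difference between executable configuration and plain data: a data file can be parsed and checked mechanically before it is deployed. The following is a minimal sketch in Python (used here only for illustration; MediaWiki itself is written in PHP). The schema, the setting names, and the function `load_settings` are hypothetical and not part of MediaWiki.

```python
import json
from typing import Any, Dict

# Hypothetical schema: setting name -> expected Python type after JSON parsing.
SCHEMA: Dict[str, type] = {
    "Sitename": str,
    "EnableUploads": bool,
    "MaxArticleSize": int,
}


def load_settings(text: str, schema: Dict[str, type] = SCHEMA) -> Dict[str, Any]:
    """Parse a JSON settings document and check every entry against `schema`."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("settings document must be a JSON object")
    for name, value in data.items():
        if name not in schema:
            raise ValueError(f"unknown setting: {name}")
        expected = schema[name]
        # bool is a subclass of int in Python, so reject it explicitly for int settings.
        if expected is int and isinstance(value, bool):
            raise ValueError(f"{name}: expected int, got bool")
        if not isinstance(value, expected):
            raise ValueError(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return data


# Usage: a typo such as "Sitname" or a quoted number is rejected here,
# before the file ever reaches a server.
settings = load_settings(
    '{"Sitename": "Example Wiki", "EnableUploads": false, "MaxArticleSize": 2048}'
)
```

A check like this could run in CI on every proposed configuration change, which is much harder to do for a PHP file that computes its values at runtime.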

Alternatives to changing the configuration mechanism in MediaWiki:

  • Do nothing: we will have to work around and against Kubernetes when managing and deploying configuration, and making changes to configuration will remain risky.
  • Make the change in WMF-specific configuration and deployment code, without touching MediaWiki core: we will be working against and around how MediaWiki does things internally, and non-production systems (like CI, local development, etc.) would not benefit.

Note: This proposal is focused on changing how MediaWiki loads configuration. This will enable us to change how we manage configuration for WMF production servers. General considerations about how configuration management should work in WMF production in the future will inform and drive this proposal (most importantly: use simple data files instead of code); the details will be left to a separate proposal.

Motivation

The configuration for WMF's wikis has become increasingly complex over the years: it is currently roughly thirty thousand lines of executable code with complex data flows and conditionals. It is basically a computer program in its own right. This makes it risky to make changes, because it is hard to foresee consequences and side effects. For instance, there is no easy way to see the effective configuration for a given wiki.

While this has been a long-standing annoyance causing bugs and friction, the move to Kubernetes now makes fixing this a pressing need: if we want to fully benefit from switching to Kubernetes, we have to make use of Kubernetes' mechanism for managing and deploying configuration. This is only possible if our configuration is not executable program logic, but plain data structures in the form of JSON (or YAML) files.

This proposal to overhaul the way MediaWiki loads configuration is guided by the following needs, desires, and constraints:

  1. Allow configuration to be updated and deployed without having to rebuild Docker images
  2. Allow configuration to be managed and deployed using off-the-shelf tooling (Kubernetes ConfigMaps)
  3. Allow the effective configuration of a given wiki (for a given server group and data center) to be reviewed easily
  4. Allow changes to the configuration to be made easily and with confidence
  5. Provide a standard mechanism to quickly override parts of the configuration when reacting to incidents (e.g. disabling a broken database host).
  6. Provide proper support for wiki-farms (multi-tenant setups), not only for the benefit of WMF's production configuration but also for testing and development as well as 3rd party installations of MediaWiki.
  7. Make it easier to manage configuration for testing, both manual and in CI. This is a synergistic opportunity, laying the groundwork for a later project around automatically managing and loading configuration for different test scenarios.
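Points 3 and 5 suggest computing the effective configuration as an ordered merge of data layers, which can then be printed for review. The following is a minimal Python sketch under a hypothetical layering: defaults, then per-wiki, per-server-group, and per-data-center layers, then an emergency override layer. The layer order, the group and data center keys, and setting names such as `DBReplicas` are illustrative only, not WMF's actual configuration.

```python
import json
from typing import Any, Dict, Mapping

Layer = Mapping[str, Any]


def effective_config(*layers: Layer) -> Dict[str, Any]:
    """Merge layers left to right; a later layer overrides earlier ones.

    Only top-level keys are merged: a nested value (list or dict) in a later
    layer replaces the earlier value wholesale. This keeps the result easy to
    predict, at the cost of some repetition in the data files.
    """
    result: Dict[str, Any] = {}
    for layer in layers:
        result.update(layer)
    return result


# Illustrative data; in practice each layer would be loaded from its own file.
defaults = {"Sitename": "Wiki", "ReadOnly": False, "DBReplicas": ["db1", "db2", "db3"]}
per_wiki = {"enwiki": {"Sitename": "Wikipedia"}}
per_group = {"jobrunner": {"JobRunRate": 0}}
per_dc = {"eqiad": {"DBReplicas": ["db1", "db2"]}}
emergency = {"DBReplicas": ["db1"]}  # e.g. db2 was found to be broken during an incident

cfg = effective_config(
    defaults,
    per_wiki.get("enwiki", {}),
    per_group.get("jobrunner", {}),
    per_dc.get("eqiad", {}),
    emergency,
)
# Stable, sorted output that can be diffed between two versions of the config.
print(json.dumps(cfg, indent=2, sort_keys=True))
```

Because the emergency layer is just another data file, an incident override is visible in the printed effective configuration and can be removed by deleting that one file.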

Most of this is driven by the MediaWiki on Kubernetes initiative, which rolls up to the Resilience OKR. The points about development and test environments feed into Code Health but also Tech Community Building.

This proposal is brought to the Technical Decision Forum for the following reasons:

  • to ensure the requirements and desires of the main stakeholders are correctly understood (RelEng and SRE)
  • to gather information about constraints and concerns, especially with respect to performance and security
  • to gather information about potential synergy with other future projects, such as end-to-end testing environments.
  • to raise awareness about potential changes other teams may be affected by, such as the mechanism for accessing configuration for extensions.
  • to ensure we do not have to go back: changing the way we do config affects each and every installation of MediaWiki. We should not make fundamental changes to this very often.

Decision Record

See also:

Event Timeline

daniel renamed this task from "TDF: Decide how (and if) we should change the mechanism used to configure individual MediaWiki installations." to "TDF: Review mechanism to configure individual MediaWiki installations." (Oct 4 2021, 3:12 PM)
daniel updated the task description.

> Re-designing the configuration mechanism presents on opportunity to align with industry standard practices, namely loading configuration from plain data files (typically using the JSON or YAML format

Is evaluation of third-party libraries (e.g. Symfony Config) that do some or all of this work within scope of this discussion? It would be nice to reuse and contribute to upstream projects that have already implemented the types of things we are looking for.

We should think about the management of secrets and credentials, also part of the configuration setup, and incorporate that explicitly into the problem statement, so that it does not wind up tacked on as an afterthought to whatever solution is proposed and accepted for implementation in a later phase of the process.
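One common way to keep secrets out of the reviewable configuration is to load them from a separate source and redact them whenever the effective configuration is displayed. The Python sketch below assumes secrets are mounted as a directory with one file per key, which is how a Kubernetes Secret volume presents its data; the setting name `DBpassword` as a file name and the helpers `load_secrets` and `redacted_view` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, Set

REDACTED = "<redacted>"


def load_secrets(secret_dir: Path) -> Dict[str, str]:
    """Read one secret per file; the file name is the setting name.

    Entries starting with "." are skipped: Kubernetes uses hidden entries such
    as "..data" internally in mounted Secret and ConfigMap volumes.
    Surrounding whitespace (e.g. a trailing newline) is stripped.
    """
    secrets: Dict[str, str] = {}
    for path in sorted(secret_dir.iterdir()):
        if path.name.startswith(".") or not path.is_file():
            continue
        secrets[path.name] = path.read_text(encoding="utf-8").strip()
    return secrets


def redacted_view(config: Dict[str, Any], secret_names: Set[str]) -> Dict[str, Any]:
    """Return a copy of `config` that is safe to show in reviews or logs."""
    return {k: (REDACTED if k in secret_names else v) for k, v in config.items()}


# Usage with a temporary directory standing in for the mounted volume.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "DBpassword").write_text("s3cret\n", encoding="utf-8")
    secrets = load_secrets(Path(tmp))

config = {"Sitename": "Example Wiki", **secrets}
print(redacted_view(config, set(secrets)))
# {'Sitename': 'Example Wiki', 'DBpassword': '<redacted>'}
```

This only covers secret values; it does not address the use of a private settings file to inject code.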

Can the Google Doc be made public? I can't even read it with my wikimedia.org account.


My understanding is that Jen will make it public when this topic starts the first phase of the TDF process, which I hope will be sometime this week.

The contents of the doc are nearly identical to what's here though. It was created by copying text from this ticket.


There isn't any additional information in the Google Doc.

One aspect of this that's a bit Wikimedia-specific is the production PrivateSettings.php file, which obviously contains not only sensitive config data like secrets, but is also used in an ad-hoc way to globally inject code for incident response and similarly sensitive events. Wikimedia's usage of PrivateSettings.php is likely far from any kind of best practice, but it has proven extremely convenient (essential?) at times during security incident response.


I was thinking this use case could be covered by the existing mechanism for security patches. The ad-hoc incident response would be implemented as a security patch, and would be rolled out in the same way we roll out other code changes (whatever that mechanism is going to be).

Would that be acceptable?

In any case, the new configuration mechanism wouldn't prevent us from having a PrivateSettings.php file with ad-hoc code in hook handlers, as we do now. The question is how we can quickly deploy critical code changes in the future. I think that needs to be discussed separately from the question of managing configuration.

> This proposal is focused on changing how MediaWiki loads configuration.

I don't think that's a thing currently. MediaWiki just expects the Config object to provide the configuration via a simple key-value-store-like interface. Almost everyone uses the GlobalVarConfig implementation, which uses the PHP global variable namespace as the key-value store. How those globals are set is something MediaWiki is almost entirely agnostic about.

("Almost", SiteConfiguration being the one exception I'm aware of. That's used in some cross-wiki features, where code running in the environment of a given wiki needs to pretend it is actually running for another wiki, e.g. to load content from another wiki's database. See T255213: Create a mechanism for loading configuration for sister site and T184529: Define a way to get a database connection based on a logical wiki ID. But it's very limited and fragile, largely undocumented, and I'm not sure how much it is used outside Wikimedia.)
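The key-value Config interface described above can be illustrated with a short Python analogue. MediaWiki's actual interface is PHP; the class `PrefixedMappingConfig` is hypothetical and only mimics how GlobalVarConfig maps a name like `Sitename` to the global `$wgSitename`. Because callers only use `get()` and `has()`, the backing store could be swapped for data loaded from a file without touching them.

```python
from abc import ABC, abstractmethod
from typing import Any, Mapping


class ConfigException(Exception):
    """Raised when a setting that does not exist is requested."""


class Config(ABC):
    """Minimal key-value configuration interface: has() and get()."""

    @abstractmethod
    def has(self, name: str) -> bool: ...

    @abstractmethod
    def get(self, name: str) -> Any: ...


class PrefixedMappingConfig(Config):
    """Look up settings in a mapping under a name prefix (default "wg")."""

    def __init__(self, store: Mapping[str, Any], prefix: str = "wg") -> None:
        self._store = store
        self._prefix = prefix

    def has(self, name: str) -> bool:
        return self._prefix + name in self._store

    def get(self, name: str) -> Any:
        if not self.has(name):
            raise ConfigException(f"No such setting: {name}")
        return self._store[self._prefix + name]


# The store could be a dict of global-like variables or data parsed from a
# JSON/YAML file; code calling get() cannot tell the difference.
from_globals = PrefixedMappingConfig({"wgSitename": "Example Wiki"})
from_file = PrefixedMappingConfig({"Sitename": "Example Wiki"}, prefix="")
```

MediaWiki core already has a HashConfig class that plays roughly the role of the second instance here.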

Formalizing / standardizing how configuration is loaded by MediaWiki could certainly be valuable for third-party wikis and local development. (T221535: Provide a "wiki farm" abstraction in MediaWiki core is a related task.) But, since it's almost entirely unspecified now, I don't think it's a requirement for changing how configuration is loaded in Wikimedia. That is, the statement

> Make the change in WMF specific configuration and deployment code, without touching MediaWiki core: we will be working against and around how MediaWiki does things internally

is, as far as I can see, only true for the rarely-used SiteConfiguration, and even there, I don't think it would really be working against it: it could easily be made more abstract so that it can use YAML files, instead of relying on globals and on dumping and re-parsing the configuration of another wiki by shelling out to a maintenance script.
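To illustrate the point about SiteConfiguration: if per-wiki settings are plain data, reading another wiki's value becomes a lookup rather than a maintenance-script call. This Python sketch assumes data shaped roughly like SiteConfiguration's settings array (setting name, then wiki ID, tag, or `default`, then value) and a simplified precedence of wiki ID, then tags, then `default`. The real class also handles suffixes, `+` merging, and parameter substitution, which are omitted. The data and the function `get_setting` are hypothetical.

```python
from typing import Any, Iterable, Mapping

# Hypothetical per-wiki settings, e.g. parsed from a YAML or JSON file.
SETTINGS = {
    "wgSitename": {"default": "Wikipedia", "enwiktionary": "Wiktionary"},
    "wgLanguageCode": {"default": "en", "dewiki": "de"},
    "wgEnableUploads": {"default": False, "wiktionary": True, "commonswiki": True},
}


def get_setting(
    settings: Mapping[str, Mapping[str, Any]],
    name: str,
    wiki: str,
    tags: Iterable[str] = (),
) -> Any:
    """Resolve `name` for `wiki`: exact wiki ID, then each tag in order, then "default".

    Returns None if the setting is not defined at all.
    """
    values = settings.get(name, {})
    for key in (wiki, *tags, "default"):
        if key in values:
            return values[key]
    return None


# Code running for one wiki can read another wiki's value directly.
print(get_setting(SETTINGS, "wgLanguageCode", "dewiki"))  # de
print(get_setting(SETTINGS, "wgEnableUploads", "enwiktionary", tags=["wiktionary"]))  # True
```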

If we wanted to make configuration loading a first-class concept in MediaWiki (which, again, I think is a valuable goal), I have doubts about the Technical Decision Forum being the right process for that. This is something with a heavy impact on third-party MediaWiki users, who are entirely excluded from TDF processes. Also, I think the TDF has failed to keep decision processes aligned with core open-source values such as transparency of the decision-making process (if that was even a goal), which is maybe OK for decisions which primarily affect the Wikimedia infrastructure, but I don't think it's acceptable for decisions with a large impact on the entire MediaWiki user and developer community. Something similar to the old TechCom RfC process would be more appropriate there.

> Provide proper support for wiki-farms (multi-tenant setups), not only for the benefit of WMF's production configuration but also for testing and development as well as 3rd party installations of MediaWiki.

Pure static configuration is not feasible for large wiki farms that use a dynamic configuration management system to store and modify wiki-local configuration, since such a system requires custom PHP code running in the "configuration" step (LocalSettings.php) to fetch and expose configuration for a given wiki. Being able to utilize such a system makes it possible to have an automatic wiki creation flow, reduces the amount of configuration that needs to be stored and deployed as code, and allows staff members to make trivial wiki-local configuration changes without requiring engineering involvement; losing it would be a significant complication.

In case it may be useful in the context of this proposal, I have some slides describing our config management system which I presented in the WMTC2019 unconference: T238273
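The pattern described here, static defaults plus per-wiki values fetched at runtime from a store, can be sketched as follows. This is a hypothetical Python illustration using an in-memory SQLite table, not the actual system described in the slides; the table layout and function names are assumptions. It shows why such a setup needs code in the loading step: the per-wiki layer is only known after a query.

```python
import json
import sqlite3
from typing import Any, Dict, Mapping


def create_store() -> sqlite3.Connection:
    """Create an in-memory table holding JSON-encoded per-wiki overrides."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wiki_settings ("
        " wiki TEXT NOT NULL, name TEXT NOT NULL, value TEXT NOT NULL,"
        " PRIMARY KEY (wiki, name))"
    )
    return conn


def set_wiki_setting(conn: sqlite3.Connection, wiki: str, name: str, value: Any) -> None:
    """Store (or replace) one override, e.g. submitted through an admin web form."""
    conn.execute(
        "INSERT OR REPLACE INTO wiki_settings (wiki, name, value) VALUES (?, ?, ?)",
        (wiki, name, json.dumps(value)),
    )
    conn.commit()


def config_for_wiki(
    conn: sqlite3.Connection, wiki: str, static_defaults: Mapping[str, Any]
) -> Dict[str, Any]:
    """Static defaults first, then whatever overrides the store holds for `wiki`."""
    config = dict(static_defaults)
    rows = conn.execute(
        "SELECT name, value FROM wiki_settings WHERE wiki = ?", (wiki,)
    )
    for name, value in rows:
        config[name] = json.loads(value)
    return config


defaults = {"wgSitename": "New Wiki", "wgEnableUploads": False}
conn = create_store()
set_wiki_setting(conn, "examplewiki", "wgSitename", "Example Wiki")
example_cfg = config_for_wiki(conn, "examplewiki", defaults)
other_cfg = config_for_wiki(conn, "otherwiki", defaults)
```

In a design based on data files, such a store would become one more layer on top of the static files, which would require the loading mechanism to allow a pluggable source for that layer.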

GrowthExperiments has a system called community configuration, to make it possible for wiki communities to tailor Growth features to their needs. I think it is mostly orthogonal to this proposal but mentioning it just FYI. Basically, there are three Config objects: the default one, one which loads JSON wiki pages (with appropriate caching) and uses them as a key-value store for configuration settings, and a router which is registered as the main configuration and determines (based on a hardcoded list) whether to send configuration variable lookups to the conventional configuration, or to the on-wiki configuration, with the conventional configuration as a fallback. All the configuration settings that can be changed on-wiki belong to the GrowthExperiments extension, although it would be nice to turn it into something more generic eventually.
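The routing arrangement described here can be sketched in a few lines. This is a hypothetical Python illustration of the idea (a router that sends a fixed list of names to an on-wiki configuration, falling back to the conventional one), not GrowthExperiments' actual PHP code. The class names and setting names are made up, and caching of the wiki page is left out.

```python
import json
from typing import AbstractSet, Any, Mapping


class MappingConfig:
    """Key-value config backed by a plain dict."""

    def __init__(self, data: Mapping[str, Any]) -> None:
        self._data = dict(data)

    def has(self, name: str) -> bool:
        return name in self._data

    def get(self, name: str) -> Any:
        if name not in self._data:
            raise KeyError(name)
        return self._data[name]


class RoutingConfig:
    """Route a fixed set of names to the on-wiki config, with fallback."""

    def __init__(self, conventional: MappingConfig, on_wiki: MappingConfig,
                 routed_names: AbstractSet[str]) -> None:
        self._conventional = conventional
        self._on_wiki = on_wiki
        self._routed = frozenset(routed_names)

    def _use_on_wiki(self, name: str) -> bool:
        return name in self._routed and self._on_wiki.has(name)

    def has(self, name: str) -> bool:
        return self._use_on_wiki(name) or self._conventional.has(name)

    def get(self, name: str) -> Any:
        if self._use_on_wiki(name):
            return self._on_wiki.get(name)
        return self._conventional.get(name)


conventional = MappingConfig({"HelpPanelEnabled": False, "MentorListPage": "", "Sitename": "Wiki"})
# Content of a JSON wiki page that the community can edit (hypothetical).
page_text = '{"HelpPanelEnabled": true, "Sitename": "Not routed, ignored"}'
router = RoutingConfig(conventional, MappingConfig(json.loads(page_text)),
                       {"HelpPanelEnabled", "MentorListPage"})
```

Keeping the list of routed names in code means the on-wiki page cannot override settings that were never meant to be community-editable, such as `Sitename` in this example.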

> One aspect of this that's a bit Wikimedia-specific is the production PrivateSettings.php file, which obviously contains not only sensitive config data like secrets, but is also used in an ad-hoc way to globally inject code for incident response and similarly sensitive events. Wikimedia's usage of PrivateSettings.php is likely far from any kind of best practice, but it has proven extremely convenient (essential?) at times during security incident response.
>
> I was thinking this use case could be covered by the existing mechanism for security patches. The ad-hoc incident response would be implemented as a security patch, and would be rolled out in the same way we roll out other code changes (whatever that mechanism is going to be).
>
> Would that be acceptable?

Are you suggesting adding more of the current security patch deployment process around changes made to PrivateSettings.php in Wikimedia production? If so, I guess we could do that, though I'm not entirely sure what advantages that provides, since these patches would only ever live within Wikimedia production and wouldn't make it into a security release or anything like that. I think the general point here is that it is incredibly convenient and practical to have a protected, global injection point like PrivateSettings.php when dealing with certain security incidents. Retaining that existing functionality while potentially migrating to a new configuration paradigm would be acceptable to the Security-Team.


The question is how we can retain the current functionality in a future where we can no longer use scap to push config and code to production servers. My understanding is that with Kubernetes, we can push out config changes relatively quickly, but code changes require us to build and deploy a new image. I suppose that would also be true for deploying security patches.
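On the configuration side, the fast path could look roughly like this: the running process notices that a mounted file changed and re-reads it, with no image rebuild. This is a minimal Python sketch; the file name and the class `ReloadingConfig` are hypothetical. As far as we understand, Kubernetes updates mounted ConfigMap volumes (other than `subPath` mounts) in place after a delay by swapping a symlink, so the sketch detects changes by comparing inode, modification time, and size rather than relying on any single attribute.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional, Tuple


class ReloadingConfig:
    """Re-read a JSON settings file whenever it changes on disk."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._stamp: Optional[Tuple[int, int, int]] = None
        self._data: Dict[str, Any] = {}

    def _current_stamp(self) -> Tuple[int, int, int]:
        st = os.stat(self._path)  # follows symlinks, as with a mounted ConfigMap key
        return (st.st_ino, st.st_mtime_ns, st.st_size)

    def get(self, name: str) -> Any:
        stamp = self._current_stamp()
        if stamp != self._stamp:
            self._data = json.loads(self._path.read_text(encoding="utf-8"))
            self._stamp = stamp
        return self._data[name]


# Usage: simulate a config push by rewriting the file between two lookups.
with tempfile.TemporaryDirectory() as tmp:
    settings_file = Path(tmp, "settings.json")
    settings_file.write_text('{"ReadOnly": false}', encoding="utf-8")
    cfg = ReloadingConfig(settings_file)
    before = cfg.get("ReadOnly")
    settings_file.write_text('{"ReadOnly": "Emergency maintenance"}', encoding="utf-8")
    after = cfg.get("ReadOnly")
```

In PHP's per-request model the check would run on each request or be cached (for example in APCu); the sketch only illustrates change detection. Code changes, such as the ad-hoc hook handlers discussed above, would still need a new image.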

All that being said: this is really beyond the scope of this proposal, it is not really something that we can address with a config loading system for core. But it's probably a good idea to start thinking about it now.