**//Review the Problem Statement Artifact://** https://docs.google.com/document/d/1H3YsxQ4-M9vu9DNSSXEZYtkdFjJ9h189ZFY-47b7eSU/edit?usp=sharing
Decide how (and if) we should change the mechanism used to configure individual MediaWiki installations.
Currently, this is done by loading a PHP file that sets global variables, which prevents us from using standard mechanisms for managing and deploying configuration, and makes managing wiki farms an ad-hoc hack.
Re-designing the configuration mechanism presents an opportunity to align with industry-standard practices, namely loading configuration from plain data files (typically using the JSON or YAML format). This would simplify the maintenance, review, and deployment of configuration. In particular, it would allow us to make use of the ConfigMaps mechanism built into Kubernetes.
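To illustrate what "configuration as plain data" means in practice, here is a minimal sketch in Python. It is not a proposed format: the key names mirror familiar MediaWiki settings, but the document layout and the `load_settings` helper are assumptions made for the example.

```python
import json

# Hypothetical settings document. The layout is an assumption for
# illustration only, not a format proposed by this task.
SETTINGS_JSON = """
{
  "wgSitename": "Example Wiki",
  "wgServer": "https://wiki.example.org",
  "wgEnableUploads": false
}
"""


def load_settings(text: str) -> dict:
    """Parse a settings document and check that it is a JSON object."""
    settings = json.loads(text)
    if not isinstance(settings, dict):
        raise ValueError("settings document must be a JSON object")
    return settings


settings = load_settings(SETTINGS_JSON)
print(settings["wgSitename"])
```

Because the document is data rather than code, it can be validated, diffed, and shipped by generic tooling without executing anything.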
Alternatives to changing the configuration mechanism in MediaWiki:
- Do nothing: we will have to work around and against Kubernetes when managing and deploying configuration, and making changes to configuration will remain risky.
- Make the change in WMF-specific configuration and deployment code, without touching MediaWiki core: we will be working against and around how MediaWiki does things internally, and non-production systems (like CI, local development, etc.) would not benefit.
Note: This proposal focuses on changing how MediaWiki loads configuration, which will in turn enable us to change how we manage configuration for WMF production servers. General considerations about how configuration management should work in WMF production in the future will inform and drive this proposal (most importantly: use simple data files instead of code), but the details will be determined by a separate proposal.
The configuration for WMF's wikis has become increasingly complex over the years: it is currently roughly thirty thousand lines of executable code with complex data flows and conditionals. It is basically a computer program in its own right. This makes it risky to make changes, because it is hard to foresee consequences and side effects. For instance, there is no easy way to see the effective configuration for a given wiki.
While this has been a long-standing annoyance causing bugs and friction, the move to Kubernetes now makes fixing it a pressing need: if we want to fully benefit from switching to Kubernetes, we have to make use of Kubernetes' mechanism for managing and deploying configuration. This is only possible if our configuration is not executable program logic, but plain data structures in the form of JSON (or YAML) files.
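One way to picture how plain data makes the effective configuration of a wiki easy to inspect is a simple layered merge. The following Python sketch is only an assumption about how such layering could work: the layer names, the wiki ID `examplewiki`, and the `effective_config` helper are hypothetical, and the merge is deliberately shallow (nested structures are replaced, not combined).

```python
from typing import Any, Mapping


def effective_config(*layers: Mapping[str, Any]) -> dict:
    """Merge layers left to right; later layers win on key conflicts."""
    result: dict = {}
    for layer in layers:
        result.update(layer)  # shallow merge: whole values are replaced
    return result


# Hypothetical layers, each of which could live in its own data file.
defaults = {"wgSitename": "Wiki", "wgEnableUploads": False, "wgReadOnly": None}
per_wiki = {"examplewiki": {"wgSitename": "Example Wiki", "wgEnableUploads": True}}
incident_override = {"wgReadOnly": "Database maintenance in progress"}

config = effective_config(defaults, per_wiki["examplewiki"], incident_override)
print(config)
```

With conditionals and data flow removed, the result for any wiki is a single dictionary that can be printed, diffed against a previous deployment, or reviewed before rollout.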
This proposal to overhaul the way MediaWiki loads configuration is guided by the following needs, desires, and constraints:
- Allow configuration to be updated and deployed without having to re-build Docker containers
- Allow configuration to be managed and deployed using off-the-shelf tooling (Kubernetes ConfigMaps)
- Allow the effective configuration of a given wiki (for a given server group and data center) to be reviewed easily
- Allow for changes to the configuration to be made easily and with confidence
- Provide a standard mechanism to quickly override parts of the configuration when reacting to incidents (e.g. disabling a broken database host).
- Provide proper support for wiki-farms (multi-tenant setups), not only for the benefit of WMF's production configuration but also for testing and development as well as 3rd party installations of MediaWiki.
- Make it easier to manage configuration for testing, both manual and in CI. This is a synergistic opportunity, laying the groundwork for a later project around automatically managing and loading configuration for different test scenarios.
Most of this is driven by the MediaWiki on Kubernetes initiative, which rolls up to the Resilience OKR. The points about development and test environments feed into Code Health but also Tech Community Building.
This proposal is brought to the Technical Decision Forum for the following reasons:
- to ensure the requirements and desires of the main stakeholders (RelEng and SRE) are correctly understood
- to gather information about constraints and concerns, especially with respect to performance and security
- to gather information about potential synergy with other future projects, such as end-to-end testing environments.
- to raise awareness about potential changes other teams may be affected by, such as the mechanism for accessing configuration for extensions.
- to ensure we do not have to go back: changing the way we do config affects each and every installation of MediaWiki. We should not make fundamental changes to this very often.