
Unit tests for wfLoadExtension @ operations/mediawiki-config
Closed, Declined (Public)

Description

We are progressively updating CommonSettings.php to load extensions through wfLoadExtension instead of the former include, both to comply with extension registration and to get rid of the wmg/wg hack (see T119117).

Humans are quite bad at spotting typos when an extension's path or name looks plausible, yet such a typo is potentially very harmful: any deployment containing one could break the servers, and one actually did.

To avoid this, we can add tests for CommonSettings.php (and CommonSettings-labs.php): every wfLoadExtension call should be validated to ensure that it loads an extension that actually exists.
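A minimal sketch of such a check, written in Python for illustration only. It assumes that each call passes the extension name as a string literal and that an extension "exists" when `<extensions_dir>/<Name>/extension.json` is present, which is where wfLoadExtension looks by default. The helper names `find_loaded_extensions` and `missing_extensions` are hypothetical, and the regular expression does not skip commented-out calls or calls that pass an explicit path.

```python
import re
from pathlib import Path

# Matches wfLoadExtension( 'Name' ) or wfLoadExtension( "Name" ) with one
# literal argument. Calls with a variable or a second argument are ignored.
LOAD_CALL = re.compile(r"""wfLoadExtension\(\s*(['"])([^'"]+)\1\s*\)""")


def find_loaded_extensions(php_source):
    """Return extension names passed as string literals to wfLoadExtension()."""
    return [m.group(2) for m in LOAD_CALL.finditer(php_source)]


def missing_extensions(php_source, extensions_dir):
    """Return the names whose <extensions_dir>/<Name>/extension.json is absent."""
    root = Path(extensions_dir)
    return [
        name
        for name in find_loaded_extensions(php_source)
        if not (root / name / "extension.json").is_file()
    ]
```

A test could read CommonSettings.php, call `missing_extensions()` against a checkout of the extensions directory, and fail if the returned list is not empty.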

Related incident: https://wikitech.wikimedia.org/wiki/Incident_documentation/20160601-MediaWiki

Event Timeline

Note: from the local repository's point of view we are quite limited, since the repository does not tell us what is really deployed in production. We can, however, already compare against extension-list (for as long as it remains available).
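The comparison with extension-list could look roughly like the sketch below. The file format is an assumption: one path per line, with registered extensions appearing as `.../extensions/<Name>/extension.json` and `#` starting a comment. The function names are hypothetical.

```python
import re

# Assumed entry shape: any path ending in /extensions/<Name>/extension.json
ENTRY = re.compile(r"/extensions/([^/]+)/extension\.json\s*$")


def registered_names(extension_list_text):
    """Collect extension names from the assumed extension-list format."""
    names = set()
    for line in extension_list_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = ENTRY.search(line)
        if match:
            names.add(match.group(1))
    return names


def not_in_list(loaded_names, extension_list_text):
    """Return loaded names that have no matching extension-list entry."""
    known = registered_names(extension_list_text)
    return sorted(name for name in loaded_names if name not in known)
```

Any name returned by `not_in_list()` is either a typo or an extension that is not listed, and both cases deserve a look before deployment.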

Apart from tests, some other actions could be valuable. For example, systematically syncing such changes to mw1017 first and testing them there would avoid impacting production.

This follow-up task from an incident report has not been updated recently. If it is no longer valid, please add a comment explaining why. If it is still valid, please prioritize it appropriately relative to your other work. If you have any questions, feel free to ask me (Greg Grossmeier).

Having tests for this seems nice, but I don't think that's feasible within the current CI setup for wmf-config.

But looking at the actual incident and the example in the task description, it is unclear to me how or why this would have caused production impact. Typos and the like in wfLoadExtension() calls are deterministic and would be trivial to detect as soon as you are on the mwdebug server. Going to any wiki that uses the extension while staging would have uncovered the problem.

Catching this in CI would be even nicer, of course, but I think that in this case our existing deployment process already covered it and the step was skipped. Can we confirm that the staging step was skipped? If it was not skipped, we need to figure out why the problem didn't happen during staging. If it was skipped, we may need to look into automating and enforcing that step as part of Scap. E.g. make it so that scap sync-* always goes to mwdebug first, and then leaves an interactive prompt for you to proceed once it has been verified (perhaps skippable via --force). Or, if we think it was an isolated case, that we generally follow the step, and that we don't want this kind of automation, then we should close the task :)
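The control flow proposed above can be sketched as follows. This is not Scap code: `staged_sync`, its parameters, and the injected `sync` and `confirm` callables are all hypothetical and only illustrate the "debug hosts first, then prompt, then everything else" order.

```python
def staged_sync(sync, debug_hosts, all_hosts, confirm, force=False):
    """Sync to the debug hosts first, then to the rest once confirmed.

    sync:    callable taking a list of host names (hypothetical)
    confirm: callable returning True when the operator has verified staging
    force:   skip the confirmation step entirely
    Returns True if the full sync ran, False if the operator aborted.
    """
    sync(debug_hosts)
    if not force and not confirm():
        return False  # stop before touching the remaining hosts
    remaining = [host for host in all_hosts if host not in debug_hosts]
    sync(remaining)
    return True
```

In an interactive tool, `confirm` could wrap a prompt such as `lambda: input("Verified on mwdebug? [y/N] ").lower() == "y"`, so that an unanswered or negative prompt leaves production untouched.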