== Motivation
Currently, configuration updates are applied immediately. This makes it impossible to do any sanity checks before deploying, e.g. using the canary feature we have for code deployments. Canary has worked perfectly to avoid downtime due to bad code deployments, but on 2020-07-29 we had 6 minutes of downtime due to a mistake in a configuration update.
== Proposal
Use the normal deployment process for configuration updates as well.
=== Process comparison
Currently, to do a configuration update:
# Submit a patch to translatewiki repository
# Have the patch merged
# Log in to web2
# Run `twn-update-config` (change is now deployed)
# (not enforced) test in production
# (not enforced) monitor logs
If configuration updates went through the normal deployment process:
# Submit a patch to translatewiki repository
# Have the patch merged
# Log in to web2
# Run `twn-update-config`
# (not enforced) test in canary
# (not enforced) check logs
# `cd /srv/mediawiki`
# `b oregano tag`
# `b oregano deploy` (change is now deployed)
# (not enforced) monitor logs
We could add a single command to do steps 7-9 to make it a bit easier.
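A minimal sketch of what such a combined command for steps 7-9 could look like. The function name `twn_deploy_config` and the `DEPLOY_DIR` and `DRY_RUN` variables are hypothetical; only the directory and the `b oregano` commands come from the steps above.

```shell
#!/bin/sh
# Hypothetical wrapper for steps 7-9: cd, tag, deploy.
# DEPLOY_DIR and DRY_RUN are illustrative assumptions, not existing options.
# The body runs in a subshell, so the caller's working directory is untouched.
twn_deploy_config() (
    deploy_dir="${DEPLOY_DIR:-/srv/mediawiki}"

    # With DRY_RUN set, only print the commands that would be run.
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "cd $deploy_dir" "b oregano tag" "b oregano deploy"
        exit 0
    fi

    # Stop at the first failing step so a failed tag is never deployed.
    cd "$deploy_dir" && b oregano tag && b oregano deploy
)
```

Running `DRY_RUN=1 twn_deploy_config` prints the three commands without executing them. Testing in canary and checking logs (steps 5-6) stay manual and should be done before calling it.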
=== Pros and cons
| Current process | New process |
| {icon plus-circle} Simple and fast | {icon plus-circle} Will not cause downtime if checked on canary first
| {icon minus-circle} Risky, can cause downtime | {icon plus-circle} Same process, no surprises
| {icon minus-circle} Different, surprising process compared to code deployments | {icon minus-circle} Additional steps
|| {icon minus-circle} Requires learning how to use canary, and it cannot be enforced
|| {icon minus-circle} May cause a "split-brain" scenario as caches are shared (this already happens for code, but all such changes, e.g. database schemas and message keys, should be made with this in mind)
== List of data not part of deployments that is used during PHP web requests
* /resources/caches/translatewiki.net/messagechanges.*
* /resources/caches/translatewiki.net/translate_messageindex.cdb
* /resources/caches/translatewiki.net/translate_groupcache-*
* /www/translatewiki.net/logs/
* /home/betawiki/config/groups/
* /home/betawiki/config/groups/MediaWiki/MediaWikiTopMessageGroup.php
* /home/betawiki/config/webfiles/ (via symlink from `workdir`)
== List of configuration not part of deployments that is used during PHP web requests
* /home/betawiki/config/DevelopmentSettings.php (not in production)
* /home/betawiki/config/ExtensionSettings.php
* /home/betawiki/config/FallbackSettings.php
* /home/betawiki/config/nikext.php
* /home/betawiki/config/nikext.i18n.magic.php
* /home/betawiki/config/PermissionSettings.php
* /home/betawiki/config/SpecialRally.php
* /home/betawiki/config/TranslateSettings.php
* /home/betawiki/config/TranslatewikiSettings.php
* /home/betawiki/config/groups/validation-exclusion-list.php
The scope of this task is PHP configuration files.
== Plan
[x] Empty current `workdir/config` directory (it's unused)
[x] Block `workdir/config` directory in nginx config
[x] Move all PHP configuration files under `[translatewiki-repo]/mw-config` for clarity and grouping. Keep existing files as redirects via symlinks
[x] Update `twn-update-config` (and `twn-update-all`??) to rsync `[translatewiki-repo]/mw-config` to `workdir/config`
[x] Update references in configuration files to read from `workdir/config`
[] Remove symlinks
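The rsync step in the plan could be sketched roughly as follows. This is an illustration only: the actual contents of `twn-update-config` are not shown here, the function name `sync_mw_config` and its two arguments are hypothetical, and the use of `--delete` is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: copy <repo_root>/mw-config into <workdir>/config.
# Both arguments are placeholders for the real repository and workdir paths.
sync_mw_config() {
    repo_root="$1"
    workdir="$2"
    src="$repo_root/mw-config"

    # Refuse to sync from a missing source directory.
    if [ ! -d "$src" ]; then
        echo "missing source directory: $src" >&2
        return 1
    fi

    # Trailing slashes copy the contents of mw-config into config.
    # --delete (an assumption) removes target files no longer in the repo.
    rsync -a --delete "$src/" "$workdir/config/"
}
```

Because the synced copy lives under `workdir`, it is picked up by the next tag and deploy instead of taking effect immediately.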
Other cleanups to do separately:
* Move nikext, Special:Rally and webfiles to a separate mini-extension