
Require full deployments for configuration changes
Closed, Resolved | Public | 8 Estimated Story Points

Description

Motivation

Currently, configuration updates are applied immediately. This makes it impossible to do any sanity checks before deploying, e.g. using the canary feature we have for code deployments. Canary has worked perfectly to avoid downtime due to bad code deployments, but on 2020-07-29 we had a 6-minute downtime due to a mistake in a configuration update.

Proposal

Use the normal deployment process for configuration updates as well.

Process comparison

Currently, to do a configuration update:

  1. Submit a patch to translatewiki repository
  2. Have the patch merged
  3. Log in to web2
  4. Run twn-update-config (change is now deployed)
  5. (not enforced) test in production
  6. (not enforced) monitor logs

If configuration updates went through normal deployments:

  1. Submit a patch to translatewiki repository
  2. Have the patch merged
  3. Log in to web2
  4. Run twn-update-config
  5. (not enforced) test in canary
  6. (not enforced) check logs
  7. cd /srv/mediawiki
  8. b oregano tag
  9. b oregano deploy (change is now deployed)
  10. (not enforced) monitor logs

We could add a single command to do steps 7-9 to make it a bit easier.
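
A single command for steps 7-9 could look roughly like the following. This is a minimal sketch, not existing tooling: the function name `twn_deploy_config`, the `run` helper and the `DRY_RUN` variable are hypothetical, and it assumes `b` is on the PATH and that `b oregano tag` and `b oregano deploy` are invoked exactly as in the steps above.

```shell
#!/bin/sh
# Hypothetical wrapper for steps 7-9. Function and variable names are
# illustrative; only the `b oregano` commands come from the steps above.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# The body runs in a subshell, so the cd does not leak into the caller.
twn_deploy_config() (
    root="${1:-/srv/mediawiki}"      # step 7
    run cd "$root" || exit 1
    run b oregano tag || exit 1      # step 8
    run b oregano deploy             # step 9: change is now deployed
)
```

Calling `twn_deploy_config` after checking canary would tag and deploy in one go, stopping if tagging fails. With `DRY_RUN=1` set, it only prints the three commands.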

Pros and cons

Current process:

  • Simple and fast
  • Risky, can cause downtime
  • Different, surprising process compared to code deployments

New process:

  • Will not cause downtime if checked on canary first
  • Same process, no surprises
  • Additional steps
  • Requires learning how to use canary, and it cannot be enforced
  • May cause a "split-brain" scenario as caches are shared (this already happens for code, but all such changes, e.g. database schemas and message keys, should be made with this in mind)

List of data not part of deployments that are used during PHP web requests

  • /resources/caches/translatewiki.net/messagechanges.*
  • /resources/caches/translatewiki.net/translate_messageindex.cdb
  • /resources/caches/translatewiki.net/translate_groupcache-*
  • /www/translatewiki.net/logs/
  • /home/betawiki/config/groups/
  • /home/betawiki/config/groups/MediaWiki/MediaWikiTopMessageGroup.php
  • /home/betawiki/config/webfiles/ (via symlink from workdir)

List of configuration files not part of deployments that are used during PHP web requests

  • /home/betawiki/config/DevelopmentSettings.php (not in production)
  • /home/betawiki/config/ExtensionSettings.php
  • /home/betawiki/config/FallbackSettings.php
  • /home/betawiki/config/nikext.php
  • /home/betawiki/config/nikext.i18n.magic.php
  • /home/betawiki/config/PermissionSettings.php
  • /home/betawiki/config/SpecialRally.php
  • /home/betawiki/config/TranslateSettings.php
  • /home/betawiki/config/TranslatewikiSettings.php
  • /home/betawiki/config/groups/validation-exclusion-list.php

The scope of this task is PHP configuration files.

Plan

  • Empty current workdir/config directory (it's unused)
  • Block workdir/config directory in nginx config
  • Move all PHP configuration files under [translatewiki-repo]/mw-config for clarity and grouping. Keep existing files as redirects via symlinks
  • Update twn-update-config (and twn-update-all?) to rsync [translatewiki-repo]/mw-config to workdir/config
  • Update references in configuration files to read from workdir/config
  • Remove symlinks
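
The rsync step in the plan above could be sketched as follows. This is an illustration under stated assumptions, not the actual contents of twn-update-config: the function name `sync_mw_config`, the `run` helper and the `DRY_RUN` variable are hypothetical, the repository and workdir paths are passed in as arguments because the task does not spell them out, and using `--delete` to drop files that no longer exist in mw-config is an assumption about the desired behaviour.

```shell
#!/bin/sh
# Hypothetical sketch of the sync step; not the real twn-update-config.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: sync_mw_config <translatewiki-repo> <workdir>
sync_mw_config() {
    repo="$1"       # checkout of the translatewiki repository
    workdir="$2"    # deployment workdir that contains config/
    # The trailing slash on the source copies the directory's contents.
    # -a preserves modes and timestamps; --delete (an assumption) removes
    # files from workdir/config that are gone from mw-config.
    run rsync -a --delete "$repo/mw-config/" "$workdir/config/"
}
```

Because the files land in the workdir rather than being read live, a change only reaches production with the next tag and deploy, which is what makes a canary check possible.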

Other cleanups to do separately:

  • Move nikext, Special:Rally and webfiles to a separate mini-extension

Event Timeline

Adding some notes from the retro meeting that I had with Niklas regarding deployments:

  • We should look at updating configuration via the usual deployment process and test it via canary before we release.
  • Update twn-update-config to validate the group configuration schema, since in this case we will have the Translate extension available.

CC: @Raymond

Nikerabbit set the point value for this task to 8. (Mar 1 2022, 8:16 AM)
Nikerabbit renamed this task from "Evaluate requiring full deployments for configuration changes" to "Require full deployments for configuration changes". (Mar 3 2022, 9:10 AM)

That plan looks good. Thinking out loud: we currently maintain only 4 tags. Since configuration changes will now also require a deployment, should we keep more tags?

I don't remember ever reverting back to more than one tag.

Change 775811 had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):

[translatewiki@master] Add docroot/config nginx deny list

https://gerrit.wikimedia.org/r/775811

Change 775812 had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):

[translatewiki@master] Move wiki PHP config files to mw-config and update twn-update-config

https://gerrit.wikimedia.org/r/775812

Change 775813 had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):

[translatewiki@master] Remove symlinks for config files

https://gerrit.wikimedia.org/r/775813

Change 775811 merged by jenkins-bot:

[translatewiki@master] Add docroot/config nginx deny list

https://gerrit.wikimedia.org/r/775811

Change 775812 merged by jenkins-bot:

[translatewiki@master] Move wiki PHP config files to mw-config and update twn-update-config

https://gerrit.wikimedia.org/r/775812

Change 791002 had a related patch set uploaded (by Nikerabbit; author: Nikerabbit):

[translatewiki@master] Fix $GROUPS in TranslateSettings.php

https://gerrit.wikimedia.org/r/791002

Change 791002 merged by jenkins-bot:

[translatewiki@master] Fix $GROUPS in TranslateSettings.php

https://gerrit.wikimedia.org/r/791002

Change 775813 merged by jenkins-bot:

[translatewiki@master] Remove symlinks for config files

https://gerrit.wikimedia.org/r/775813

Nikerabbit removed a project: Patch-For-Review.
Nikerabbit updated the task description.
Nikerabbit changed Final Story Points from 4 to 8.