
Enable scap to roll back broken changes to MediaWiki
Open, Medium, Public

Description

When scap sync-file detects a broken deployment due to a high error rate on the canary hosts, it aborts the sync, but leaves the canaries in their broken state, still serving traffic. It would be nice if it could automatically roll those changes back. According to @Reedy, this is already supported for other kinds of deployments scap supports.
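The requested behaviour can be sketched roughly as follows. This is a minimal illustration, not scap's actual code: the function name, the `deploy`, `rollback` and `error_rate` callables, and the 5% threshold are all hypothetical placeholders.

```python
from typing import Callable, Iterable, List


def sync_with_canary_rollback(
    canaries: Iterable[str],
    deploy: Callable[[str], None],
    rollback: Callable[[str], None],
    error_rate: Callable[[str], float],
    threshold: float = 0.05,  # assumed value, purely illustrative
) -> bool:
    """Deploy to the canaries; if any of them exceeds the error threshold,
    roll back every canary that was touched and report failure."""
    touched: List[str] = []
    for host in canaries:
        deploy(host)
        touched.append(host)

    if any(error_rate(host) > threshold for host in touched):
        # Undo in reverse order so the last host changed is restored first.
        for host in reversed(touched):
            rollback(host)
        return False  # caller should abort the wider sync
    return True  # canaries look healthy, safe to continue
```

The point of the sketch is only that the abort path also restores the canaries instead of leaving them broken; how `rollback` would actually obtain the previous files is the hard part.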

Alternatively, it could automatically depool the canary hosts (compare T104352: Make scap able to depool/repool servers via the conftool API), though it’s not clear when they would be repooled.

This task is an actionable of the 20190606-wikibase incident. It might be a duplicate of an existing one – feel free to merge it in that case. Also, T224857: Enhance MediaWiki deployments for support of php7.x is likely related.

Event Timeline

ArielGlenn triaged this task as Medium priority. Jun 11 2019, 2:39 PM

I'm pretty sure this has been brought up in past tasks, and been thought about from time to time as something to add to Scap. Can't find a task for it, though. So maybe we never created one!

From memory - I recall hearing this is non-trivial to implement, because Scap-for-MediaWiki currently isn't aware of Git patches. Once the sync-file has started with canaries (which includes the deployment host itself), the previous version of the files no longer exists anywhere on the deployment host. This means that even knowing which path was passed to the sync-file command, there is no obvious way to undo it.

This would likely be implicitly solved as part of the big Pipeline, but something earlier might also work. Specifically, I recall there being plans to treat the composite /srv/mediawiki directory as a flattened Git repo on the deployment host. This would mean each deployment is locally tracked as an implicit commit crafted by Scap, and thus reversible. It also means it is no longer possible to accidentally forget to sync part of the change because Scap would know exactly what had changed and would no longer need to be told what to deploy. Is there a task about that?
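To make the idea concrete, here is a rough sketch of "track every deployment as a snapshot". It uses an in-memory dictionary as a stand-in for the Git repo, so it is not how Scap or Git would do it; `take_snapshot`, `changed_paths` and `restore` are hypothetical names invented for this illustration.

```python
from pathlib import Path
from typing import Dict, List

# Maps a path relative to the tree root to that file's contents.
Snapshot = Dict[str, bytes]


def take_snapshot(root: Path) -> Snapshot:
    """Record the contents of every file under root (stand-in for a commit)."""
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_paths(before: Snapshot, after: Snapshot) -> List[str]:
    """Paths added, removed or modified between two snapshots."""
    return sorted(
        rel for rel in before.keys() | after.keys()
        if before.get(rel) != after.get(rel)
    )


def restore(root: Path, snapshot: Snapshot) -> None:
    """Bring the files under root back to the state recorded in snapshot."""
    current = take_snapshot(root)
    for rel in changed_paths(snapshot, current):
        target = root / rel
        if rel in snapshot:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(snapshot[rel])
        else:
            target.unlink()  # file did not exist before the deployment
```

With something like this, `changed_paths` gives the list of files to sync without the deployer naming them, and `restore` gives the undo. A real Git-backed version would get both from the commit history instead of holding file contents in memory.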

Is making MW deploys auto-rollback planned in the short to mid term? (ahead of MW-Pipeline)