
Integrate mwdebug staging as part of `scap sync-dir`
Closed, Resolved · Public

Description

I'm leading with a specific proposal. However, it is motivated by a specific problem; if that problem is solved another way, this task should be updated to reflect that.

Problem statement

When staging a change on mwdebug servers, we use scap pull, which syncs the entire deployment root.

When deploying a change to production, we generally use scap sync-dir path/to/somedir where "somedir" is the highest directory that contains all changes in the last commit.
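For illustration only, here is a minimal Python sketch of how that directory could be derived from the list of files touched by a commit. It reads "highest directory that contains all changes" as the deepest common ancestor of the changed files, which is an assumption. The function name `common_sync_dir` is hypothetical and not part of scap, and paths are assumed to be relative to the deployment root.

```python
import posixpath


def common_sync_dir(changed_files):
    """Return the deepest directory that contains every changed file.

    Hypothetical helper, not part of scap. Paths are assumed to be
    relative to the deployment root and to use "/" as the separator.
    """
    if not changed_files:
        raise ValueError("no changed files")
    # Compare the parent directories, not the file names themselves.
    parents = [posixpath.dirname(path) for path in changed_files]
    # commonpath() returns "" when the only shared ancestor is the root.
    return posixpath.commonpath(parents) or "."
```

With two files under wmf-config this returns `wmf-config`. If the commit also touches a file in a second top-level directory, it returns `.`, meaning only a full sync covers the whole change. That is exactly the case where a narrower `scap sync-dir` would silently leave part of the commit out.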

Because these two workflows differ significantly, there is a large potential for human error: either the deployer has forgotten about other changes applied to the deployment server, or (more commonly) the patch in question changes multiple directories, some of which are assumed to be "insignificant" and thus not synced to prod.

There is also the common issue of a commit affecting two files in wmf-config that have to be synced in a very specific order, or Bad Things (TM) happen. While this is discouraged in SWAT (T187761), it still happens occasionally, both in SWAT and outside it, in part because splitting the patch, merging the pieces one by one, and waiting for CI in between takes a lot of time for the full cycle. It's hard to resist the urge to just do it and then test them separately, or to be "sure" that it will go correctly.

Proposal

This was discussed at the EngProd offsite earlier this month. I mentioned that a scap pull-dir command might make sense, but @thcipriani thought it might make sense to go a step further and embrace staging completely in the process. This further reduces the room for human error, saves time by not requiring as much context switching for the deployer, and makes it harder to forget to stage.

As part of scap sync-dir path/somedir, we would by default first stage on a given mwdebug server. We would then show an interactive prompt and wait for the user to signal that they have done manual verification (or asked the SWAT submitter to do so and heard back), and then automatically proceed with the rest of the deployment process (canaries, logstash checker, rest of prod, etc.).
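The proposed flow can be sketched roughly as follows. This is a sketch of the control flow only, not scap's implementation: `sync_with_staging`, `ask_yes_no`, and the three callables are hypothetical names, and the real staging and deployment steps are passed in as functions so that nothing here pretends to be scap's API.

```python
from typing import Callable


def sync_with_staging(
    path: str,
    stage: Callable[[str], None],
    confirm: Callable[[str], bool],
    deploy: Callable[[str], None],
) -> bool:
    """Stage `path` on a test server, wait for confirmation, then deploy.

    All three callables are placeholders supplied by the caller.
    Returns True if the production step ran, False if the operator aborted.
    """
    # Step 1: sync only the given directory to the staging (mwdebug) host.
    stage(path)
    # Step 2: block until the deployer confirms manual verification.
    prompt = f"{path} is staged. Continue with production sync? [y/N] "
    if not confirm(prompt):
        return False
    # Step 3: canaries, log checks, and the rest of prod would happen here.
    deploy(path)
    return True


def ask_yes_no(prompt: str) -> bool:
    """Interactive confirmation; anything other than y/yes counts as no."""
    return input(prompt).strip().lower() in ("y", "yes")
```

Because staging and production share the same `path` argument, the deployer cannot stage one set of files and sync a different one, which is the mismatch described in the problem statement. Defaulting the prompt to "no" keeps an accidental Enter from pushing an unverified change to production.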

Event Timeline

scap backport has implemented some of the suggestions in this ticket.

dancy claimed this task.

This is scap sync-world --pause-after-testserver-sync