Canary doesn't rollback if you don't continue
Closed, Resolved · Public

Description

arlolra@mira:/srv/deployment/parsoid/deploy$ scap deploy
20:18:38 Started Deploy: parsoid/deploy
Entering 'src'
20:18:38 
== CANARY ==
:* wtp2001.codfw.wmnet
:* wtp2002.codfw.wmnet
:* wtp1002.eqiad.wmnet
:* wtp1001.eqiad.wmnet
parsoid/deploy: fetch stage(s): 100% (ok: 4; fail: 0; left: 0)                  
parsoid/deploy: config_deploy stage(s): 100% (ok: 4; fail: 0; left: 0)          
parsoid/deploy: promote and restart_service stage(s): 100% (ok: 4; fail: 0; left: 0)
canary deploy successful. Continue? [y]es/[n]o/[c]ontinue all groups: no
20:19:47 Finished Deploy: parsoid/deploy (duration: 01m 09s)

...

arlolra@wtp1001:~$ curl localhost:8000/_version
{"name":"parsoid","version":"0.5.1+git","sha":"63f1e1512785d31a15f97210fce14b715dfd1a95"}

Similarly to T145460, I noticed that wtp2019.codfw.wmnet was down, but only after I had invoked scap deploy. I assumed that answering no at the canary prompt would let me roll back and remove that host from the targets. Instead, the deploy simply finished, and I confirmed that the canaries were left on the new commit.
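
The check against the _version endpoint can be scripted. This is a minimal sketch, assuming the response always has the JSON shape shown in the curl output above. The helper names running_sha and is_on_commit are hypothetical and are not part of scap or Parsoid.

```python
import json


def running_sha(version_body: str) -> str:
    """Return the commit sha reported in a /_version response body."""
    return json.loads(version_body)["sha"]


def is_on_commit(version_body: str, expected_sha: str) -> bool:
    """True if the host reports exactly the expected commit."""
    return running_sha(version_body) == expected_sha


# Response body copied from the curl output on wtp1001 above.
body = (
    '{"name":"parsoid","version":"0.5.1+git",'
    '"sha":"63f1e1512785d31a15f97210fce14b715dfd1a95"}'
)
print(is_on_commit(body, "63f1e1512785d31a15f97210fce14b715dfd1a95"))
```

Running the same comparison on each canary host would show whether it is still on the new commit after answering no.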

Event Timeline

Arlolra created this task. Oct 24 2016, 8:32 PM
Restricted Application added a subscriber: Aklapper. Oct 24 2016, 8:32 PM
thcipriani triaged this task as Low priority. Oct 24 2016, 9:01 PM
thcipriani moved this task from Needs triage to Services improvements on the Scap board.
thcipriani added a subscriber: thcipriani.

This is currently the expected behavior. Answering no to Continue? just stops the deploy, since nothing is "wrong".

We could implement a rollback option here, likely.
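
A rough sketch of how the prompt could distinguish a plain stop from a rollback. This is not scap's actual code: the Action enum, the parse_answer function, and the [r]ollback choice are all hypothetical. Only the yes/no/continue choices come from the prompt shown in the description.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"          # proceed to the next group
    CONTINUE_ALL = "continue_all"  # proceed through all remaining groups
    STOP = "stop"                  # current behavior of "no": leave canaries as deployed
    ROLLBACK = "rollback"          # hypothetical: revert canaries, then stop


# Accepted answers; the "r"/"rollback" entries are the proposed addition.
_CHOICES = {
    "y": Action.CONTINUE,
    "yes": Action.CONTINUE,
    "n": Action.STOP,
    "no": Action.STOP,
    "c": Action.CONTINUE_ALL,
    "continue": Action.CONTINUE_ALL,
    "r": Action.ROLLBACK,
    "rollback": Action.ROLLBACK,
}


def parse_answer(answer: str) -> Action:
    """Map the operator's reply at the canary prompt to an action."""
    key = answer.strip().lower()
    if key not in _CHOICES:
        raise ValueError(f"unrecognized answer: {answer!r}")
    return _CHOICES[key]
```

Keeping no as a plain stop preserves the current behavior, while the separate rollback answer makes reverting the canaries an explicit choice.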

thcipriani renamed this task from Canary didn't rollback? to Canary doesn't rollback if you don't continue. Oct 24 2016, 9:02 PM

I see. I guess the process the current canary step caters for is automated checks that fail and trigger a rollback.

We tend to deploy the canary and then manually monitor the health of the cluster before continuing. At that point, a rollback would be quite useful.

It seems like a good feature and this should be possible once we have a fix for T149012: Scap rollback fails after promote completes.