ORES deployment finishes "successfully" even when uwsgi and celery fail to start up
Closed, Resolved, Public

Description

$ scap deploy -v T170485
16:21:09 Started deploy [ores/deploy@1d35aa5]
16:21:09 Deploying Rev: 1d35aa5b853f304bb11dd46bc79dfc3660f68ce8
16:21:09 Update DEPLOY_HEAD
16:21:09 Creating /srv/deployment/ores/deploy/.git/DEPLOY_HEAD
Deleted tag 'scap/sync/2017-02-04/0003' (was 7c228c6)
16:21:09 Update server info
Entering 'submodules/draftquality'
Entering 'submodules/editquality'
Entering 'submodules/ores'
Entering 'submodules/wheels'
Entering 'submodules/wikiclass'
16:21:09 Started deploy [ores/deploy@1d35aa5]: T170485
16:21:09 
== WORKER ==
:* deployment-sca03.deployment-prep.eqiad.wmflabs
16:21:09 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'worker', 'fetch', '--refresh-config']
ores/deploy: fetch stage(s): 100% (ok: 1; fail: 0; left: 0)                     
16:21:12 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'worker', 'config_deploy', '--refresh-config']
ores/deploy: config_deploy stage(s): 100% (ok: 1; fail: 0; left: 0)             
16:21:13 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'worker', 'promote', '--refresh-config']
ores/deploy: promote and restart_service stage(s): 100% (ok: 1; fail: 0; left: 0)
16:21:15 
== WORKER ==
:* deployment-sca03.deployment-prep.eqiad.wmflabs
16:21:15 Running remote deploy cmd ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'ores/deploy', '-g', 'worker', 'finalize', '--refresh-config']
ores/deploy: finalize stage(s): 100% (ok: 1; fail: 0; left: 0)                  
16:21:16 Finished deploy [ores/deploy@1d35aa5]: T170485 (duration: 00m 07s)
16:21:16 Finished deploy [ores/deploy@1d35aa5] (duration: 00m 07s)

But when I logged into deployment-sca03, uwsgi and celery had both failed to restart at all.
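
The gap described above is that nothing verified the services after the restart stage reported success. A minimal sketch of such a post-restart check follows, assuming the services run as systemd units. The unit names `uwsgi-ores` and `celery-ores-worker` are placeholders for illustration, and how the script would be wired into the deploy (for example as a command run after the restart stage) is also an assumption, not something this task confirms.

```python
import subprocess
import sys

# Hypothetical unit names; substitute whatever the host actually runs.
SERVICES = ("uwsgi-ores", "celery-ores-worker")


def is_active(unit, run=subprocess.run):
    """Return True if `systemctl is-active` reports the unit as active."""
    # `--quiet` suppresses output; the exit status alone carries the answer
    # (0 means active, anything else means inactive, failed, or unknown).
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0


def failed_units(units, check=is_active):
    """Return the units that are not active, in the order given."""
    return [unit for unit in units if not check(unit)]


def main(units=SERVICES, check=is_active):
    """Report inactive units on stderr and return a process exit status."""
    failed = failed_units(units, check)
    for unit in failed:
        print(f"service not active: {unit}", file=sys.stderr)
    # A non-zero status is what lets the caller treat the stage as failed.
    return 1 if failed else 0
```

To use this as a standalone check, end the script with `sys.exit(main())` so that an inactive service produces a non-zero exit status. The `check` and `run` parameters exist only so the logic can be exercised without a real systemd.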

Event Timeline

Halfak triaged this task as High priority. Jul 20 2017, 2:50 PM
Halfak moved this task from Unsorted to Maintenance/cleanup on the Machine-Learning-Team board.

Change 474690 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/services/ores/deploy@master] Add check for celery service in scap

https://gerrit.wikimedia.org/r/474690

Change 474690 merged by Ladsgroup:
[mediawiki/services/ores/deploy@master] Add check for celery service in scap

https://gerrit.wikimedia.org/r/474690

Mentioned in SAL (#wikimedia-operations) [2018-11-19T15:45:33Z] <ladsgroup@deploy1001> Finished deploy [ores/deploy@e957b24]: T209587 T170950 (duration: 17m 09s)