Scap3 fails to restart the service on deploy
Closed, Resolved (Public)

Description

When trying to deploy Mathoid in Beta, the deployment failed while restarting the service:

-- Opening log file: '/srv/deployment/mathoid/deploy/scap/log/scap-sync-2016-10-10-0001-2-g52f345b.log'
09:42:21 [deployment-tin] Started Deploy: mathoid/deploy
09:42:21 [deployment-tin] 
== DEFAULT ==
:* deployment-mathoid.deployment-prep.eqiad.wmflabs
09:42:22 [deployment-mathoid.deployment-prep.eqiad.wmflabs] Fetch from: http://deployment-tin.deployment-prep.eqiad.wmflabs/mathoid/deploy/.git
09:42:24 [deployment-mathoid.deployment-prep.eqiad.wmflabs] Checkout rev: 52f345b6a0975a7ca2de723dba6067a797859917
09:42:25 [deployment-mathoid.deployment-prep.eqiad.wmflabs] Update submodules
09:42:25 [deployment-mathoid.deployment-prep.eqiad.wmflabs] Updating .gitmodule: /srv/deployment/mathoid/deploy-cache/revs/52f345b6a0975a7ca2de723dba6067a797859917
09:42:37 [deployment-mathoid.deployment-prep.eqiad.wmflabs] Rendering config_file: /srv/deployment/mathoid/deploy-cache/revs/52f345b6a0975a7ca2de723dba6067a797859917/.git/config-files/etc/mathoid/config.yaml
09:42:38 [deployment-mathoid.deployment-prep.eqiad.wmflabs] Linking config files at: /srv/deployment/mathoid/deploy-cache/revs/52f345b6a0975a7ca2de723dba6067a797859917/.git/config-files
09:42:38 [deployment-mathoid.deployment-prep.eqiad.wmflabs] Restarting service 'mathoid'
09:42:38 [deployment-mathoid.deployment-prep.eqiad.wmflabs] Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 257, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 105, in main
    getattr(self, stage)()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 294, in restart_service
    tasks.restart_service(service)
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 376, in context_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/scap/tasks.py", line 740, in restart_service
    subprocess.check_call('sudo /usr/sbin/service {} restart'.format(service))
  File "/usr/lib/python2.7/subprocess.py", line 535, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
09:42:38 [deployment-mathoid.deployment-prep.eqiad.wmflabs] deploy-local failed: <OSError> {u'child_traceback': u'Traceback (most recent call last):\n  File "/usr/lib/python2.7/subprocess.py", line 1290, in _execute_child\n    os.execvp(executable, args)\n  File "/usr/lib/python2.7/os.py", line 346, in execvp\n    _execvpe(file, args)\n  File "/usr/lib/python2.7/os.py", line 370, in _execvpe\n    func(file, *argrest)\nOSError: [Errno 2] No such file or directory\n'}
09:42:38 [deployment-tin] [u'/usr/bin/scap', u'deploy-local', u'-v', u'--repo', u'mathoid/deploy', u'--force', u'-g', u'default', u'promote', u'--refresh-config'] on deployment-mathoid.deployment-prep.eqiad.wmflabs returned [70]: 
09:42:38 [deployment-tin] 1 targets had deploy errors
09:42:47 [deployment-tin] Finished Deploy: mathoid/deploy (duration: 00m 26s)

However, logging in to the node in question and restarting the service manually worked fine.

Scap version on deployment-tin: 3.3.0-1+0~20161012140137.153~1.gbp057b2e

Revisions and Commits

rMSCA Scap
Restricted Differential Revision

Event Timeline

Note that the deployment in production (where the version is 3.3.0-1) was successful, so this seems to be a regression.

Sorry, wrong report. It turns out scap versions on deployment-tin and the target mismatched due to Puppet not being able to run there (because of unrelated issues). Puppet's back on and all is good.

Actually, the problem is back. It is manifesting itself on at least deployment-sca0x and deployment-changeprop, but with a slightly modified stack trace:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 261, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 105, in main
    getattr(self, stage)()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 294, in restart_service
    tasks.restart_service(service)
  File "/usr/lib/python2.7/dist-packages/scap/utils.py", line 376, in context_wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/scap/tasks.py", line 740, in restart_service
    subprocess.check_call('sudo /usr/sbin/service {} restart'.format(service))
  File "/usr/lib/python2.7/subprocess.py", line 535, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Scap version on all of the nodes (including deployment-tin) is 3.3.0-1+0~20161019202100.163~1.gbp756af0

I think this regression must have something to do with {D407}, which we landed during the offsite.

I need to futz with this some more to figure out exactly why this is happening. Adding @dduvall and @demon since we did our code review #together
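One possible reading of the traceback, not confirmed anywhere in this task: `subprocess.check_call` is handed the whole command as a single string. On POSIX, without `shell=True`, `Popen` treats that string as the name of the program to execute, so it looks for a file literally called `sudo /usr/sbin/service mathoid restart` and fails with `[Errno 2] No such file or directory`. The sketch below reproduces that behaviour with a harmless `echo` command standing in for the real restart command; `run_command` and `fails_with_enoent` are hypothetical helpers written for illustration and are not part of scap.

```python
import errno
import shlex
import subprocess


def run_command(command):
    """Run a command given as one string, without invoking a shell.

    shlex.split turns 'prog arg1 arg2' into ['prog', 'arg1', 'arg2'], so
    Popen looks up 'prog' instead of a file named 'prog arg1 arg2'.
    """
    return subprocess.check_call(shlex.split(command))


def fails_with_enoent(command):
    """Return True if passing the raw string to check_call raises ENOENT."""
    try:
        # No shell=True and no argument list: on POSIX the whole string is
        # treated as the executable name.
        subprocess.check_call(command)
    except OSError as exc:
        return exc.errno == errno.ENOENT
    return False


# Hypothetical stand-in for 'sudo /usr/sbin/service {} restart'.
command = 'echo restarting {}'.format('mathoid')
```

On a POSIX host, `fails_with_enoent(command)` reports the same errno as the traceback above, while `run_command(command)` returns 0. Splitting into an argument list is generally preferable to `shell=True` here, since the service name is interpolated into the command and a list avoids shell interpretation of it.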

random note: I realized that #together in phab (not in a code block) adds the Release Engineering Team project. Oh, internal memes :)

thcipriani added a revision: Restricted Differential Revision. Oct 22 2016, 6:14 PM