
Scap3 broken in Beta
Closed, Resolved (Public)

Description

deploy-local is broken in Beta, Scap version 3.6.0-1~20171024023748.201:

Error: /Stage[main]/Profile::Cpjobqueue/Service::Node[cpjobqueue]/Scap::Target[cpjobqueue/deploy]/Package[cpjobqueue/deploy]/ensure: change from absent to present failed: Execution of '/usr/bin/scap deploy-local --repo cpjobqueue/deploy -D log_json:False' returned 70: 11:20:07 Fetch from: http://deployment-tin.deployment-prep.eqiad.wmflabs/cpjobqueue/deploy/.git
11:20:07 Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 329, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 140, in main
    getattr(self, stage)()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 274, in fetch
    git.fetch(self.context.cache_dir, git_remote)
  File "/usr/lib/python2.7/dist-packages/scap/git.py", line 327, in fetch
    subprocess.check_call(cmd)
  File "/usr/lib/python2.7/subprocess.py", line 535, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
TypeError: execv() arg 2 must contain only strings
11:20:07 deploy-local failed: <TypeError> execv() arg 2 must contain only strings

This log entry is from the Puppet run, but the same can be observed if the Scap3 run is triggered from deployment-tin. Also, trying to deploy any repository fails in the same fashion.

Event Timeline

The problem was introduced in rMSCAccea24641f77b59211ec3c8f51c94f91e344b6b9, which [sets --jobs <n>](https://phabricator.wikimedia.org/source/scap/browse/master/scap/git.py;5557a122344682338f957e3cf8924ae5cc7cd756$313) where <n> is passed as an integer rather than as a string.
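The failure mode can be reproduced in isolation: `subprocess` passes the argument list to `execv()`, which accepts only strings, so an integer job count has to be converted before the call. The following is a minimal sketch of that conversion, not the actual Scap patch; `build_clone_cmd` and `run_clone` are hypothetical helper names introduced here for illustration.

```python
import subprocess


def build_clone_cmd(remote, dest, jobs):
    """Return a git clone argv in which every element is a string.

    Hypothetical helper; `jobs` may arrive as an int (for example a CPU count).
    """
    cmd = ['/usr/bin/git', 'clone', '--jobs', jobs, remote, dest]
    # Popen hands argv to execv(), which rejects non-string items,
    # so coerce everything (including an integer job count) up front.
    return [str(arg) for arg in cmd]


def run_clone(remote, dest, jobs):
    """Run the clone; raises CalledProcessError on a non-zero exit status."""
    subprocess.check_call(build_clone_cmd(remote, dest, jobs))
```

Coercing the whole list in one place means a new call site cannot reintroduce the bug by forgetting a `str()` around a single argument.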

mobrovac raised the priority of this task from High to Unbreak Now! (Oct 24 2017, 11:54 AM)

I manually patched scap/git.py on deployment-cpjobqueue, which made the error go away. However, git's version in Beta is v2.1.4, while the introduced functionality requires v2.11, rendering Scap completely useless in Beta:

Error: /Stage[main]/Profile::Cpjobqueue/Service::Node[cpjobqueue]/Scap::Target[cpjobqueue/deploy]/Package[cpjobqueue/deploy]/ensure: change from absent to present failed: Execution of '/usr/bin/scap deploy-local --repo cpjobqueue/deploy -D log_json:False' returned 70: 11:49:47 Fetch from: http://deployment-tin.deployment-prep.eqiad.wmflabs/cpjobqueue/deploy/.git
11:49:47 Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 329, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 140, in main
    getattr(self, stage)()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 274, in fetch
    git.fetch(self.context.cache_dir, git_remote)
  File "/usr/lib/python2.7/dist-packages/scap/git.py", line 327, in fetch
    subprocess.check_call(cmd)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['/usr/bin/git', 'clone', '--jobs', '1', 'http://deployment-tin.deployment-prep.eqiad.wmflabs/cpjobqueue/deploy/.git', '/srv/deployment/cpjobqueue/deploy-cache/cache']' returned non-zero exit status 129
11:49:47 deploy-local failed: <CalledProcessError> Command '['/usr/bin/git', 'clone', '--jobs', '1', 'http://deployment-tin.deployment-prep.eqiad.wmflabs/cpjobqueue/deploy/.git', '/srv/deployment/cpjobqueue/deploy-cache/cache']' returned non-zero exit status 129

Hence, setting this ticket to UBN priority.
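One way to avoid a hard failure on hosts with an older git would be to check the installed version before passing `--jobs`. The sketch below assumes the 2.11 minimum quoted in this ticket and that `git --version` prints a line such as `git version 2.1.4`; the function names and the `MIN_GIT_FOR_JOBS` constant are hypothetical and not part of Scap.

```python
import re
import subprocess

# Assumption: minimum version taken from this ticket, not from git's release notes.
MIN_GIT_FOR_JOBS = (2, 11)


def parse_git_version(text):
    """Extract (major, minor) from output such as 'git version 2.1.4'."""
    match = re.search(r'(\d+)\.(\d+)', text)
    if match is None:
        raise ValueError('unrecognised git version output: %r' % text)
    return (int(match.group(1)), int(match.group(2)))


def supports_jobs(version_text):
    """True if the reported version meets the assumed minimum for --jobs."""
    # Tuples compare element-wise, so (2, 1) sorts below (2, 11).
    return parse_git_version(version_text) >= MIN_GIT_FOR_JOBS


def installed_git_supports_jobs(git='/usr/bin/git'):
    """Ask the local git binary for its version and apply the check."""
    output = subprocess.check_output([git, '--version'])
    return supports_jobs(output.decode('utf-8', 'replace'))
```

Comparing integer tuples rather than strings matters here: as text, "2.1.4" and "2.11" are easy to misorder.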

I think we should backport git 2.11 to wikimedia-jessie? That way all hosts that have it will install the update.

We already have git 2.11 backported to wikimedia-jessie; I'm not sure why it isn't in Beta.

Hmm, I'm not sure how to force an upgrade on all of Beta, but the package is upgradeable on cpjobqueue.

@mmodell I should point out that deployment-cpjobqueue is a new instance I spun up today, so the expectation was that the package would already be installed, especially if there is a new Scap3 feature that requires it.

I expected the same git version to be installed in prod and Beta, but apparently it is not. I'm looking into it.

Also: I'm looking at utils.cpus_for_jobs(), which I introduced in rMSCAccea2. It returns the number of cores as an integer. This matches the existing behavior, but most other callers end up casting the result to a string anyway. I think returning an int and expecting callers to deal with it is probably the most reasonable approach.

Also: we should probably declare a minimum (>=) git version in our Debian packaging so this doesn't sneak up on us in the future.

twentyafterfour@deployment-parsoid09:~$ apt-cache policy git
git:
  Installed: 1:2.11.0-2~bpo8+1
  Candidate: 1:2.11.0-2~bpo8+1
  Version table:
     1:2.11.0-3~bpo8+1 0
        100 http://mirrors.wikimedia.org/debian/ jessie-backports/main amd64 Packages
 *** 1:2.11.0-2~bpo8+1 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/experimental amd64 Packages
        100 /var/lib/dpkg/status
     1:2.1.4-2.1+deb8u5 0
        500 http://security.debian.org/ jessie/updates/main amd64 Packages
     1:2.1.4-2.1+deb8u3 0
        500 http://httpredir.debian.org/debian/ jessie/main amd64 Packages

The correct version is in apt; I can't figure out why a new instance would install the old version.

twentyafterfour@deployment-sca02:~$ apt-cache policy git
git:
  Installed: 1:2.1.4-2.1+deb8u5
  Candidate: 1:2.11.0-2~bpo8+1
  Version table:
     1:2.11.0-3~bpo8+1 0
        100 http://mirrors.wikimedia.org/debian/ jessie-backports/main amd64 Packages
     1:2.11.0-2~bpo8+1 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/experimental amd64 Packages
 *** 1:2.1.4-2.1+deb8u5 0
        500 http://security.debian.org/ jessie/updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:2.1.4-2.1+deb8u3 0
        500 http://http.debian.net/debian/ jessie/main amd64 Packages

We still need to sort out the git packaging issue, as we're moving from a soft to a hard requirement on git 2.11. But the above commit will at least fix the str/int issue.

It seems that most Beta hosts already had git 2.11, but a few (e.g. deployment-sca02) did not. @madhuvishy ran apt-get install git for us with cumin, so that should take care of all the straggler nodes. What I don't understand is why a newly built instance had the old version. I guess we don't run apt-get upgrade as part of the image build process? Maybe we should start doing that.
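Straggler hosts like deployment-sca02 show up in the `apt-cache policy git` output above as a mismatch between the `Installed:` and `Candidate:` lines. A small sketch of that comparison follows; it only parses text shaped like the output pasted in this ticket, and `parse_policy` and `needs_upgrade` are hypothetical names rather than existing tooling.

```python
import re


def parse_policy(text):
    """Collect the Installed and Candidate versions from apt-cache policy output."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r'\s*(Installed|Candidate):\s*(\S+)', line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def needs_upgrade(text):
    """True when both fields are present and the installed version differs."""
    fields = parse_policy(text)
    installed = fields.get('installed')
    candidate = fields.get('candidate')
    return (installed is not None and candidate is not None
            and installed != candidate)
```

The check is deliberately a plain inequality rather than a Debian version comparison, so it flags any host whose installed package is not the one apt would pick.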

I can confirm git is up to date now, but it seems that @demon's commit missed [one more invocation of cpus_for_jobs](https://phabricator.wikimedia.org/source/scap/browse/master/scap/git.py;35f0a00ef6ab9b6079511c49e9345d40f654d221$372) so deployments are still failing.

Pushed a fix, it should build and be live before long.

mobrovac assigned this task to demon.

All good now - Scap deployments are back. Resolving.