Scap3 broken in Beta
Closed, Resolved (Public)

Description

deploy-local is broken in Beta with Scap version 3.6.0-1~20171024023748.201:

Error: /Stage[main]/Profile::Cpjobqueue/Service::Node[cpjobqueue]/Scap::Target[cpjobqueue/deploy]/Package[cpjobqueue/deploy]/ensure: change from absent to present failed: Execution of '/usr/bin/scap deploy-local --repo cpjobqueue/deploy -D log_json:False' returned 70: 11:20:07 Fetch from: http://deployment-tin.deployment-prep.eqiad.wmflabs/cpjobqueue/deploy/.git
11:20:07 Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 329, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 140, in main
    getattr(self, stage)()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 274, in fetch
    git.fetch(self.context.cache_dir, git_remote)
  File "/usr/lib/python2.7/dist-packages/scap/git.py", line 327, in fetch
    subprocess.check_call(cmd)
  File "/usr/lib/python2.7/subprocess.py", line 535, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 522, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
TypeError: execv() arg 2 must contain only strings
11:20:07 deploy-local failed: <TypeError> execv() arg 2 must contain only strings

This log entry is from a Puppet run, but the same error occurs when the Scap3 run is triggered from deployment-tin. Deploying any other repository fails in the same way.

Event Timeline

Restricted Application added a subscriber: Aklapper. Oct 24 2017, 11:31 AM
mobrovac edited subscribers, added: mmodell; removed: Aklapper. Oct 24 2017, 11:47 AM

The problem was introduced in rMSCAccea24641f77b59211ec3c8f51c94f91e344b6b9, which passes --jobs <n> to git with <n> as an integer instead of a string. Every element of the argument list handed to subprocess must be a string, hence the TypeError.
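To illustrate the str/int issue, here is a minimal sketch of how such a command list could be built safely. The helper name build_clone_cmd and its parameters are hypothetical and are not taken from scap/git.py; only the shape of the command mirrors the one in the traceback.

```python
def build_clone_cmd(remote, dest, jobs, git='/usr/bin/git'):
    """Return an argv list for `git clone --jobs <n>`.

    `jobs` may arrive as an int (for example a CPU count), so it is
    converted with str() before it reaches subprocess, which only
    accepts string arguments.
    """
    return [git, 'clone', '--jobs', str(jobs), remote, dest]


# Hypothetical remote and destination, for illustration only.
cmd = build_clone_cmd('http://example.invalid/repo/.git', '/tmp/cache', 1)
```

Converting at the point where the command is assembled means the function that computes the job count can keep returning an int.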

mobrovac raised the priority of this task from High to Unbreak Now!. Oct 24 2017, 11:54 AM

I manually patched scap/git.py on deployment-cpjobqueue, which made the TypeError go away. However, the git version in Beta is v2.1.4, while the newly introduced functionality requires v2.11, which leaves Scap completely unusable in Beta:

Error: /Stage[main]/Profile::Cpjobqueue/Service::Node[cpjobqueue]/Scap::Target[cpjobqueue/deploy]/Package[cpjobqueue/deploy]/ensure: change from absent to present failed: Execution of '/usr/bin/scap deploy-local --repo cpjobqueue/deploy -D log_json:False' returned 70: 11:49:47 Fetch from: http://deployment-tin.deployment-prep.eqiad.wmflabs/cpjobqueue/deploy/.git
11:49:47 Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 329, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 140, in main
    getattr(self, stage)()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 274, in fetch
    git.fetch(self.context.cache_dir, git_remote)
  File "/usr/lib/python2.7/dist-packages/scap/git.py", line 327, in fetch
    subprocess.check_call(cmd)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['/usr/bin/git', 'clone', '--jobs', '1', 'http://deployment-tin.deployment-prep.eqiad.wmflabs/cpjobqueue/deploy/.git', '/srv/deployment/cpjobqueue/deploy-cache/cache']' returned non-zero exit status 129
11:49:47 deploy-local failed: <CalledProcessError> Command '['/usr/bin/git', 'clone', '--jobs', '1', 'http://deployment-tin.deployment-prep.eqiad.wmflabs/cpjobqueue/deploy/.git', '/srv/deployment/cpjobqueue/deploy-cache/cache']' returned non-zero exit status 129

Hence, setting this ticket to UBN priority.
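Exit status 129 is, as far as I know, what git returns for a usage error such as an unrecognized option, which fits an older git rejecting --jobs. A minimal sketch of a version check that could guard the flag follows; the function names and the idea of checking before adding --jobs are assumptions for illustration, not what Scap actually does. The 2.11 minimum is the figure quoted in this task.

```python
import re
import subprocess

# Minimum version quoted in this task; treated here as an assumption.
MIN_GIT_VERSION = (2, 11)


def parse_git_version(output):
    """Extract (major, minor) from `git --version` output."""
    match = re.search(r'git version (\d+)\.(\d+)', output)
    if match is None:
        raise ValueError('unrecognized git version output: %r' % output)
    return tuple(int(part) for part in match.groups())


def git_supports_jobs(output, minimum=MIN_GIT_VERSION):
    """Compare as integer tuples so that 2.1 sorts below 2.11."""
    return parse_git_version(output) >= minimum


def installed_git_version_output(git='/usr/bin/git'):
    """Run `git --version` and return its output as text."""
    raw = subprocess.check_output([git, '--version'])
    return raw.decode('utf-8', 'replace')
```

Comparing integer tuples rather than version strings matters here: as plain strings, "2.1.4" and "2.11.0" are easy to order incorrectly.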

Restricted Application added subscribers: Liuxinyu970226, Jay8g, TerraCodes. Oct 24 2017, 11:54 AM

I think we should backport git 2.11 to wikimedia-jessie? That way all hosts that have it will install the update.

We already have git 2.11 backported to wikimedia-jessie; I'm not sure why it isn't in Beta...

mmodell added a comment. Edited Oct 24 2017, 4:31 PM

Hmm, I'm not sure how to force an upgrade on all of Beta, but the package is upgradeable on cpjobqueue.

@mmodell I should point out that deployment-cpjobqueue is a new instance I spun up today, so the expectation was that the right package would already be installed, especially if there is a new Scap3 feature that requires it.

I expected the same git version to be installed in prod and Beta, but apparently it isn't. I'm looking into it.

demon added a subscriber: demon. Oct 24 2017, 4:52 PM

Also: I'm looking at the utils.cpus_for_jobs() function I introduced in rMSCAccea2. It returns the number of cores as an integer. This matched existing behavior, but other callers mostly end up casting the result to a string anyway. I think returning an int and expecting callers to deal with it is probably the most reasonable approach.

Also: we should probably pin a >= version of git in our Debian packaging so this doesn't sneak up on us again in the future.

twentyafterfour@deployment-parsoid09:~$ apt-cache policy git
git:
  Installed: 1:2.11.0-2~bpo8+1
  Candidate: 1:2.11.0-2~bpo8+1
  Version table:
     1:2.11.0-3~bpo8+1 0
        100 http://mirrors.wikimedia.org/debian/ jessie-backports/main amd64 Packages
 *** 1:2.11.0-2~bpo8+1 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/experimental amd64 Packages
        100 /var/lib/dpkg/status
     1:2.1.4-2.1+deb8u5 0
        500 http://security.debian.org/ jessie/updates/main amd64 Packages
     1:2.1.4-2.1+deb8u3 0
        500 http://httpredir.debian.org/debian/ jessie/main amd64 Packages

The correct version is in apt; I can't figure out why a new instance would install the old version.

twentyafterfour@deployment-sca02:~$ apt-cache policy git
git:
  Installed: 1:2.1.4-2.1+deb8u5
  Candidate: 1:2.11.0-2~bpo8+1
  Version table:
     1:2.11.0-3~bpo8+1 0
        100 http://mirrors.wikimedia.org/debian/ jessie-backports/main amd64 Packages
     1:2.11.0-2~bpo8+1 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/experimental amd64 Packages
 *** 1:2.1.4-2.1+deb8u5 0
        500 http://security.debian.org/ jessie/updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:2.1.4-2.1+deb8u3 0
        500 http://http.debian.net/debian/ jessie/main amd64 Packages
demon added a comment. Oct 24 2017, 5:21 PM

We still need to sort out the git packaging issue, as we're moving from a soft to a hard requirement on git 2.11. But the above commit will at least fix the str/int issue.

So it seems that most Beta hosts already had git 2.11, but a few (e.g. deployment-sca02) did not. @madhuvishy ran apt-get install git for us with cumin, so that should take care of all the straggler nodes. What I don't understand is why a newly built instance had the old version. I guess we don't run apt-get upgrade as part of the image build process? Maybe we should start doing that.
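For spotting straggler hosts like deployment-sca02, the apt-cache policy output pasted earlier can be checked mechanically. The following is a rough sketch, assuming the output keeps the "Installed:" / "Candidate:" layout shown above; the function names are made up for illustration and this is not part of Scap or cumin.

```python
import re


def parse_policy(text):
    """Return (installed, candidate) from `apt-cache policy <pkg>` output."""
    installed = re.search(r'^\s*Installed:\s*(\S+)', text, re.MULTILINE)
    candidate = re.search(r'^\s*Candidate:\s*(\S+)', text, re.MULTILINE)
    if installed is None or candidate is None:
        raise ValueError('no Installed/Candidate lines found')
    return installed.group(1), candidate.group(1)


def needs_upgrade(text):
    """True when the installed version differs from apt's candidate."""
    installed, candidate = parse_policy(text)
    return installed != candidate


# Trimmed copy of the deployment-sca02 output from this task.
sca02 = (
    "git:\n"
    "  Installed: 1:2.1.4-2.1+deb8u5\n"
    "  Candidate: 1:2.11.0-2~bpo8+1\n"
)
print(needs_upgrade(sca02))
```

This only compares the two version strings for equality; it does not try to implement Debian version ordering.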

awight added a subscriber: awight. Oct 24 2017, 8:17 PM

I can confirm git is up to date now, but it seems that @demon's commit missed one more invocation of cpus_for_jobs, so deployments are still failing.

demon added a comment. Oct 24 2017, 9:34 PM

Pushed a fix; it should build and be live before long.

mobrovac closed this task as Resolved. Oct 25 2017, 10:09 AM
mobrovac assigned this task to demon.

All good now - Scap deployments are back. Resolving.