Scap deploying arbitrary revision, incorrectly reporting correct revision
Closed, Resolved (Public)

Description

Steps to reproduce

  • I'm on beta tin, and have checked out revision rORESDEPLOY29905e509984 into /srv/deployment/ores/deploy.
  • scap deploy -l "ores1*" "test deployment to ores* (non-production)"

18:15:04 [tin] Started deploy [ores/deploy@29905e5]
18:15:04 [tin] Started deploy [ores/deploy@29905e5]: test deployment to ores* (non-production)
18:15:04 [tin]
== CLUSTER ==
:* ores1001.eqiad.wmnet
:* ores1003.eqiad.wmnet
:* ores1005.eqiad.wmnet
:* ores1007.eqiad.wmnet
:* ores1002.eqiad.wmnet
:* ores1008.eqiad.wmnet
:* ores1009.eqiad.wmnet
:* ores1004.eqiad.wmnet
:* ores1006.eqiad.wmnet
18:15:05 [ores1001.eqiad.wmnet] Fetch from: http://tin.eqiad.wmnet/ores/deploy/.git
18:15:07 [ores1001.eqiad.wmnet] Checkout rev: 4f2d1d0389449cedb26cc7f5f31ffab2098cb203
18:15:07 [ores1001.eqiad.wmnet] Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 308, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 139, in main
    getattr(self, stage)()
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 299, in fetch
    git.checkout(rev_dir, self.rev)
  File "/usr/lib/python2.7/dist-packages/scap/git.py", line 350, in checkout
    subprocess.check_call(cmd, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '/usr/bin/git checkout --force --quiet 4f2d1d0389449cedb26cc7f5f31ffab2098cb203' returned non-zero exit status 128
18:15:07 [ores1001.eqiad.wmnet] deploy-local failed: <CalledProcessError> {u'cmd': u'/usr/bin/git checkout --force --quiet 4f2d1d0389449cedb26cc7f5f31ffab2098cb203', u'output': None, u'returncode': 128}
18:15:07 [tin] [u'/usr/bin/scap', u'deploy-local', u'-v', u'--repo', u'ores/deploy', u'-g', u'cluster', u'fetch', u'--refresh-config'] on ores1001.eqiad.wmnet returned [70]: http://tin.eqiad.wmnet/ores/deploy/.git
From http://tin.eqiad.wmnet/ores/deploy/
 * [new branch] CELERY_4 -> origin/CELERY_4
 * [new branch] STABLE_REVSCORING_1 -> origin/STABLE_REVSCORING_1
 * [new branch] master -> origin/master
 * [new tag] scap/sync/2017-11-06/0009 -> scap/sync/2017-11-06/0009
Cloning into '/srv/deployment/ores/deploy-cache/revs/4f2d1d0389449cedb26cc7f5f31ffab2098cb203'...
done.
fatal: reference is not a tree: 4f2d1d0389449cedb26cc7f5f31ffab2098cb203

18:15:07 [tin] 1 targets had deploy errors

Scap actually tries to check out an ancient revision (4f2d1d0 here) rather than the 29905e5 it reports. In this case the checkout fails (inexplicably), but I've also seen it succeed in deploying the wrong revision, all the while reporting the correct one in the logs.

As a workaround, I've been including "-r HEAD" in my command-line arguments.
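
Until the root cause is fixed, a mismatch like this can be caught by comparing the revision in the "Started deploy" line with the one in each "Checkout rev" line. The following is only a rough sketch: find_revision_mismatches is a hypothetical helper, not part of scap, and the two regular expressions assume the log layout shown in the paste above.

```python
import re

# Both patterns are assumptions based on the log lines in this report.
STARTED_RE = re.compile(r"Started deploy \[[^@\]]+@([0-9a-f]{7,40})\]")
CHECKOUT_RE = re.compile(r"\[([^\]]+)\] Checkout rev: ([0-9a-f]{40})")


def find_revision_mismatches(log_text):
    """Return (host, sha) pairs whose checkout differs from the announced rev.

    Hypothetical helper for illustration; scap itself provides no such check.
    """
    announced = None
    mismatches = []
    for line in log_text.splitlines():
        started = STARTED_RE.search(line)
        if started:
            # Remember the (abbreviated) revision the deploy claims to ship.
            announced = started.group(1)
            continue
        checkout = CHECKOUT_RE.search(line)
        if checkout and announced is not None:
            host, sha = checkout.groups()
            # The announced rev is abbreviated, so compare by prefix.
            if not sha.startswith(announced):
                mismatches.append((host, sha))
    return mismatches


SAMPLE = """\
18:15:04 [tin] Started deploy [ores/deploy@29905e5]
18:15:07 [ores1001.eqiad.wmnet] Checkout rev: 4f2d1d0389449cedb26cc7f5f31ffab2098cb203
"""

print(find_revision_mismatches(SAMPLE))
```

Run against the two sample lines, this flags ores1001.eqiad.wmnet, since 4f2d1d0... does not start with 29905e5. It only inspects log text, so it would also catch the case where the wrong checkout succeeds silently.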

Event Timeline

thcipriani assigned this task to mmodell.
thcipriani subscribed.

I believe @mmodell found the root cause of this issue in T181661#3834498.

Please reopen if you believe that not to be the case.