scap deploy --init on deployment server fails on first puppet run
Open, Needs Triage, Public

Description

Seen while adding the scap configuration for integration/docroot (T256005):

root@deploy1001:/srv/deployment$ run-puppet-agent
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for deploy1001.eqiad.wmnet
Info: Applying configuration version '(f13690ff17) Antoine Musso - Fix scap config for integration/docroot'
Error: Execution of '/usr/bin/scap deploy --init' returned 70: 14:10:06 Started setup [integration/docroot@708d3eb]
14:10:06 Deploying Rev: HEAD = 708d3eba6bf056e8bfb9ff516f8ee93108880cab
14:10:06 Finished setup [integration/docroot@708d3eb] (duration: 00m 00s)
14:10:06 Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 341, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 721, in main
    git.tag_repo(self.deploy_info, location=self.context.root)
  File "/usr/lib/python2.7/dist-packages/scap/git.py", line 486, in tag_repo
    subprocess.check_call(cmd, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '
  /usr/bin/git tag -fa \
  -m 'user trebuchet' \
  -m 'timestamp 2020-07-07T14:10:06.379026' -- \
  scap/sync/2020-07-07/0001 708d3eba6bf056e8bfb9ff516f8ee93108880cab
  ' returned non-zero exit status 128
14:10:06 deploy failed: <CalledProcessError> Command '
  /usr/bin/git tag -fa \
  -m 'user trebuchet' \
  -m 'timestamp 2020-07-07T14:10:06.379026' -- \
  scap/sync/2020-07-07/0001 708d3eba6bf056e8bfb9ff516f8ee93108880cab
  ' returned non-zero exit status 128

Error: /Stage[main]/Profile::Mediawiki::Deployment::Server/Scap::Source[integration/docroot]/Scap_source[integration/docroot]/ensure: change from 'absent' to 'present' failed: Execution of '/usr/bin/scap deploy --init' returned 70: 14:10:06 Started setup [integration/docroot@708d3eb]
14:10:06 Deploying Rev: HEAD = 708d3eba6bf056e8bfb9ff516f8ee93108880cab
14:10:06 Finished setup [integration/docroot@708d3eb] (duration: 00m 00s)
14:10:06 Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/scap/cli.py", line 341, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python2.7/dist-packages/scap/deploy.py", line 721, in main
    git.tag_repo(self.deploy_info, location=self.context.root)
  File "/usr/lib/python2.7/dist-packages/scap/git.py", line 486, in tag_repo
    subprocess.check_call(cmd, shell=True)
  File "/usr/lib/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '
  /usr/bin/git tag -fa \
  -m 'user trebuchet' \
  -m 'timestamp 2020-07-07T14:10:06.379026' -- \
  scap/sync/2020-07-07/0001 708d3eba6bf056e8bfb9ff516f8ee93108880cab
  ' returned non-zero exit status 128
14:10:06 deploy failed: <CalledProcessError> Command '
  /usr/bin/git tag -fa \
  -m 'user trebuchet' \
  -m 'timestamp 2020-07-07T14:10:06.379026' -- \
  scap/sync/2020-07-07/0001 708d3eba6bf056e8bfb9ff516f8ee93108880cab
  ' returned non-zero exit status 128

Notice: Applied catalog in 39.96 seconds

Event Timeline

One of the issues is that we only know that git exited with return code 128; we lose the useful stderr output. It seems to be swallowed by Scap.
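To illustrate the point about the lost stderr, here is a minimal sketch of how a failing subprocess's stderr could be kept and attached to the error. It is not Scap's code: `CommandFailed` and `run_capturing_stderr` are hypothetical names, the example is written for Python 3 rather than the Python 2.7 shown in the traceback, and a small Python child process stands in for the failing git call.

```python
import subprocess
import sys


class CommandFailed(Exception):
    """Hypothetical error type that carries the failed command's stderr."""

    def __init__(self, cmd, returncode, stderr):
        self.cmd = cmd
        self.returncode = returncode
        self.stderr = stderr
        super().__init__(
            "Command %r returned non-zero exit status %d: %s"
            % (cmd, returncode, stderr.strip())
        )


def run_capturing_stderr(cmd):
    """Run cmd (a list of arguments) and return its stdout.

    On a non-zero exit, raise CommandFailed with the captured stderr
    instead of reporting only the exit status.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        raise CommandFailed(cmd, proc.returncode, proc.stderr)
    return proc.stdout


if __name__ == "__main__":
    # Stand-in for the failing git call: write to stderr, then exit 128.
    fake_git = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('fatal: example message\\n'); sys.exit(128)",
    ]
    try:
        run_capturing_stderr(fake_git)
    except CommandFailed as err:
        print(err)
```

With something along these lines, the puppet log would show git's own "fatal: ..." message next to the exit status, which is the piece of information missing from the report above.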

Related to this, although a slightly different issue: scap deploy --init also failed on the non-primary deployment server because scap is disabled there. I think scap synchronization (all methods) should stay disabled because of the lock, but --init should probably be allowed and not respect the lock, to prevent a dependency loop?

Same issue here when trying to set up deploy1002, the successor of deploy1001: T265963#6660917

I have the same suggestion for a fix that Jaime made above: scap sync should NOT be allowed, but scap deploy --init should be. Then we wouldn't have puppet errors and could make sure everything is OK BEFORE having to switch the deployment server and make it the new active one.
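The proposed rule (the lock blocks synchronization but not --init) can be sketched as a small decision function. This is only an illustration of the suggestion, not how Scap implements its lock: `lock_blocks`, its parameters, and the subcommand strings are all hypothetical.

```python
def lock_blocks(subcommand, init, lock_present):
    """Return True if the invocation should be refused because of the lock.

    Hypothetical rule from the discussion: while the server is locked,
    every subcommand is refused except "deploy" with --init, which only
    prepares the local repository.
    """
    if not lock_present:
        return False  # no lock, nothing is blocked
    if subcommand == "deploy" and init:
        return False  # --init is exempt from the lock
    return True       # everything else stays disabled


if __name__ == "__main__":
    print(lock_blocks("deploy", init=True, lock_present=True))
    print(lock_blocks("sync", init=False, lock_present=True))
```

Under this rule the first puppet run on a standby deployment server could complete the --init step while regular syncs remain refused until the server becomes the active one.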

One of the issues is that we only know that git exited with return code 128; we lose the useful stderr output. It seems to be swallowed by Scap.

That happens here, apparently:
rMSCA /scap/cli.py:247

There is a "scap-sync-master" command that can be run manually on new deployment servers; it takes care of /srv/mediawiki-staging and /srv/patches.

Additionally, I added puppet code so that /srv/patches is handled like /srv/deployment, with automatic rsync.

Finally, you can run "scap pull" twice on a new host to fill up /srv/mediawiki, and it should be fine (possibly after an error on the first run).
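For anyone scripting the "run it twice" step, a rough sketch follows. It runs a command a fixed number of times, tolerates failures on earlier runs, and only insists that the last run succeeds. `run_repeatedly` is a hypothetical helper, and the assumption that a failed first "scap pull" is harmless comes only from the comment above.

```python
import subprocess


def run_repeatedly(cmd, times=2):
    """Run cmd `times` times and return the list of exit statuses.

    Earlier failures are tolerated; a failure on the final run raises
    RuntimeError so the caller notices that the host is not ready.
    """
    codes = []
    for _ in range(times):
        codes.append(subprocess.call(cmd))
    if codes[-1] != 0:
        raise RuntimeError(
            "%r still failing after %d runs (exit statuses: %r)"
            % (cmd, times, codes)
        )
    return codes


# Intended use on a new host (not executed here):
# run_repeatedly(["scap", "pull"], times=2)
```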