RESTBase deployment fails with scap internal error
Closed, ResolvedPublic

Description

Trying to deploy RESTBase with scap fails with an internal error:

15:22:44 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'restbase/deploy', '-g', 'canary', 'config_deploy', '--refresh-config'] (ran as deploy-service@restbase2010.codfw.wmnet) returned [70]: Source basepath: /srv/deployment/restbase/deploy-cache/revs/664a2f8c762ebed9604fdfb04243fe4763da3c60/.git/config-files
Unhandled error:
deploy-local failed: <AttributeError> {}

Event Timeline

More info on this

15:22:44 [restbase2010.codfw.wmnet] Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/scap/cli.py", line 352, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python3/dist-packages/scap/deploy.py", line 157, in main
    getattr(self, stage)()
  File "/usr/lib/python3/dist-packages/scap/deploy.py", line 198, in config_deploy
    overrides=overrides,
  File "/usr/lib/python3/dist-packages/scap/template.py", line 87, in __init__
    env_args = self._make_env_args(loader, erb_syntax, output_format)
  File "/usr/lib/python3/dist-packages/scap/template.py", line 95, in _make_env_args
    loader = {n: f.decode("utf-8") for n, f in loader.items()}
  File "/usr/lib/python3/dist-packages/scap/template.py", line 95, in <dictcomp>
    loader = {n: f.decode("utf-8") for n, f in loader.items()}
AttributeError: 'str' object has no attribute 'decode'

This looks like T291990: Scap error when deploying kartotherian.
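For readers unfamiliar with this failure mode: on Python 3, `str` has no `.decode()` method, so a dict comprehension that decodes every value fails as soon as one value is already text. The following is only a minimal sketch of a defensive pattern, not necessarily how scap 4.0.2 fixes it; `to_text` and the sample `loader` contents are hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding only when it is bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical template sources: one already str, one still bytes.
loader = {
    "config.yaml": "port: 7231\n",
    "other.yaml": b"name: restbase\n",
}

# Same shape as the failing comprehension in the traceback, but safe for both types.
loader = {name: to_text(content) for name, content in loader.items()}
```

With this guard, values that are already `str` pass through unchanged and only `bytes` values are decoded.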

[thcipriani@deploy1002 ~]$ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -l deploy-service -oIdentitiesOnly=yes -oIdentityFile=/etc/keyholder.d/deploy_service.pub restbase2010.codfw.wmnet scap version
4.0.0-1

^ that should be 4.0.2-1 (per T291095: Deploy Scap version 4.0.2).

Pinging serviceops for help: could a serviceopsen ensure that scap is at version 4.0.2 everywhere?

Debmonitor shows that restbase2010 (and a bunch of other servers) are still running scap 4.0.0-1, not 4.0.2-1 with the fixes. I don't have an explanation for the one host running scap 3.17.1 visible on debmonitor, but 4.0.0-1 has a simple explanation: the stretch-wikimedia apt repository only has that version, not the upgraded one.
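As a rough illustration of the check being asked for, the sketch below flags hosts whose scap version is older than the required one. It is an assumption-laden example: `version_key` is a hypothetical helper that does a naive numeric comparison (not full Debian version ordering), and the `installed` mapping is typed in by hand from the versions mentioned in this task rather than fetched from debmonitor.

```python
import re


def version_key(version):
    """Split '4.0.2-1' into a tuple of ints for a naive comparison.

    This ignores epochs, tildes and letters, so it is not a full
    implementation of Debian version ordering.
    """
    return tuple(int(part) for part in re.findall(r"\d+", version))


REQUIRED = "4.0.2-1"

# Versions as reported in this task; entered by hand for illustration.
installed = {
    "restbase2010.codfw.wmnet": "4.0.0-1",
    "mw2280": "3.17.1",
}

outdated = sorted(
    host
    for host, version in installed.items()
    if version_key(version) < version_key(REQUIRED)
)
```

Both hosts end up in `outdated`, which matches what debmonitor reports above.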

See the discussion that starts with T294148#7460181 between me and @dancy about the latest scap not being installable on stretch hosts. Can we do a new scap release or should I just cherry-pick 6ba983cb66b16d8095464f58821cd03fc222cd70 to the currently deployed version?

> I don't have an explanation for the one host running scap 3.17.1 visible on debmonitor,

mw2280 has been dead for a while (T290708: decom mw2280 (was: mw2280 unresponsive to powercycle and hardreset)); whenever it comes back online, it'll get upgraded.

Mentioned in SAL (#wikimedia-operations) [2021-11-03T22:47:04Z] <legoktm> upgraded scap on A:restbase (T294936)

@Pchelolo scap is now upgraded on all the restbase hosts.

@Legoktm I just tried to deploy again, same result.

Actually, a different result: now it's UndefinedError instead of AttributeError.

Feel free to try yourself by running scap deploy from /srv/deployment/restbase/deploy. If it succeeds for you it won't deploy anything risky.

legoktm@deploy1002:/srv/deployment/restbase/deploy$ scap deploy
23:01:16 Started deploy [restbase/deploy@664a2f8]
23:01:16 Deploying Rev: HEAD = 664a2f8c762ebed9604fdfb04243fe4763da3c60
23:01:16 Started deploy [restbase/deploy@664a2f8]: (no justification provided)
23:01:16
== CANARY1 ==
:* restbase1016.eqiad.wmnet
restbase/deploy: fetch stage(s): 100% (ok: 1; fail: 0; left: 0)
23:01:19 ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'restbase/deploy', '-g', 'canary', 'config_deploy', '--refresh-config'] (ran as deploy-service@restbase1016.eqiad.wmnet) returned [70]: Source basepath: /srv/deployment/restbase/deploy-cache/revs/664a2f8c762ebed9604fdfb04243fe4763da3c60/.git/config-files
Rendering config_file: /srv/deployment/restbase/deploy-cache/revs/664a2f8c762ebed9604fdfb04243fe4763da3c60/.git/config-files/etc/restbase/config.yaml
Unhandled error:
deploy-local failed: <UndefinedError> {}

restbase/deploy: config_deploy stage(s): 100% (ok: 0; fail: 1; left: 0)
23:01:19 1 targets had deploy errors
23:01:19 1 targets failed
23:01:19 1 of 2 canary1 targets failed, exceeding limit
Rollback all deployed groups? [Y/n]:

I think the UndefinedError is a jinja2 template error.

Indeed:

legoktm@deploy1002:/srv/deployment/restbase/deploy$ scap deploy-log
-- Opening log file: '/srv/deployment/restbase/deploy/scap/log/scap-sync-2021-11-03-0003.log'
23:01:16 [deploy1002] Started deploy [restbase/deploy@664a2f8]
23:01:16 [deploy1002] Deploying Rev: HEAD = 664a2f8c762ebed9604fdfb04243fe4763da3c60
23:01:16 [deploy1002] Started deploy [restbase/deploy@664a2f8]: (no justification provided)
23:01:16 [deploy1002]
== CANARY1 ==
:* restbase1016.eqiad.wmnet
23:01:17 [restbase1016.eqiad.wmnet] Fetch from: http://deploy1002.eqiad.wmnet/restbase/deploy/.git
23:01:18 [restbase1016.eqiad.wmnet] Update submodules
23:01:18 [restbase1016.eqiad.wmnet] Updating .gitmodule: /srv/deployment/restbase/deploy-cache/cache
23:01:18 [restbase1016.eqiad.wmnet] Revision directory already exists (use --force to override)
23:01:19 [restbase1016.eqiad.wmnet] Rendering config_file: /srv/deployment/restbase/deploy-cache/revs/664a2f8c762ebed9604fdfb04243fe4763da3c60/.git/config-files/etc/restbase/config.yaml
23:01:19 [restbase1016.eqiad.wmnet] Unhandled error:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/scap/cli.py", line 354, in run
    exit_status = app.main(app.extra_arguments)
  File "/usr/lib/python3/dist-packages/scap/deploy.py", line 157, in main
    getattr(self, stage)()
  File "/usr/lib/python3/dist-packages/scap/deploy.py", line 223, in config_deploy
    f.write(tmpl.render())
  File "/usr/lib/python3/dist-packages/scap/template.py", line 141, in render
    return self._template.render(template_vars)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 989, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 754, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 148, in top-level template code
jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'iteritems'
23:01:19 [restbase1016.eqiad.wmnet] deploy-local failed: <UndefinedError> {}
23:01:19 [deploy1002] ['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'restbase/deploy', '-g', 'canary', 'config_deploy', '--refresh-config'] (ran as deploy-service@restbase1016.eqiad.wmnet) returned [70]: Source basepath: /srv/deployment/restbase/deploy-cache/revs/664a2f8c762ebed9604fdfb04243fe4763da3c60/.git/config-files
Rendering config_file: /srv/deployment/restbase/deploy-cache/revs/664a2f8c762ebed9604fdfb04243fe4763da3c60/.git/config-files/etc/restbase/config.yaml
Unhandled error:
deploy-local failed: <UndefinedError> {}

23:01:19 [deploy1002] 1 targets had deploy errors
23:01:19 [deploy1002] 1 targets failed
23:01:19 [deploy1002] 1 of 2 canary1 targets failed, exceeding limit
23:02:07 [deploy1002] Finished deploy [restbase/deploy@664a2f8]: (no justification provided) (duration: 00m 50s)
23:02:07 [deploy1002] Finished deploy [restbase/deploy@664a2f8] (duration: 00m 50s)

Change 736802 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[mediawiki/services/restbase/deploy@master] conf.yaml.j2: Replace .iteritems() with .items()

https://gerrit.wikimedia.org/r/736802

Change 736802 merged by Ppchelko:

[mediawiki/services/restbase/deploy@master] conf.yaml.j2: Replace .iteritems() with .items()

https://gerrit.wikimedia.org/r/736802
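Background on the change above: Python 3 dicts have no `.iteritems()`, so a Jinja2 template calling it fails with the UndefinedError shown in the log once scap runs under Python 3, while `.items()` works on both Python 2 and 3. The sketch below shows one way to find and rewrite such calls in template text. It is illustrative only and not part of the actual patch; `modernize_template` and the sample template line (including the `settings` variable) are hypothetical.

```python
import re

# Python 2-only dict iteration methods and their portable replacements.
PY2_DICT_METHODS = {
    "iteritems": "items",
    "itervalues": "values",
    "iterkeys": "keys",
}

_PATTERN = re.compile(r"\.(iteritems|itervalues|iterkeys)\(\)")


def modernize_template(source):
    """Rewrite Python 2-only dict method calls in template text (hypothetical helper)."""
    return _PATTERN.sub(
        lambda match: "." + PY2_DICT_METHODS[match.group(1)] + "()",
        source,
    )


# Hypothetical template line, not copied from the real config template.
old = "{% for name, value in settings.iteritems() %}{{ name }}: {{ value }}\n{% endfor %}"
new = modernize_template(old)
```

Here `new` contains `settings.items()` in place of `settings.iteritems()`; everything else in the template text is left untouched.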

Pchelolo claimed this task.

Success! Thank you @dancy