Runs of sre.gitlab.upgrade consistently fail when unpausing the runners.
Reduce tries from 20 to 1 in DRY-RUN mode
Exception raised while executing cookbook sre.gitlab.upgrade:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/gitlab/exceptions.py", line 279, in wrapped_f
    return f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/gitlab/mixins.py", line 296, in update
    return http_method(path, post_data=new_data, files=files, **kwargs)
  File "/usr/lib/python3/dist-packages/gitlab/__init__.py", line 713, in http_put
    result = self.http_request(
  File "/usr/lib/python3/dist-packages/gitlab/__init__.py", line 565, in http_request
    raise GitlabHttpError(
gitlab.exceptions.GitlabHttpError: 502: GitLab is not responding

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 212, in run
    raw_ret = runner.run()
  File "/srv/deployment/spicerack/cookbooks/sre/gitlab/upgrade.py", line 122, in run
    unpause_runners(paused_runners, dry_run=self.spicerack.dry_run)
  File "/usr/lib/python3/dist-packages/wmflib/decorators.py", line 210, in wrapper
    return func(*args, **kwargs)
  File "/srv/deployment/spicerack/cookbooks/sre/gitlab/__init__.py", line 64, in unpause_runners
    runner.save()
  File "/usr/lib/python3/dist-packages/gitlab/mixins.py", line 385, in save
    server_data = self.manager.update(obj_id, updated_data, **kwargs)
  File "/usr/lib/python3/dist-packages/gitlab/exceptions.py", line 281, in wrapped_f
    raise error(e.error_message, e.response_code, e.response_body) from e
gitlab.exceptions.GitlabUpdateError: 502: GitLab is not responding
END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
Somehow the additional logic for handling the dry-run flag (implemented somewhere in https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/909257) reduces the number of tries from 20 to 1. With a single try, unpausing fails because the GitLab API needs some time, and therefore several retries, until it is functional again.
The methods pause_runners() and unpause_runners() do actually try to pause and unpause the runners, despite the log message about DRY-RUN mode.
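To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It is not the cookbook or wmflib code: the `retry` decorator, `ApiNotReady`, `FlakyApi` and `unpause` are all hypothetical stand-ins, and the sketch simply assumes that the decorator collapses to a single try whenever a truthy `dry_run` keyword argument is passed, while the decorated function still performs the real call.

```python
import functools


class ApiNotReady(Exception):
    """Hypothetical stand-in for the 502 GitlabUpdateError seen in the log."""


def retry(tries=20):
    """Hypothetical retry decorator (not the wmflib implementation).

    Assumption: the number of tries is reduced to 1 when the wrapped
    function is called with a truthy dry_run keyword argument.
    Delays between attempts are omitted to keep the sketch short.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            effective_tries = 1 if kwargs.get("dry_run") else tries
            for attempt in range(1, effective_tries + 1):
                try:
                    return func(*args, **kwargs)
                except ApiNotReady:
                    if attempt == effective_tries:
                        raise  # out of tries, propagate the error
        return wrapper
    return decorator


class FlakyApi:
    """Fake API that fails for the first `failures` calls, then recovers."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def save(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ApiNotReady("502: GitLab is not responding")
        return "ok"


@retry(tries=20)
def unpause(api, dry_run=False):
    # Like the behaviour described above, the call is made
    # regardless of the dry_run flag.
    return api.save()
```

With an API that needs three failed calls before it recovers, `unpause(FlakyApi(failures=3), dry_run=False)` succeeds on the fourth attempt, whereas the same call with `dry_run=True` raises `ApiNotReady` after a single attempt. If the real decorator behaves along these lines, either the real API call should be skipped in dry-run mode, or the reduced try count should not apply when the call is actually made.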