`16:56:19 <mutante> raise AlertmanagerError(f"Unable to {method.upper()} to any Alertmanager: {self._alertmanager_urls}", response)`
This task tracks the debugging of that error.
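To make the failure mode concrete, here is a minimal sketch of the kind of loop that produces this message. It assumes the client tries each configured Alertmanager URL in turn and raises only after every one of them has failed. `api_request`, `FakeResponse` and the injected `send` callable are hypothetical names for illustration; this is not spicerack's actual code. One consequence, at least in this sketch, is that the exception carries only the most recent HTTP response (or `None` if every attempt failed at the network level), so the reason each individual Alertmanager refused the request has to come from per-attempt logging.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


class FakeResponse:
    """Hypothetical minimal HTTP response: only the status code matters here."""

    def __init__(self, status_code: int) -> None:
        self.status_code = status_code

    @property
    def ok(self) -> bool:
        return 200 <= self.status_code < 400


class AlertmanagerError(Exception):
    """Stand-in for the error in the traceback; keeps the last response seen."""

    def __init__(self, message: str, response: Optional[FakeResponse] = None) -> None:
        super().__init__(message)
        self.response = response


def api_request(
    method: str,
    path: str,
    alertmanager_urls: List[str],
    send: Callable[[str, str], FakeResponse],
) -> FakeResponse:
    """Try each Alertmanager in order and return the first successful response."""
    response: Optional[FakeResponse] = None
    for base_url in alertmanager_urls:
        url = f"{base_url}{path}"
        try:
            response = send(method, url)
        except ConnectionError as exc:
            # Network-level failure (builtin ConnectionError in this sketch):
            # log it and try the next Alertmanager.
            logger.warning("%s %s failed: %s", method.upper(), url, exc)
            continue
        if response.ok:
            return response
        # HTTP error (4xx/5xx): log it and try the next Alertmanager.
        logger.warning("%s %s returned HTTP %d", method.upper(), url, response.status_code)
    # Every URL failed; only the most recent HTTP response (if any) travels with the error.
    raise AlertmanagerError(
        f"Unable to {method.upper()} to any Alertmanager: {alertmanager_urls}", response
    )


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    try:
        api_request("post", "/api/v2/silences", ["http://am-a.example", "http://am-b.example"],
                    lambda method, url: FakeResponse(503))
    except AlertmanagerError as exc:
        print(exc)                       # Unable to POST to any Alertmanager: ['http://am-a.example', 'http://am-b.example']
        print(exc.response.status_code)  # 503
```

When debugging the real error, the equivalent per-URL log lines (rather than the final exception) are what show whether each Alertmanager was unreachable or answered with an HTTP error.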
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| gerrit: alerting downtime update | operations/cookbooks | master | +29 -19 |
| Status | Assigned | Task |
|---|---|---|
| Open | None | T407557 OpenSSH 10.1+ warns that Wikimedia SSH does not use post-quantum key exchange algorithm |
| Open | None | T407844 Gerrit ssh daemon does not offer post-quantum kex leading to a warning with OpenSSH 10 |
| | | Restricted Task |
| Open | None | T392448 Upgrade to Gerrit 3.12 |
| Open | None | T379714 Upgrade to Gerrit 3.11 |
| Open | None | T392465 Switch Gerrit from Java 17 to Java 21 |
| Open | None | T384595 Upgrade Collab hosts to Bookworm |
| Resolved | ABran-WMF | T392464 Upgrade Gerrit hosts from Bullseye to Bookworm |
| Open | None | T387831 Standardize failover procedures for Collab services |
| Resolved | None | T393239 ProbeDown |
| Resolved | ABran-WMF | T387833 Gerrit switchover process |
| Declined | None | T257383 Update wikibugs's Gerrit ssh host keys |
| Stalled | None | T257382 Update libup for Gerrit's new ssh host keys |
| | | Restricted Task |
| Resolved | ABran-WMF | T417247 Reimage gerrit2002 |
| Resolved | ABran-WMF | T418264 Fix gerrit-restart cookbook |
Change #1239003 had a related patch set uploaded (by Arnaudb; author: Arnaudb):
[operations/cookbooks@master] gerrit: alerting downtime update
Change #1239003 merged by jenkins-bot:
[operations/cookbooks@master] gerrit: alerting downtime update
The dry run behaves as expected:
```
DRY-RUN: Executing cookbook sre.gerrit.restart-gerrit with args: ['--host', 'gerrit2002']
DRY-RUN: Found [('conf1009.eqiad.wmnet', 4001), ('conf1007.eqiad.wmnet', 4001), ('conf1008.eqiad.wmnet', 4001)]
DRY-RUN: New etcd client created for https://conf1009.eqiad.wmnet:4001
DRY-RUN: Retrieved list of machines: ['https://conf1007.eqiad.wmnet:4001', 'https://conf1008.eqiad.wmnet:4001', 'https://conf1009.eqiad.wmnet:4001']
DRY-RUN: Machines cache initialised to ['https://conf1007.eqiad.wmnet:4001', 'https://conf1008.eqiad.wmnet:4001']
DRY-RUN: Acquiring lock for key sre.gerrit.restart-gerrit: {'concurrency': 1, 'created': '2026-03-04 13:24:17.719159', 'owner': 'arnaudb@cumin1003 [4141862]', 'ttl': 900}
DRY-RUN: Reduce tries from 27 to 1 in DRY-RUN mode
DRY-RUN: Issuing read for key /spicerack/locks/cookbooks/sre.gerrit.restart-gerrit with args {'timeout': 60}
DRY-RUN: Skipping lock acquire/release in DRY-RUN mode
DRY-RUN: Acquired lock for key /spicerack/locks/cookbooks/sre.gerrit.restart-gerrit: {'concurrency': 1, 'created': '2026-03-04 13:24:17.723665', 'owner': 'arnaudb@cumin1003 [4141862]', 'ttl': 900}
DRY-RUN: START - Cookbook sre.gerrit.restart-gerrit Restarting Gerrit on gerrit2002
DRY-RUN: Setting downtime for gerrit2002
DRY-RUN: Resolved CNAME record for icinga.wikimedia.org: icinga.wikimedia.org. 249 IN CNAME alert1002.wikimedia.org.
DRY-RUN: Executing commands ["grep -P '\\s*command_file\\s*=.+' /etc/icinga/icinga.cfg"] on 1 hosts: alert1002.wikimedia.org
DRY-RUN: Executing commands [cumin.transports.Command('/usr/local/bin/icinga-status -j "gerrit2002"', ok_codes=[])] on 1 hosts: alert1002.wikimedia.org
DRY-RUN: Scheduling downtime on Icinga server alert1002.wikimedia.org for hosts: gerrit2002
DRY-RUN: Executing commands ['bash -c \'echo -n "[1772630659] SCHEDULE_HOST_DOWNTIME;gerrit2002;1772630659;1772645059;1;0;14400;arnaudb@cumin1003;Restarting Gerrit on gerrit2002" > /var/lib/icinga/rw/icinga.cmd \''] on 1 hosts: alert1002.wikimedia.org
DRY-RUN: Executing commands ['bash -c \'echo -n "[1772630659] SCHEDULE_HOST_SVC_DOWNTIME;gerrit2002;1772630659;1772645059;1;0;14400;arnaudb@cumin1003;Restarting Gerrit on gerrit2002" > /var/lib/icinga/rw/icinga.cmd \''] on 1 hosts: alert1002.wikimedia.org
DRY-RUN: Reduce tries from 12 to 1 in DRY-RUN mode
DRY-RUN: Executing commands [cumin.transports.Command('/usr/local/bin/icinga-status -j "gerrit2002"', ok_codes=[])] on 1 hosts: alert1002.wikimedia.org
DRY-RUN: Some hosts are not yet downtimed: ['gerrit2002']
DRY-RUN: Would have called POST http://alertmanager-eqiad.wikimedia.org/api/v2/silences
DRY-RUN: Would have called POST http://alertmanager-eqiad.wikimedia.org/api/v2/silences
==> About to restart Gerrit on gerrit2002. Full downtime active (Host + Alertmanager). Proceed?
Type "go" to proceed or "abort" to interrupt the execution
> go
DRY-RUN: User input is: "go"
DRY-RUN: Restarting gerrit service on gerrit2002
DRY-RUN: Executing commands ['systemctl restart gerrit'] on 1 hosts: gerrit2002.wikimedia.org
DRY-RUN: Skipping monitoring wait because of dry run.
DRY-RUN: Would have called DELETE http://alertmanager-eqiad.wikimedia.org/api/v2/silence/
DRY-RUN: Deleted silence ID
DRY-RUN: Executing commands ['bash -c \'echo -n "[1772630664] DEL_DOWNTIME_BY_HOST_NAME;gerrit2002" > /var/lib/icinga/rw/icinga.cmd \''] on 1 hosts: alert1002.wikimedia.org
DRY-RUN: Would have called DELETE http://alertmanager-eqiad.wikimedia.org/api/v2/silence/
DRY-RUN: Deleted silence ID
DRY-RUN: Gerrit restart completed successfully. Downtimes removed.
DRY-RUN: Releasing lock for key sre.gerrit.restart-gerrit with ID 046af7f1-cbfb-4249-8c32-8b3d1785c95a
DRY-RUN: Issuing read for key /spicerack/locks/cookbooks/sre.gerrit.restart-gerrit with args {'timeout': 60}
DRY-RUN: Lock for key /spicerack/locks/cookbooks/sre.gerrit.restart-gerrit and ID 046af7f1-cbfb-4249-8c32-8b3d1785c95a not found. Unable to release it. Was expired?
DRY-RUN: __COOKBOOK_STATS__:name=sre.gerrit.restart-gerrit,exit_code=0,duration=6.651
DRY-RUN: END (PASS) - Cookbook sre.gerrit.restart-gerrit (exit_code=0) Restarting Gerrit on gerrit2002
```
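The dry run sets two independent downtimes and later removes both: an Icinga host and service downtime, written as external commands to `/var/lib/icinga/rw/icinga.cmd` on alert1002, and Alertmanager silences created via `POST /api/v2/silences`. The sketch below rebuilds those payloads so they can be compared with what the cookbook actually sends. The Icinga field order follows the documented `SCHEDULE_HOST_DOWNTIME` external command and, for the logged values, reproduces the log line exactly. The Alertmanager body follows the v2 API silence schema, but the `instance` matcher is an assumption, and all function names are hypothetical rather than the cookbook's code. It also suggests why the DELETE URLs in the log end in a bare `/silence/`: in DRY-RUN mode the POST is never sent, so presumably no `silenceID` comes back to append to the path.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import List


def icinga_downtime_commands(host: str, start: int, duration_s: int,
                             author: str, comment: str) -> List[str]:
    """Icinga external commands for a fixed host downtime plus its services."""
    end = start + duration_s
    # Field order: host;start_time;end_time;fixed;trigger_id;duration;author;comment
    fields = f"{host};{start};{end};1;0;{duration_s};{author};{comment}"
    return [
        f"[{start}] SCHEDULE_HOST_DOWNTIME;{fields}",
        f"[{start}] SCHEDULE_HOST_SVC_DOWNTIME;{fields}",
    ]


def icinga_delete_command(host: str, now: int) -> str:
    """Icinga external command that removes the downtimes on a host."""
    return f"[{now}] DEL_DOWNTIME_BY_HOST_NAME;{host}"


def alertmanager_silence_body(host: str, start: int, duration_s: int,
                              author: str, comment: str) -> str:
    """JSON body for POST /api/v2/silences; the matcher label is hypothetical."""
    starts_at = datetime.fromtimestamp(start, tz=timezone.utc)
    ends_at = starts_at + timedelta(seconds=duration_s)
    body = {
        # Assumed matcher: silence any alert whose "instance" label is this host,
        # with or without a :port suffix. The cookbook may match differently.
        "matchers": [{"name": "instance", "value": f"^{host}(:[0-9]+)?$",
                      "isRegex": True, "isEqual": True}],
        "startsAt": starts_at.isoformat(),
        "endsAt": ends_at.isoformat(),
        "createdBy": author,
        "comment": comment,
    }
    return json.dumps(body)


def silence_delete_path(silence_id: str) -> str:
    """Path for DELETE; an empty ID (no silence created) yields a bare /silence/."""
    return f"/api/v2/silence/{silence_id}"


if __name__ == "__main__":
    # Values taken from the dry-run log above: 14400 s = 4 hours.
    host, start, duration = "gerrit2002", 1772630659, 14400
    author, comment = "arnaudb@cumin1003", "Restarting Gerrit on gerrit2002"
    for cmd in icinga_downtime_commands(host, start, duration, author, comment):
        print(cmd)
    print(alertmanager_silence_body(host, start, duration, author, comment))
    print(silence_delete_path(""))  # /api/v2/silence/
```

On a real run, Alertmanager answers the POST with a JSON object containing `silenceID`; that ID is what the later `DELETE /api/v2/silence/{silenceID}` calls need, so it is worth checking that it is captured before the restart step.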