In T387833 (Gerrit switchover process) we created a switchover process, as highlighted in T411583 (Gerrit backups are growing).
The parent task has been renamed to match its scope and this will help us track and review the failover process.
Description
| Status | Assigned | Task |
|---|---|---|
| Open | None | T407557 OpenSSH 10.1+ warns that Wikimedia SSH does not use post-quantum key exchange algorithm |
| Open | None | T407844 Gerrit ssh daemon does not offer post-quantum kex leading to a warning with OpenSSH 10 |
| Open | None | T392448 Upgrade to Gerrit 3.12 |
| Open | None | T379714 Upgrade to Gerrit 3.11 |
| Open | None | T392465 Switch Gerrit from Java 17 to Java 21 |
| Open | None | T384595 Upgrade Collab hosts to Bookworm |
| Open | None | T392464 Upgrade Gerrit hosts from Bullseye to Bookworm |
| Open | None | T387831 Standardize failover procedures for Collab services |
| Resolved | None | T393239 ProbeDown |
| Resolved | ABran-WMF | T387833 Gerrit switchover process |
| Open | ABran-WMF | T412779 Gerrit failover process |
Event Timeline
Now that Gerrit is (more or less) behind the CDN (T411895), changing backends in the future comes down to the following step.
common/profile/trafficserver/backend.yaml configures:
target: http://gerrit.wikimedia.org
replacement: https://gerrit.discovery.wmnet
and the DNS repo (`dns/templates/wmnet`, in the DISCOVERY section) configures:
gerrit 300 IN CNAME gerrit1003.wikimedia.org.
So changing that DNS discovery record is the actual switch.
Note that gerrit-replica.wikimedia.org also has its own record.
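To illustrate the switch described above, here is a minimal sketch of rewriting the CNAME target of the `gerrit` discovery record in zone-file text. The function name `switch_discovery_cname` and the target host `gerrit2003.example.org` are hypothetical and used only for illustration; the actual change would go through the normal DNS repo review and deployment process.

```python
import re


def switch_discovery_cname(zone_text: str, name: str, new_target: str) -> str:
    """Return zone_text with the CNAME target of `name` replaced by `new_target`.

    Hypothetical helper: it only edits text and does not touch DNS itself.
    """
    if not new_target.endswith("."):
        # Zone files expect a fully qualified, dot-terminated target.
        new_target += "."
    # Match exactly "<name> <ttl> IN CNAME <target>" on a single line, so that
    # a record such as "gerrit-replica" is not matched when name is "gerrit".
    pattern = re.compile(
        rf"^(?P<head>{re.escape(name)}\s+\d+\s+IN\s+CNAME\s+)\S+$",
        re.MULTILINE,
    )
    updated, count = pattern.subn(lambda m: m.group("head") + new_target, zone_text)
    if count != 1:
        raise ValueError(f"expected exactly one CNAME record for {name!r}, found {count}")
    return updated


# The record as quoted above; the new target is a made-up host name.
zone = "gerrit 300 IN CNAME gerrit1003.wikimedia.org.\n"
print(switch_discovery_cname(zone, "gerrit", "gerrit2003.example.org"))
```

The helper refuses to proceed unless it finds exactly one matching record, which guards against silently editing the wrong line or missing the record entirely.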
Comment Actions
neat, that will simplify things!
I've added an item with a draft document to our team meeting so we can address the missing bits and make a Gerrit failover manageable in an emergency.