
Gerrit switchover process
Open, In Progress, High, Public

Description

Part of the effort to standardize failover procedures for Collab services.

Details

Related Changes in Gerrit:
Repo                        Branch                  Lines +/-
==============================================================
operations/puppet           production              +134 -65
operations/cookbooks        master                  +23 -23
operations/puppet           production              +2 -2
operations/dns              master                  +5 -6
operations/puppet           production              +1 -1
operations/puppet           production              +4 -4
operations/puppet           production              +1 -1
operations/puppet           production              +3 -3
operations/puppet           production              +79 -40
operations/cookbooks        master                  +2 -0
operations/cookbooks        master                  +15 -14
operations/puppet           production              +2 -0
operations/cookbooks        master                  +289 -129
operations/dns              master                  +25 -13
operations/puppet           production              +79 -40
operations/cookbooks        master                  +8 -48
operations/cookbooks        master                  +144 -0
operations/cookbooks        master                  +12 -6
operations/cookbooks        master                  +7 -43
operations/puppet           production              +25 -0
operations/cookbooks        master                  +0 -8
operations/cookbooks        master                  +6 -4
operations/cookbooks        master                  +12 -7
operations/puppet           production              +2 -2
operations/cookbooks        master                  +5 -3
operations/cookbooks        master                  +1 -1
operations/puppet           production              +0 -2
operations/dns              master                  +8 -8
operations/puppet           production              +70 -33
operations/cookbooks        master                  +4 -4
operations/dns              master                  +8 -8
operations/puppet           production              +70 -33
operations/puppet           production              +70 -33
operations/software/gerrit  deploy/wmf/stable-3.10  +114 -0
operations/software/gerrit  deploy/wmf/stable-3.10  +4 -4
operations/software/gerrit  deploy/wmf/stable-3.10  +5 -3
operations/cookbooks        master                  +9 -10
operations/cookbooks        master                  +98 -97
operations/dns              master                  +8 -8
operations/puppet           production              +5 -3
operations/cookbooks        master                  +5 -3
operations/cookbooks        master                  +24 -9
operations/puppet           production              +2 -2
operations/puppet           production              +46 -30
operations/puppet           production              +19 -0
operations/puppet           production              +1 -0
operations/puppet           production              +1 -6
operations/puppet           production              +19 -0
operations/puppet           production              +5 -3
operations/dns              master                  +8 -3
operations/cookbooks        master                  +64 -218
operations/puppet           production              +18 -21
operations/puppet           production              +1 -1
operations/puppet           production              +11 -2
operations/software/gerrit  deploy/wmf/stable-3.10  +3 -0
operations/cookbooks        master                  +1 -1
operations/cookbooks        master                  +52 -150
operations/cookbooks        master                  +304 -0
operations/software/gerrit  wmf/stable-3.10         +4 -0
operations/software/gerrit  wmf/stable-3.10         +4 -0
operations/software/gerrit  deploy/wmf/stable-3.10  +0 -0
operations/puppet           production              +3 -2
operations/puppet           production              +1 -1
operations/puppet           production              +16 -8
operations/puppet           production              +1 -0
operations/puppet           production              +5 -3
operations/dns              master                  +8 -8
operations/puppet           production              +3 -3
operations/dns              master                  +8 -8

Related Objects

Status       Assigned
===========================
Open         None
Open         None
Open         None
Open         None
Open         None
Open         None
Open         None
Open         None
Resolved     None
In Progress  ABran-WMF
Open         None
Resolved     ABran-WMF
Resolved     ABran-WMF
In Progress  ABran-WMF
Resolved     Dzahn
Resolved     MatthewVernon
Resolved     LSobanski
Resolved     ABran-WMF
Open         ABran-WMF
Resolved     LSobanski
Resolved     hashar
Resolved     ABran-WMF
Resolved     hashar
Open         ABran-WMF
Open         ABran-WMF
Open         ABran-WMF
Open         None

Event Timeline

There are a very large number of changes, so older changes are hidden.

Change #1193860 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: fix typo in source path

https://gerrit.wikimedia.org/r/1193860

Change #1194220 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/software/gerrit@deploy/wmf/stable-3.10] Disable motd banner: maintenance window has closed

https://gerrit.wikimedia.org/r/1194220

Change #1194220 merged by jenkins-bot:

[operations/software/gerrit@deploy/wmf/stable-3.10] Disable motd banner: maintenance window has closed

https://gerrit.wikimedia.org/r/1194220

Mentioned in SAL (#wikimedia-operations) [2025-10-07T15:03:16Z] <hashar@deploy2002> Started deploy [gerrit/gerrit@21d2848]: Disable motd banner: maintenance window has closed - T387833

Mentioned in SAL (#wikimedia-operations) [2025-10-07T15:03:37Z] <hashar@deploy2002> Finished deploy [gerrit/gerrit@21d2848]: Disable motd banner: maintenance window has closed - T387833 (duration: 00m 30s)

Change #1194225 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/software/gerrit@deploy/wmf/stable-3.10] Disable component rather than motd plugin

https://gerrit.wikimedia.org/r/1194225

Change #1194225 merged by jenkins-bot:

[operations/software/gerrit@deploy/wmf/stable-3.10] Disable component rather than motd plugin

https://gerrit.wikimedia.org/r/1194225

Change #1193845 merged by Arnaudb:

[operations/puppet@production] Revert^2 "gerrit: Switchover gerrit1003 → gerrit2003"

https://gerrit.wikimedia.org/r/1193845

Change #1193846 merged by Arnaudb:

[operations/dns@master] Revert^2 "gerrit: switchover from gerrit1003 to gerrit2003"

https://gerrit.wikimedia.org/r/1193846

Change #1194932 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/dns@master] Revert^4 "gerrit: switchover from gerrit1003 to gerrit2003"

https://gerrit.wikimedia.org/r/1194932

Change #1194931 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] Revert^4 "gerrit: Switchover gerrit1003 → gerrit2003"

https://gerrit.wikimedia.org/r/1194931

Change #1194949 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: local backup on source server only

https://gerrit.wikimedia.org/r/1194949

things are looking better now:

arnaudb@gerrit2003:git $ fd | wc -l
236218
arnaudb@gerrit2003:git $ pwd
/srv/gerrit/git
arnaudb@gerrit2003:git $ ls -l /srv/backup/
total 0

vs the previous situation:

gerrit 2003$ find /srv/gerrit/git |wc -l
5037253

That is 5 million files.
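Counts like the two above are only comparable if both sides use the same counting rule. As a minimal sketch (the helper name `count_entries` is hypothetical and not taken from any cookbook), the function below counts entries roughly the way `find DIR | wc -l` does: the starting directory itself plus every file and directory beneath it. Keep in mind that `fd` skips hidden and ignored files by default, so an `fd | wc -l` total may come out lower than a `find` total for the same tree.

```python
import os


def count_entries(root: str) -> int:
    """Count the root directory plus every file and directory beneath it.

    This mirrors `find ROOT | wc -l` for ordinary names: symlinked
    directories are counted once and not followed.
    """
    total = 1  # find also prints the starting directory itself
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total
```

Running `count_entries("/srv/gerrit/git")` on the source and the target host after a sync would give two numbers that can be compared directly.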

[...]

 JobId  Level      Files    Bytes   Status   Finished        Name 
====================================================================
656136  Incr    3,745,079    54.98 G  OK       08-Oct-25 13:09 gerrit2003.wikimedia.org-Hourly-Mon-productionEqiad-gerrit-repo-data

Change #1195432 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: re-enable backups on gerrit2003

https://gerrit.wikimedia.org/r/1195432

Change #1195437 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: add dry run rsync

https://gerrit.wikimedia.org/r/1195437

Change #1194949 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: local backup on source server only

https://gerrit.wikimedia.org/r/1194949

Change #1194932 merged by Arnaudb:

[operations/dns@master] Revert^4 "gerrit: switchover from gerrit1003 to gerrit2003"

https://gerrit.wikimedia.org/r/1194932

Change #1194931 merged by Arnaudb:

[operations/puppet@production] Revert^4 "gerrit: Switchover gerrit1003 → gerrit2003"

https://gerrit.wikimedia.org/r/1194931

Change #1196051 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: typo fix in post_sync_validation

https://gerrit.wikimedia.org/r/1196051

ABran-WMF closed subtask Restricted Task as Resolved. Oct 14 2025, 12:43 PM

Change #1196227 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: ask the operator to merge puppet earlier

https://gerrit.wikimedia.org/r/1196227

Change #1195432 merged by Dzahn:

[operations/puppet@production] gerrit: re-enable backups on gerrit2003

https://gerrit.wikimedia.org/r/1195432

Change #1196051 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: typo fix in post_sync_validation

https://gerrit.wikimedia.org/r/1196051

Change #1196629 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: disable gerrit service to enable backups

https://gerrit.wikimedia.org/r/1196629

Change #1196227 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: ask the operator to merge puppet earlier

https://gerrit.wikimedia.org/r/1196227

Change #1196684 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: rsync and chown fixes

https://gerrit.wikimedia.org/r/1196684

Change #1196694 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: stop puppet across all instances

https://gerrit.wikimedia.org/r/1196694

Change #1196695 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: stop stopping gerrit.service

https://gerrit.wikimedia.org/r/1196695

Change #1196629 merged by Dzahn:

[operations/puppet@production] gerrit: disable gerrit service to enable backups

https://gerrit.wikimedia.org/r/1196629

Here are the notes and commands from a past Gerrit failover:

https://phabricator.wikimedia.org/P47782

Here is how we did the DNS change without having to merge while Gerrit is down:

  1. Merge the DNS change that removes gerrit-new and switches the IP of gerrit.wikimedia.org, using the web UI of gerrit(-old).
  2. Run authdns-update on ns0.wikimedia.org and review the diff, but do NOT commit yet.
  3. Disable puppet, stop gerrit, do the rsync, run chmod -R ...
  4. Say "yes" to authdns-update so that the DNS change that removes gerrit-new and switches the IP of gerrit.wikimedia.org actually takes effect.
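The point of the list above is ordering: the DNS change is merged while the old Gerrit is still up, and only the final confirmation happens after the data sync. As a rough sketch of how such an ordered, operator-confirmed sequence could be encoded (the names `Step`, `DNS_SWITCH_STEPS` and `run_checklist` are hypothetical, and nothing here runs a real command):

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Step:
    description: str
    needs_confirmation: bool = True


# Hypothetical encoding of the four steps listed above. These are
# descriptions only; no command is executed by this sketch.
DNS_SWITCH_STEPS = [
    Step("Merge the DNS change in the web UI of the old Gerrit"),
    Step("Run authdns-update on ns0.wikimedia.org, review the diff, do NOT commit"),
    Step("Disable puppet, stop gerrit, do the rsync, run chmod -R ..."),
    Step('Answer "yes" to the pending authdns-update prompt'),
]


def run_checklist(steps: Sequence[Step], confirm: Callable[[str], bool]) -> List[str]:
    """Walk the steps in order and stop at the first one the operator declines."""
    done: List[str] = []
    for number, step in enumerate(steps, start=1):
        prompt = f"Step {number}: {step.description}. Done?"
        if step.needs_confirmation and not confirm(prompt):
            break
        done.append(step.description)
    return done
```

An interactive run could pass `lambda p: input(p + " [y/N] ").strip().lower() == "y"` as `confirm`; the returned list shows how far the operator got before stopping.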

Change #1196792 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: unmask service & disable backup temporarily

https://gerrit.wikimedia.org/r/1196792

"Here is how we did the DNS change without having to merge while Gerrit is down:"

Thanks for the dig! I tweaked the process and the cookbook to be closer to this; the puppet merge timing was inconsistent with that approach.

Change #1196684 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: rsync and chown fixes

https://gerrit.wikimedia.org/r/1196684

Change #1196694 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: stop puppet across all instances

https://gerrit.wikimedia.org/r/1196694

Change #1196695 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: stop stopping gerrit.service

https://gerrit.wikimedia.org/r/1196695

Change #1193599 abandoned by Arnaudb:

[operations/cookbooks@master] gerrit: remove localbackup logic from failover

Reason:

obsolete, will workaround the merge conflict with another change

https://gerrit.wikimedia.org/r/1193599

Change #1210386 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: remove localbackup logic from failover

https://gerrit.wikimedia.org/r/1210386

Change #1195437 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: add dry run rsync

https://gerrit.wikimedia.org/r/1195437

Change #1193590 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: add a local backup cookbook

https://gerrit.wikimedia.org/r/1193590

Change #1210560 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/dns@master] gerrit: add a layer of CNAME to ease switch overs

https://gerrit.wikimedia.org/r/1210560

Change #1210386 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: remove localbackup logic from failover

https://gerrit.wikimedia.org/r/1210386

Change #1211548 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: Switchover gerrit1003 → gerrit2003

https://gerrit.wikimedia.org/r/1211548

Change #1211548 abandoned by Arnaudb:

[operations/puppet@production] gerrit: Switchover gerrit1003 → gerrit2003

Reason:

forgot a step

https://gerrit.wikimedia.org/r/1211548

Change #1211549 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: Switchover gerrit1003 → gerrit2003

https://gerrit.wikimedia.org/r/1211549

Change #1211551 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: re-enable backups on gerrit2003

https://gerrit.wikimedia.org/r/1211551

Change #1214466 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: rsync logic extraction from failover

https://gerrit.wikimedia.org/r/1214466

Change #1210560 abandoned by Hashar:

[operations/dns@master] gerrit: add a layer of CNAME to ease switch overs

https://gerrit.wikimedia.org/r/1210560

Change #1214466 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: rsync logic extraction from failover

https://gerrit.wikimedia.org/r/1214466

Change #1216583 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: disable backups temporarily

https://gerrit.wikimedia.org/r/1216583

Change #1216583 merged by Arnaudb:

[operations/puppet@production] gerrit: disable backups temporarily

https://gerrit.wikimedia.org/r/1216583

Change #1216586 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: testing sync

https://gerrit.wikimedia.org/r/1216586

Change #1216586 abandoned by Arnaudb:

[operations/cookbooks@master] gerrit: testing sync

Reason:

testing over

https://gerrit.wikimedia.org/r/1216586

Change #1216592 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: add a confirmation prompt on rsync

https://gerrit.wikimedia.org/r/1216592

Change #1216592 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: add a confirmation prompt on rsync

https://gerrit.wikimedia.org/r/1216592

Change #1217133 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: Switchover gerrit1003 → gerrit2003

https://gerrit.wikimedia.org/r/1217133

Change #1211549 abandoned by Arnaudb:

[operations/puppet@production] gerrit: Switchover gerrit1003 → gerrit2003

Reason:

1217133

https://gerrit.wikimedia.org/r/1211549

Change #1196792 abandoned by Arnaudb:

[operations/puppet@production] gerrit: unmask service

Reason:

1217133

https://gerrit.wikimedia.org/r/1196792

Change #1211551 abandoned by Arnaudb:

[operations/puppet@production] gerrit: re-enable backups on gerrit2003

Reason:

1217133

https://gerrit.wikimedia.org/r/1211551

Change #1217134 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: re-enable backups and monitoring on gerrit2003

https://gerrit.wikimedia.org/r/1217134

ABran-WMF renamed this task from Gerrit failover process to Gerrit switchover process. Dec 16 2025, 6:17 AM
ABran-WMF updated the task description.

Change #1237450 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: temporarily disable replication on gerrit2003

https://gerrit.wikimedia.org/r/1237450

Change #1237450 merged by Arnaudb:

[operations/puppet@production] gerrit: temporarily disable replication on gerrit2003

https://gerrit.wikimedia.org/r/1237450

Regarding the DNS change, @Dzahn described the new situation here: T412779#11581372

The new central DNS CNAME which is used by the tcp-proxy and trafficserver is gerrit.discovery.wmnet. This entry currently points to:

gerrit                300 IN CNAME gerrit1003.wikimedia.org.

So for the switchover, the CNAME should be updated here.

The replica/spare hosts will not be behind the CDN most likely next week. So the DNS switch for those has to happen here.
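For illustration, the CNAME update described above amounts to swapping the target of a single zone record. The following is a minimal sketch, assuming the record keeps the `name TTL IN CNAME target` shape quoted above and that the new target is `gerrit2003.wikimedia.org.` (inferred from the switchover patch titles, not confirmed here); `retarget_cname` is a hypothetical helper, not part of the DNS repository tooling.

```python
import re


def retarget_cname(record: str, new_target: str) -> str:
    """Return the zone record with its CNAME target replaced, spacing preserved."""
    match = re.fullmatch(r"(\S+\s+\d+\s+IN\s+CNAME\s+)(\S+)", record.strip())
    if match is None:
        raise ValueError(f"not a 'name TTL IN CNAME target' record: {record!r}")
    if not new_target.endswith("."):
        # Without the trailing dot the name would be relative to the zone origin.
        raise ValueError("target must be fully qualified (trailing dot)")
    return match.group(1) + new_target


current = "gerrit                300 IN CNAME gerrit1003.wikimedia.org."
# Assumed new target for the gerrit1003 -> gerrit2003 switchover.
print(retarget_cname(current, "gerrit2003.wikimedia.org."))
```

The trailing-dot check is there because a target without it would be interpreted relative to the zone origin, which is an easy mistake to make when editing records by hand.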

Change #1238686 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: unmask gerrit.service on gerrit2003

https://gerrit.wikimedia.org/r/1238686

Change #1238686 merged by Arnaudb:

[operations/puppet@production] gerrit: unmask gerrit.service on gerrit2003

https://gerrit.wikimedia.org/r/1238686

Change #1238708 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/dns@master] gerrit: switchover from gerrit1003 to gerrit2003

https://gerrit.wikimedia.org/r/1238708

ABran-WMF updated the task description.

Change #1238376 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/cookbooks@master] gerrit: update switchover related cookbooks

https://gerrit.wikimedia.org/r/1238376

Change #1238376 merged by jenkins-bot:

[operations/cookbooks@master] gerrit: update switchover related cookbooks

https://gerrit.wikimedia.org/r/1238376