
Replication to GitHub seems to have stalled
Closed, Resolved, Public

Description

https://github.com/wikimedia/mediawiki/commits/master/ is a few commits behind https://gerrit.wikimedia.org/g/mediawiki/core

The last known update was about 11 hours before this task was filed.
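One way to confirm the lag from a terminal is to compare the commit each remote advertises for the branch. The following is a minimal sketch, not a vetted tool. It assumes `git` is installed, that both repositories are readable anonymously over HTTPS, and that `https://gerrit.wikimedia.org/r/mediawiki/core` is the Gerrit clone URL (the description only links the browsing URL). The function names `remote_head` and `compare_heads` are hypothetical.

```shell
#!/bin/sh
# Print the commit SHA a remote advertises for a ref (empty if the ref is missing).
remote_head() {
    # git ls-remote prints "<sha><TAB><ref>"; keep only the SHA of the first match.
    git ls-remote "$1" "$2" | awk 'NR == 1 { print $1 }'
}

# Compare two SHAs; succeed only when both are non-empty and identical.
compare_heads() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "in sync at $1"
    else
        echo "out of sync: source=${1:-unknown} mirror=${2:-unknown}"
        return 1
    fi
}

# Example (needs network access; the Gerrit clone URL is an assumption):
#   compare_heads \
#     "$(remote_head https://gerrit.wikimedia.org/r/mediawiki/core refs/heads/master)" \
#     "$(remote_head https://github.com/wikimedia/mediawiki refs/heads/master)"
```

This only says whether the two heads differ, not by how many commits or for how long.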

Event Timeline

Reedy triaged this task as High priority. Feb 9 2026, 6:15 PM

This also breaks our gerrit -> GitHub -> Packagist integrations, of course.

  • Last github mention in gerrit1003:/var/log/gerrit/replication_log was 2026-02-09 13:51:05 (4+ hours ago).
  • Last git@github.com:wikimedia/mediawiki-core mention was 2026-02-09 09:21:23.
  • Last gerrit2002 mention in the same log was effectively at the moment I opened the file.
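The "last mention" checks above can be reproduced with a small helper. This is a sketch only: the function name `last_mention` is hypothetical, and it makes no assumption about the log line format beyond the remote string appearing somewhere on the line.

```shell
#!/bin/sh
# Print the last line of a log file that mentions a given string.
# -F treats the pattern as a fixed string, so "." and "$" in URLs are not regex syntax.
last_mention() {
    grep -F -- "$1" "$2" | tail -n 1
}

# Example, run on the Gerrit host (log path taken from the comment above):
#   last_mention 'git@github.com:wikimedia/mediawiki-core' /var/log/gerrit/replication_log
#   last_mention 'gerrit2002' /var/log/gerrit/replication_log
```

Comparing the timestamps of the two results shows whether one remote has gone quiet while another is still receiving pushes.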

The server doesn't seem to know that it is behind:

bd808@mbp03:~/projects/wmf$ ssh -i ~/.ssh/id_ed25519 -o IdentityAgent=$SSH_AUTH_SOCK -p 29418 gerrit1003.wikimedia.org replication list --detail
Remote: github
Url: git@github.com:wikimedia/${name}
AuthGroup: mediawiki-replication
Project: ^(?:(?!apps\/).)*$
In Flight: 0
Pending: 0

Remote: replica-a-codfw
Url: gerrit2@gerrit2002.wikimedia.org:/srv/gerrit/git/${name}.git
In Flight: 2
  [9a6e592b] push gerrit2@gerrit2002.wikimedia.org:/srv/gerrit/git/mediawiki/extensions/CheckUser.git [refs/changes/40/1236340/meta]
  [5a322173] push gerrit2@gerrit2002.wikimedia.org:/srv/gerrit/git/All-Users.git [refs/draft-comments/34/1219634/10074]
Pending: 0
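To make the per-remote counters easier to scan, the output above can be reduced to one line per remote. This is a sketch that assumes the output keeps the `Remote:`, `In Flight:` and `Pending:` line shapes shown here. The function name `summarize_replication` is hypothetical.

```shell
#!/bin/sh
# Read `replication list --detail` output on stdin and print one line per remote:
#   <remote> in_flight=<n> pending=<n>
summarize_replication() {
    awk '
        /^Remote: /    { remote = $2; order[++n] = remote }  # remember remotes in input order
        /^In Flight: / { in_flight[remote] = $3 }            # "In Flight: 2" -> third field
        /^Pending: /   { pending[remote] = $2 }              # "Pending: 0"   -> second field
        END {
            for (i = 1; i <= n; i++) {
                r = order[i]
                printf "%s in_flight=%s pending=%s\n", r, in_flight[r], pending[r]
            }
        }
    '
}

# Example (same ssh invocation as above, options omitted for brevity):
#   ssh -p 29418 gerrit1003.wikimedia.org replication list --detail | summarize_replication
```

Note that in this incident the github remote reports zero in flight and zero pending, so these counters alone would not have revealed the stall.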

Let's see if I can kick it anyway (inspired by T100409#1315859):

bd808@mbp03:~/projects/wmf$ ssh -i /Users/bd808/.ssh/id_ed25519 -o IdentityAgent=$SSH_AUTH_SOCK -p 29418 gerrit1003.wikimedia.org replication start mediawiki/core --wait
Replicate mediawiki/core refs ..all.. to gerrit2002.wikimedia.org, Succeeded! (OK)
Replication of mediawiki/core ref ..all.. completed to 1 nodes,
----------------------------------------------
Replication completed successfully!

Hmmm.... no github.com line there.
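The same "is github.com in there?" check can be scripted against the command output. This is a sketch under one loud assumption: a successful push to GitHub would print a `Replicate ... to <host>, ...` line of the same shape as the gerrit2002 line above, which this task does not confirm. Both function names are hypothetical.

```shell
#!/bin/sh
# Read `replication start ... --wait` output on stdin and print each target host
# from lines shaped like: "Replicate <project> refs <refs> to <host>, <status>".
replication_targets() {
    sed -n 's/^Replicate .* to \([^,]*\), .*$/\1/p'
}

# Succeed only if the given host is among the targets found on stdin.
# -x matches the whole line and -F disables regex, so dots in hostnames are literal.
target_was_pushed() {
    replication_targets | grep -qxF -- "$1"
}

# Example (ASSUMPTION: a GitHub push would be reported with host "github.com"):
#   ssh -p 29418 gerrit1003.wikimedia.org replication start mediawiki/core --wait \
#     | target_was_pushed github.com || echo "no push to github.com reported"
```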

GitHub appears to have quite a few ongoing issues.

[Screenshot: 2026-02-09-12:16:48.png]

We have a dashboard at https://grafana.wikimedia.org/d/RFLS1GsWk/replication-upstream though it is hard to read and does not make the issue stand out.

What I didn't catch earlier is that the replication configuration was changed by @ABran-WMF in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1237450 . He removed the replication to the spare gerrit2003.

The Gerrit replication has an issue somewhere that causes it to lose track of replications when the config is changed, even for a replica whose config has not been altered :-(

Mentioned in SAL (#wikimedia-operations) [2026-02-09T19:34:15Z] <hashar> restarting Gerrit to fix broken replication to GitHub # T416912

hashar claimed this task.

@bd808 mentioned it to me over IRC. I have encountered the bug previously, but honestly I never dug into the root cause or checked whether upstream has a bug filed for it. Maybe it has been fixed in a more recent version of Gerrit.

Meanwhile this one has been fixed by restarting Gerrit.

Change #1238315 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] gerrit: add mtail monitoring on replication

https://gerrit.wikimedia.org/r/1238315