We have kept copies of the Gerrit git repositories each time we migrated from one host to another. Each time, I explicitly requested that those copies be kept around.
The reasons are:
- some old changes are missing patchsets/commits
- an offline full reindexing emits errors about missing commits
@hashar's theory is that some of those missing commits might still be available in the old copies of the git repositories, which is why I am keeping them. Otherwise they only add pressure when transferring data from one host to another.
Using a different host and a copy of the live git repositories, we could run an offline reindex to collect the errors, then act on them.
For the old changes that are missing commits, we would need a script that crawls through refs/changes of each repository, inspects the change metadata (JSON payload), looks up the commits it points to, then digs into the copies of the git repos to find whether we have those commits. The reindex most probably reports those missing commits as warnings, so a script might not be needed.
We could do it online and with a trace-id to make it easier to find the logs:
ssh -p 29418 gerrit.wikimedia.org gerrit index start changes --trace --trace-id T388507
Maybe --force is needed.
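If the script turns out to be needed, the crawl described above could be sketched roughly as follows. This is a minimal sketch, not a tested tool, and it rests on assumptions: that the change metadata lives in NoteDb `refs/changes/*/meta` refs, and that each patchset's commit is recorded in a `Commit: <sha1>` footer of the meta commit messages. The function names and the per-project paths in the usage note are hypothetical.

```python
import re
import subprocess

# Assumption: NoteDb meta commits record each patchset as a "Commit: <sha1>" footer.
COMMIT_FOOTER = re.compile(r"^Commit: ([0-9a-f]{40})$", re.MULTILINE)


def git(repo, *args):
    """Run a git command inside `repo` and return the CompletedProcess."""
    return subprocess.run(
        ["git", "-C", str(repo), *args], capture_output=True, text=True
    )


def meta_refs(repo):
    """List the refs/changes/*/meta refs of a repository."""
    out = git(repo, "for-each-ref", "--format=%(refname)", "refs/changes/").stdout
    return [ref for ref in out.splitlines() if ref.endswith("/meta")]


def patchset_commits(meta_log):
    """Extract the unique patchset commit ids from the text of a meta ref log."""
    return sorted(set(COMMIT_FOOTER.findall(meta_log)))


def has_commit(repo, sha):
    """Return True if `sha` exists as a commit object in `repo`."""
    return git(repo, "cat-file", "-e", sha + "^{commit}").returncode == 0


def locate_missing(live, backups):
    """Yield (meta ref, sha, backups holding it) for commits absent from `live`."""
    for ref in meta_refs(live):
        log = git(live, "log", "--format=%B", ref).stdout
        for sha in patchset_commits(log):
            if has_commit(live, sha):
                continue
            yield ref, sha, [str(b) for b in backups if has_commit(b, sha)]
```

It could then be run per project, for example `locate_missing("/srv/gerrit/git/<project>.git", ["/srv/gerrit/git.2019-10-24/<project>.git"])`, assuming the old copies mirror the layout of the live directory. An empty list in the third position would mean none of the copies has the commit either.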
List of files under /srv/gerrit:
/srv/gerrit/All-Users
/srv/gerrit/All-Users-2020-03-20.git
/srv/gerrit/analytics-wmde-wd-wd_identifiedlandscape.git.2019-10-24
/srv/gerrit/cobalt
/srv/gerrit/codex-php.git-20241010T1254-T375939
/srv/gerrit/codex-php.git-20241011T1222-T375939
/srv/gerrit/data
/srv/gerrit/git
/srv/gerrit/git.2019-10-22
/srv/gerrit/git.2019-10-24
/srv/gerrit/git.2020-06-27.qchris.just-before-3.2-upgrade
/srv/gerrit/plugins
/srv/gerrit/replication
/srv/gerrit/T236443
/srv/gerrit/wikimedia-fundraising-crm.2019-10-24.git