
Investigate missing commit from old changes
Open, Low, Public

Description

We have kept copies of the Gerrit git repositories each time we migrated from one host to another. Each time, I explicitly requested that those copies be kept around.

The reasons are:

  • some old changes are missing patchsets/commits
  • an offline full reindexing emits errors about missing commits

@hashar's theory is that some of those missing commits might still be available in the old copies of the git repositories, which is why I am keeping them. Otherwise they only add pressure when transferring data from one host to another.

Using a different host and a copy of the live git repositories, we could run an offline reindexing to collect the errors then act on them.
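Collecting the errors could start with pulling commit IDs out of the reindex log. Below is a minimal Python sketch, assuming that missing objects are reported on lines containing the word "Missing" followed by a full 40-character SHA-1 (JGit-style wording). The actual log format has not been checked in this task, and the function name and the `needle` parameter are made up for illustration.

```python
import re
import sys
from collections import defaultdict

# Assumption: a missing object is logged with its full 40-hex SHA-1.
SHA1 = re.compile(r"\b[0-9a-f]{40}\b")


def collect_missing(lines, needle="Missing"):
    """Map each SHA-1 seen on a line containing `needle` to the lines mentioning it."""
    found = defaultdict(list)
    for line in lines:
        if needle not in line:
            continue
        for sha in SHA1.findall(line):
            found[sha].append(line.rstrip("\n"))
    return dict(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 collect_missing.py reindex.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for sha, hits in sorted(collect_missing(log).items()):
            print(sha, len(hits))
```

The output is one commit ID per line with the number of log lines mentioning it, which can then be checked against the old repository copies.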

For the old changes that are missing commits, we would need a script that crawls through refs/changes of each repository, inspects the change metadata (JSON payload), looks up the commits it points to, then digs into the copies of the git repos to check whether we have those commits. The reindex most probably emits that as a warning, so a script might not be needed.

We could also run the reindex online, with a trace ID to make the logs easier to find:

ssh -p 29418 gerrit.wikimedia.org gerrit index start changes --trace --trace-id T388507

Maybe --force is needed.
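If a script does turn out to be needed, the crawl described above could look roughly like the Python sketch below. It makes several assumptions that are not confirmed in this task: that each change's metadata lives in a refs/changes/XX/NNNN/meta ref, that the meta commit messages record patch-set commits in `Commit: <sha>` footers, and that a backup copy holds the same repository at the same relative path. All function names are invented for the example.

```python
import re
import subprocess
import sys
from pathlib import Path

META_REF = re.compile(r"^refs/changes/\d{2}/(\d+)/meta$")
# Assumption: patch-set commits appear as "Commit: <sha>" footers in meta commits.
COMMIT_FOOTER = re.compile(r"^Commit: ([0-9a-f]{40})$", re.MULTILINE)


def git(repo, *args):
    """Run git against a bare repository and return the completed process."""
    return subprocess.run(["git", "--git-dir", str(repo), *args],
                          capture_output=True, text=True)


def change_number(ref):
    """Return the change number for a meta ref, or None for any other ref."""
    match = META_REF.match(ref)
    return int(match.group(1)) if match else None


def commits_in_meta(log_text):
    """Return the sorted, de-duplicated commit IDs named in Commit: footers."""
    return sorted(set(COMMIT_FOOTER.findall(log_text)))


def has_commit(repo, sha):
    """True if `sha` exists in `repo` and is a commit."""
    return git(repo, "cat-file", "-e", sha + "^{commit}").returncode == 0


def missing_commits(live_repo):
    """Yield (change, sha) pairs named by metadata but absent from live_repo."""
    refs = git(live_repo, "for-each-ref", "--format=%(refname)", "refs/changes").stdout
    for ref in refs.splitlines():
        change = change_number(ref)
        if change is None:
            continue
        log_text = git(live_repo, "log", "--format=%B", ref).stdout
        for sha in commits_in_meta(log_text):
            if not has_commit(live_repo, sha):
                yield change, sha


def find_in_backups(rel_path, sha, backup_roots):
    """Return the backup roots whose copy of the repository still has `sha`."""
    return [root for root in backup_roots if has_commit(Path(root) / rel_path, sha)]


if __name__ == "__main__" and len(sys.argv) > 3:
    # Usage: find_missing.py LIVE_ROOT REPO_REL_PATH BACKUP_ROOT...
    live_root, rel_path, backups = sys.argv[1], sys.argv[2], sys.argv[3:]
    for change, sha in missing_commits(Path(live_root) / rel_path):
        print(change, sha, find_in_backups(rel_path, sha, backups) or "not found")
```

It could be invoked for one repository at a time, for example with /srv/gerrit/git as the live root and /srv/gerrit/git.2019-10-22 and /srv/gerrit/git.2019-10-24 as backup roots; the relative repository path is whatever the project is called on disk.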


List of files under /srv/gerrit:

/srv/gerrit/All-Users
/srv/gerrit/All-Users-2020-03-20.git
/srv/gerrit/analytics-wmde-wd-wd_identifiedlandscape.git.2019-10-24
/srv/gerrit/cobalt
/srv/gerrit/codex-php.git-20241010T1254-T375939
/srv/gerrit/codex-php.git-20241011T1222-T375939
/srv/gerrit/data
/srv/gerrit/git
/srv/gerrit/git.2019-10-22
/srv/gerrit/git.2019-10-24
/srv/gerrit/git.2020-06-27.qchris.just-before-3.2-upgrade
/srv/gerrit/plugins
/srv/gerrit/replication
/srv/gerrit/T236443
/srv/gerrit/wikimedia-fundraising-crm.2019-10-24.git

Event Timeline

I started digging into this issue. I tried to run the old Gerrit directories with our current .war file, but that was not possible, so I had to work out which version was probably running at the time.
Fortunately, all Gerrit versions are available at https://gerrit-releases.storage.googleapis.com/index.html. I tried to run 2.15.18, which was published in November 2019. It requires an old JRE; fortunately, nvidia-openjdk-8-jre is still available on Debian 12.

At this stage I'll have to dig a bit deeper. Wikitech states:

Since the Gerrit v3.2 upgrade in summer 2020, Gerrit no longer uses a conventional, relational database. So if you read somewhere about Gerrit's MySQL (or similar) database or "reviewdb": it is stale information. Instead of a relational database, Gerrit 3 stores the needed data directly in the git repositories (NoteDB). To speed up lookup, it creates indices also known as secondary index. These indices are Lucene backed indices and H2 (flat file database engine from the Java world).

Yet my last attempt suggests that this old version still requires a database (MySQL or similar) to be configured:

[2025-03-17 13:22:52,210] [main] ERROR com.google.gerrit.pgm.Daemon : Unable to start daemon
com.google.inject.ProvisionException: Unable to provision, see the following errors:

1) database.type must be defined

1 error
	at com.google.gerrit.pgm.util.SiteProgram.createDbInjector(SiteProgram.java:168)
	at com.google.gerrit.pgm.Daemon.start(Daemon.java:330)
	at com.google.gerrit.pgm.Daemon.run(Daemon.java:261)
	at com.google.gerrit.pgm.util.AbstractProgram.main(AbstractProgram.java:61)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.google.gerrit.launcher.GerritLauncher.invokeProgram(GerritLauncher.java:221)
	at com.google.gerrit.launcher.GerritLauncher.mainImpl(GerritLauncher.java:117)
	at com.google.gerrit.launcher.GerritLauncher.main(GerritLauncher.java:61)
	at Main.main(Main.java:24)

I'll take time to dig further to see if there is a command that allows indexing the code without actually running Gerrit or having a database.