Page MenuHomePhabricator

Lots of "Skipping change xxx because the corresponding repository was not found" in the logs
Closed, ResolvedPublic

Description

Upstream task https://bugs.chromium.org/p/gerrit/issues/detail?id=11650

I'm seeing lots of "Skipping change xxx because the corresponding repository was not found" at https://gerrit.wikimedia.org/r/monitoring

For example:

com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable
      at com.google.gerrit.server.permissions.FailedPermissionBackend$FailedChange.test(FailedPermissionBackend.java:170)
      at com.google.gerrit.server.permissions.PermissionBackend$ForChange.test(PermissionBackend.java:383)
      at com.google.gerrit.server.query.change.ChangeIsVisibleToPredicate.match(ChangeIsVisibleToPredicate.java:70)
      at com.google.gerrit.server.query.change.ChangeIsVisibleToPredicate.match(ChangeIsVisibleToPredicate.java:32)
      at com.google.gerrit.index.query.AndSource.match(AndSource.java:147)
      at com.google.gerrit.index.query.AndSource.readImpl(AndSource.java:101)
      at com.google.gerrit.index.query.AndSource.read(AndSource.java:83)
      at com.google.gerrit.index.query.QueryProcessor.query(QueryProcessor.java:238)
      at com.google.gerrit.index.query.QueryProcessor.query(QueryProcessor.java:174)
      at com.google.gerrit.index.query.QueryProcessor.query(QueryProcessor.java:157)
      at com.google.gerrit.server.query.change.OutputStreamQuery.query(OutputStreamQuery.java:201)
      at com.google.gerrit.sshd.commands.Query.run(Query.java:103)
      at com.google.gerrit.sshd.SshCommand$1.run(SshCommand.java:36)
      at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:459)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:558)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
Caused by: org.eclipse.jgit.errors.RepositoryNotFoundException: repository not found: /srv/gerrit/git/maps/tilerator
      at com.google.gerrit.server.git.LocalDiskRepositoryManager.openRepository(LocalDiskRepositoryManager.java:157)
      at com.google.gerrit.server.git.LocalDiskRepositoryManager.openRepository(LocalDiskRepositoryManager.java:139)
      at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:310)
      at com.google.gerrit.server.project.ProjectCacheImpl$Loader.load(ProjectCacheImpl.java:294)
      at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3524)
      at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2273)
      at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2156)
      at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2046)
      at com.google.common.cache.LocalCache.get(LocalCache.java:3943)
      at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3967)
      at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4952)
      at com.google.gerrit.server.project.ProjectCacheImpl.strictCheckedGet(ProjectCacheImpl.java:161)
      at com.google.gerrit.server.project.ProjectCacheImpl.checkedGet(ProjectCacheImpl.java:157)
      at com.google.gerrit.server.project.DefaultPermissionBackend$WithUserImpl.project(DefaultPermissionBackend.java:73)
      at com.google.gerrit.server.permissions.PermissionBackend$WithUser.ref(PermissionBackend.java:145)
      at com.google.gerrit.server.permissions.PermissionBackend$WithUser.indexedChange(PermissionBackend.java:168)
      at com.google.gerrit.server.query.change.ChangeIsVisibleToPredicate.match(ChangeIsVisibleToPredicate.java:68)
      ... 19 more

and many more.

This is likley because of the index (which means the delete-project plugin didn't reindex after?)

Event Timeline

It seems all of them have the ssh gerrit query in their stacktrace. On the server we have the actual query logged. Example:

[2019-09-26 00:00:37,722]
[SSH gerrit query limit:100 (status:open OR status:closed)
AND NOT (502978 OR 478688 OR 266726 OR 266725 OR 256050 OR 235852 OR 143269 OR 102114 OR 101842 OR 99101 
OR 58858 OR 32902 OR 13930 OR 225009 OR 225033 OR 225080 OR 225087 OR 225099) --all-approvals --comments --format JSON --start 145500 (XXXXXXX)]
WARN  com.google.gerrit.server.query.change.Chang
eIsVisibleToPredicate : Skipping change 390021 because the corresponding repository was not found
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable

Which somehow trigger all the madness. That comes from a bot owl which is indexing Gerrit for https://wikimedia.biterg.io/ , it does a request per minute in batches of 100 changes (using gerrit query --start=100 ... --start=200 ... --start=300 etc).

Even though the repository has been deleted, the old changes might still be in the database?

hashar triaged this task as Medium priority.Sep 26 2019, 9:59 PM

Eg: Skipping change 342073 because the corresponding repository was not found

It is in the database:

gerrit> SELECT * FROM changes WHERE change_id='342073';
change_key | created_on | last_updated_on | owner_account_id | dest_project_name | dest_branch_name | status | current_patch_set_id | subject | topic | row_version | change_id | original_subject | submission_id | note_db_state | assignee | is_private | work_in_progress | review_started | revert_of
------------------------------------------+-----------------------+-----------------------+------------------+-------------------+-------------------+--------+----------------------+---------+-------+-------------+-----------+------------------+-------------------------------+---------------+----------+------------+------------------+----------------+----------
Ifde25c372aae9cf64fdaf49c3f1be5c1513c662b | 2017-03-09 19:20:01.0 | 2017-03-28 17:52:59.0 | 43 | maps/tilerator | refs/heads/master | M | 1 | Test | NULL | 17 | 342073 | Test | 342073-1489097314042-82d64f41 | N | NULL | N | N | Y | NULL
(1 row; 2 ms)

So even though the repository has been deleted from disk, the changes are around. I guess when gerrit query crawls through the changes and to check whether it is visible to the user has to open the repository to retrieve its configuration file. The repo does not exist, and that bubbles up until that message.

I don't know what should be the behavior of Gerrit when a repository is deleted. I would guess the change should probably be deleted from the database?

@hashar changes are no longer in the db :) (it's all NoteDB).

So effectively that table is from when we migrated to NoteDB in 2018/June

This repo has been deleted in T228483. That ticket explicitly called for "Delete (rather than archive)". Best to comment there too that there are these side-effects of deleting repos and archiving might be better.

This repo has been deleted in T228483. That ticket explicitly called for "Delete (rather than archive)". Best to comment there too that there are these side-effects of deleting repos and archiving might be better.

No, finding breakages like this is a good thing.

I wonder should we do a full online reindex https://gerrit-review.googlesource.com/Documentation/cmd-index-start.html

ssh -p 29418 <user>@gerrit.wikimedia.org gerrit index start changes --force

Mentioned in SAL (#wikimedia-releng) [2019-09-27T21:01:38Z] <hashar> T233989 I have failed to run (must be run offline): /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar /var/lib/gerrit2/review_site/bin/gerrit.war reindex --list

Mentioned in SAL (#wikimedia-operations) [2019-10-01T07:00:25Z] <hashar> gerrit: forcing reindex of changes # T233989

I have disabled the bot account since I suspect it causes gc/memory pressure on Gerrit: T234328

07:00 <hashar> gerrit: forcing reindex of changes # T233989

My hope was to have the complete reindexing of changes to delete the missing repositories which trigger this task warnings. However we had to restart Gerrit a few hours later to tweak the JVM garbage collector:

17:52 <thcipriani> gerrit restart for new config changes incoming

So it is unlikely the reindex has completed and we should start it again.

I could not find a way to inspect the index :\ I want to check whether the no more existing maps/kartotherian repo was still listed in it.

The error is triggered when doing a query which is logged as well:

[2019-10-01 13:42:27,917] [SSH gerrit query limit:10 (status:open OR status:closed) AND NOT (502978 OR 478688 OR 266726 OR 266725 OR 256050 OR 235852 OR 143269 OR 102114 OR 101842 OR 99101 OR 58858 OR 32902 OR 13930 OR 225009 OR 225033 OR 225080 OR 225087 OR 225099) --all-approvals --comments --format JSON --start 66030 (owl)]
WARN  com.google.gerrit.server.query.change.ChangeIsVisibleToPredicate : Skipping change 342073 because the corresponding repository was not found
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable

That bot has since been deactivated and thus the warning no more show up. If I run the query manually, on the server side I get:

com.google.gerrit.server.permissions.PermissionBackendException: project 'analytics/wmde/Wiktionary/WD_percentUsageDashboard' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'operations/debs/shinken' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'analytics/wmde/Wiktionary/WD_percentUsageDashboard' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'operations/debs/pkg-php/php-ast' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'operations/debs/pkg-php/php-ast' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable

So the indexing has not completed :\

The error is triggered when doing a query which is logged as well:

[2019-10-01 13:42:27,917] [SSH gerrit query limit:10 (status:open OR status:closed) AND NOT (502978 OR 478688 OR 266726 OR 266725 OR 256050 OR 235852 OR 143269 OR 102114 OR 101842 OR 99101 OR 58858 OR 32902 OR 13930 OR 225009 OR 225033 OR 225080 OR 225087 OR 225099) --all-approvals --comments --format JSON --start 66030 (owl)]
WARN  com.google.gerrit.server.query.change.ChangeIsVisibleToPredicate : Skipping change 342073 because the corresponding repository was not found
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable

That bot has since been deactivated and thus the warning no more show up. If I run the query manually, on the server side I get:

com.google.gerrit.server.permissions.PermissionBackendException: project 'analytics/wmde/Wiktionary/WD_percentUsageDashboard' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'operations/debs/shinken' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'analytics/wmde/Wiktionary/WD_percentUsageDashboard' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'operations/debs/pkg-php/php-ast' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'operations/debs/pkg-php/php-ast' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/kartotherian' is unavailable
com.google.gerrit.server.permissions.PermissionBackendException: project 'maps/tilerator' is unavailable

So the indexing has not completed :\

DIdn't we reindex changes and not projects?

I tried creating a couple of missing repositories maps/kartotherian and maps/tilerator. The warning then disappear, but when I delete the repositories again, the message reappear.

So I tried to index both projects:

gerrit index project maps/kartotherian
gerrit index project maps/tilerator

(I think that did not delete the faulty changes).

Then find the faulty changes and independently index them:

$ grep -P -B1 "'maps/(kartotherian|tilerator)' is unavailable" /var/log/gerrit/gerrit.log|grep -P -o 'Skipping change \d+'|sort|uniq
Skipping change 322368
Skipping change 337200
Skipping change 342073
Skipping change 345190
Skipping change 349265
Skipping change 351224
Skipping change 390020
Skipping change 390021
Skipping change 390023
Skipping change 393972
Skipping change 421024
cut -d\ -f3xargs:
$ gerrit index changes 322368 337200 342073 345190 349265 351224 390020 390021 390023 393972 421024
warning: "322368" no such change
warning: "337200" no such change
warning: "342073" no such change
warning: "345190" no such change
warning: "349265" no such change
warning: "351224" no such change
warning: "390020" no such change
warning: "390021" no such change
warning: "390023" no such change
warning: "393972" no such change
warning: "421024" no such change

:-\

Deleted both repository, ran the gerrit query and the warning are still showing.

Conclusion:

  • brut force a potential workaround hoping to trigger a cleanup does not work. One should read the code instead.
  • a full reindex might do it

Mentioned in SAL (#wikimedia-releng) [2019-10-07T08:32:50Z] <hashar> gerrit: gerrit index start changes # T233989

Mentioned in SAL (#wikimedia-releng) [2019-10-07T08:33:42Z] <hashar> gerrit: gerrit index start changes --force # T233989

I think it just crawls through each project but would not magically unindex changes from projects that have been deleted :-\

Reindexing done:

[2019-10-07 08:33:39,129] [Reindex changes v48-v48]
INFO  com.google.gerrit.server.index.OnlineReindexer : Starting online reindex of changes from schema version 48 to 48
[2019-10-07 12:02:23,034] [Reindex changes v48-v48]
INFO  com.google.gerrit.server.index.OnlineReindexer : Reindex changes to version 48 complete
[2019-10-07 12:02:23,034] [Reindex changes v48-v48]
INFO  com.google.gerrit.server.index.OnlineReindexer : Using changes schema version 48

I have browsed a non existing change and noticed a stacktrace related to change notes, so I have flushed some caches (change_notes, projects, permission_sort, changeid_project). The warning still show up :-\

So I guess we need to generate a fresh new index.

Eventually that surfaced on the Gerrit mailing list: https://groups.google.com/forum/#!msg/repo-discuss/BgH2pG4kuyA/pvLxryvBDgAJ

WARN com.google.gerrit.server.query.change.ChangeIsVisibleToPredicate : Skipping change 796 because the corresponding repository was not found.

The change is still on ReviewDb or referenced in the Index: that's why you see the warning.
However, has no impact on the UX and everything works.

And further pointing out at https://groups.google.com/d/topic/repo-discuss/clQjA8sFwqQ/discussion which discuss issues when doing the Notedb migration, there is even a patch https://gerrit-review.googlesource.com/c/gerrit/+/235928

So it seems we have to cleanup the database from orphan changes. Apparently the database is still being used for this specific case :/

hashar claimed this task.

That has been fixed as part of the upgrade to Gerrit 3.2