Looking for uses of TitleMoveCompleting:
https://codesearch.wmflabs.org/search/?q=TitleMoveCompleting&i=nope&files=&repos= includes Flow
https://codesearch.wmflabs.org/deployed/?q=TitleMoveCompleting&i=nope&files=&repos= doesn't include Flow
gerrit-replica is 502ing again, probably because of OOM.
Jun 10 20:06:09 codesearch6 docker[28181]: 2020/06/10 20:06:09 Failed to git fetch /data/data/vcs-12e933bde61f91eb6f3be5a28027f9dcbe79a111, see output below
Jun 10 20:06:09 codesearch6 docker[11226]: 2020/06/10 20:06:09 vcs pull error (Extension:Flow - https://gerrit-replica.wikimedia.org/r/mediawiki/extensions/Flow.git): exit status 128
Jun 10 20:06:09 codesearch6 docker[11226]: Continuing...
Jun 10 20:06:09 codesearch6 docker[11226]: fatal: unable to access 'https://gerrit-replica.wikimedia.org/r/mediawiki/extensions/Flow.git/': The requested URL returned error: 502
10:21 < mutante> !log restarting gerrit on gerrit-replica (gerrit2001) - java.lang.OutOfMemoryError: Java heap space
On gerrit2001, it seems it ran out of Java heap space at 10:13 UTC:
[2020-06-11 10:13:26,029] [HTTP-95987] ERROR com.google.gerrit.pgm.http.jetty.HiddenErrorHandler : Error in GET /r/mediawiki/skins/MonoBook.git/info/refs?service=git-upload-pack java.lang.OutOfMemoryError: Java heap space
And restarted automagically?
[2020-06-11 10:22:56,113] [main] INFO com.google.gerrit.server.cache.h2.H2CacheFactory : Enabling disk cache /var/lib/gerrit2/review_site/cache
The Java heap is set to 32G (-Xmx32g).
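For reference, the corresponding setting in gerrit.config would look like the following (assuming the heap limit is managed there rather than passed directly to the JVM by the service wrapper; Gerrit translates container.heapLimit into -Xmx):

```ini
# gerrit.config (sketch, not the actual production file)
[container]
	# Becomes -Xmx32g on the Gerrit JVM command line
	heapLimit = 32g
```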
We had the issue earlier today apparently:
[2020-06-11 06:16:38,803] [HTTP-95844] WARN org.eclipse.jetty.servlet.ServletHandler : Error for /r/mediawiki/extensions/TitleBlacklist.git/info/refs java.lang.OutOfMemoryError: Java heap space
And I guess it kept recurring until the service somehow restarted.
And it looks like a good amount of the requests originate from codesearch6.codesearch.eqiad1.wikimedia.cloud, which apparently tries to crawl the repositories as fast as it can :-(
For reference:
It's set to crawl every hour, but given how many repositories there are, it may effectively be crawling constantly. I'll bump the interval to every 90 minutes.
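For illustration, codesearch is built on Hound, which controls how often each repository is polled via ms-between-poll in its config.json. A 90-minute interval would look something like this (the repository entry here is just an example; the real config is generated for every indexed repo):

```json
{
    "repos": {
        "MediaWiki-extensions-Flow": {
            "url": "https://gerrit-replica.wikimedia.org/r/mediawiki/extensions/Flow.git",
            "ms-between-poll": 5400000
        }
    }
}
```

5400000 ms = 90 minutes; the previous hourly setting would have been 3600000.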
Mentioned in SAL (#wikimedia-cloud) [2020-06-11T11:04:27Z] <legoktm> restarting everything after gerrit-replica 502s fixed T255094 T255125
We could also have codesearch crawl from a local copy of the repositories by replicating the Gerrit repositories to the codesearch host. But I guess we would then want to promote codesearch to prod.
One implication of going to production is that it wouldn't be able to talk to the outside world, meaning it wouldn't be able to index GitHub, etc. I'm not against moving it to production, just flagging this problem.
Config documentation is https://gerrit.wikimedia.org/r/plugins/replication/Documentation/config.md
Wikitech documentation is mainly about troubleshooting, but there may be something valuable there as well: https://wikitech.wikimedia.org/wiki/Gerrit#Replication
We currently replicate to GitHub and the Gerrit replica via the replication plugin. We use one worker thread for replication to both, and it keeps up with traffic (with occasional delays on the order of minutes). One concern is stability: if the single worker thread gets tied up doing retries, replication to everything stalls. However, we likely have room to add another replication worker.
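As a sketch, adding a codesearch destination with its own worker thread would be a few lines in the replication plugin's replication.config (the remote name, target path, and delay below are hypothetical, not taken from the production config):

```ini
# replication.config (sketch) -- one dedicated worker for this destination
[remote "codesearch"]
	url = codesearch6.codesearch.eqiad1.wikimedia.cloud:/srv/git/${name}.git
	# Per-remote thread pool, so retries here don't block github/replica pushes
	threads = 1
	# Batch rapid successive pushes before replicating (seconds)
	replicationDelay = 15
	mirror = true
```

Because threads is a per-remote setting, a stuck or retrying push to one destination wouldn't hold up the others.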
