zuul merge cloner might be broken
Closed, Resolved, Public

Description

I've been reviewing a change for rTSTW. The change was submitted to the master branch and was up to date. Jenkins always failed with "the change needs rebasing" (https://gerrit.wikimedia.org/r/#/c/422988/), even though the change was perfectly up to date. I force-merged it, thinking it was a one-off issue. However, further test changes uploaded to that repo always fail on Jenkins, even after reverting the force merge. @Legoktm suggested on wikimedia-cloud that the Zuul cloner might be broken. I'd appreciate it if you could take a look. Thanks.

Example: for the labs/tools/stewardbots change https://gerrit.wikimedia.org/r/#/c/423010/ , the Zuul merger fails to update from Gerrit:

2018-03-30 09:39:36,392 DEBUG zuul.Merger: Merging for change 423010,1.
2018-03-30 09:39:36,393 DEBUG zuul.Merger: Processing refspec refs/changes/10/423010/1 for project labs/tools/stewardbots / master ref Z49dd4cf8e9bc405ea9823e28fd3c4d6e
2018-03-30 09:39:36,400 DEBUG zuul.Merger: Unable to find commit for ref master/Z49dd4cf8e9bc405ea9823e28fd3c4d6e
2018-03-30 09:39:36,400 DEBUG zuul.Merger: No base commit found for (u'labs/tools/stewardbots', u'master')
2018-03-30 09:39:36,400 DEBUG zuul.Repo: Resetting repository /srv/zuul/git/labs/tools/stewardbots
2018-03-30 09:39:36,401 DEBUG zuul.Repo: Updating repository /srv/zuul/git/labs/tools/stewardbots
2018-03-30 09:39:36,518 ERROR zuul.Merger: Unable to reset repo <zuul.merger.merger.Repo object at 0x7feec267b3d0>
Traceback (most recent call last):
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 326, in _mergeItem
    repo.reset()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 101, in reset
    self.update()
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/merger/merger.py", line 207, in update
    origin.fetch(tags=True)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/remote.py", line 743, in fetch
    res = self._get_fetch_info_from_stderr(proc, progress)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/remote.py", line 640, in _get_fetch_info_from_stderr
    finalize_process(proc, stderr=stderr_text)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/util.py", line 155, in finalize_process
    proc.wait(**kwargs)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/git/cmd.py", line 335, in wait
    raise GitCommandError(self.args, status, errstr)
GitCommandError: 'git fetch --tags -v origin' returned with exit code 128
stderr: 'fatal: internal server error
remote: internal server error
fatal: protocol error: bad pack header'

Restricted Application added a subscriber: Aklapper. Mar 29 2018, 6:37 PM
hashar updated the task description. Mar 30 2018, 9:46 AM
hashar added a subscriber: hashar.

And on Gerrit server side:

[2018-03-30 09:39:36,971] [SSH git-upload-pack /labs/tools/stewardbots (jenkins-bot)] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user jenkins-bot account 75) during git-upload-pack '/labs/tools/stewardbots'
org.eclipse.jgit.transport.UploadPackInternalServerErrorException
        at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:1402)
        at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:775)
        at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:668)
        at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:78)
        at com.google.gerrit.sshd.AbstractGitCommand.service(AbstractGitCommand.java:97)
        at com.google.gerrit.sshd.AbstractGitCommand.access$000(AbstractGitCommand.java:30)
        at com.google.gerrit.sshd.AbstractGitCommand$1.run(AbstractGitCommand.java:63)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:453)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:418)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.eclipse.jgit.errors.MissingObjectException: Missing tree 2d0b76708818f12c28d8125fa4ec38eee98cd888
        at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:164)
        at org.eclipse.jgit.revwalk.ObjectWalk.newTreeVisit(ObjectWalk.java:761)
        at org.eclipse.jgit.revwalk.ObjectWalk.nextObject(ObjectWalk.java:416)
        at org.eclipse.jgit.internal.storage.pack.PackWriterBitmapWalker.findObjects(PackWriterBitmapWalker.java:135)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.findObjectsToPackUsingBitmaps(PackWriter.java:1876)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.findObjectsToPack(PackWriter.java:1671)
        at org.eclipse.jgit.internal.storage.pack.PackWriter.preparePack(PackWriter.java:800)
        at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:1516)
        at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:1396)
        ... 15 more
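
The root cause in the Gerrit trace is the `MissingObjectException` naming a tree object. As a rough sketch of how one might follow that up, the Python below pulls the object type and SHA-1 out of such a trace and checks whether a given repository has the object, using `git cat-file -e`. The helper names are made up for illustration, and the path of the repository on the Gerrit server is not given in this task, so `repo_dir` is a placeholder the caller has to supply.

```python
import re
import subprocess

# Matches the JGit message seen in the trace, e.g. "Missing tree 2d0b7670...".
MISSING_RE = re.compile(r"MissingObjectException: Missing (\w+) ([0-9a-f]{40})")


def find_missing_objects(trace_text):
    """Return (object_type, sha1) pairs named in MissingObjectException lines."""
    return MISSING_RE.findall(trace_text)


def object_exists(repo_dir, sha1):
    """Return True if `git cat-file -e` finds the object in the repo at repo_dir.

    repo_dir is an assumption: pass whichever clone or bare repository
    you want to inspect.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "cat-file", "-e", sha1],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Checking the same SHA-1 on both the server-side repository and the Zuul merger clone would show which side actually lacks the tree. It would not by itself explain why the server tried to pack it.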

On the Zuul merger:

zuul@contint1001:/srv/zuul/git/labs/tools/stewardbots
$ git remote -v
origin	ssh://jenkins-bot@gerrit.wikimedia.org:29418/labs/tools/stewardbots (fetch)
origin	ssh://jenkins-bot@gerrit.wikimedia.org:29418/labs/tools/stewardbots (push)
$ git fetch --tags -v origin
fatal: internal server error
remote: internal server error 
fatal: protocol error: bad pack header
$ GIT_TRACE=1 GIT_TRACE_PACKET=1 git fetch --tags -v origin
...
09:52:14.436092 pkt-line.c:80           packet:        fetch< NAK
fatal: internal server error
09:52:14.477244 pkt-line.c:80           packet:        fetch< ACK bc4e8a3373a57d029c2136f3561878a6da5cc316
09:52:14.477475 pkt-line.c:80           packet:     sideband< \2Counting objects: 1   \15
09:52:14.477509 pkt-line.c:80           packet:     sideband< \3internal server error
remote: internal server error
fatal: protocol error: bad pack header

No clue.
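
For reading the packet trace: the `\2` and `\3` prefixes are side-band channel numbers. In Git's pack protocol, band 1 carries pack data, band 2 progress messages, and band 3 a fatal error message from the server. So the `\3internal server error` line is Gerrit reporting the failure, which the client then prints as `remote: internal server error`. A small Python sketch that filters those lines out of `GIT_TRACE_PACKET` output follows. It assumes the line layout shown in the trace above, and the function names are illustrative only.

```python
import re

# A sideband trace line looks like:
#   "... packet:     sideband< \3internal server error"
# where the backslash and digit are literal characters in the trace output.
SIDEBAND_RE = re.compile(r"packet:\s+sideband< \\([123])(.*)")

BAND_NAMES = {"1": "pack data", "2": "progress", "3": "error"}


def sideband_messages(trace_text):
    """Return (band_name, payload) for every sideband line in the trace."""
    messages = []
    for line in trace_text.splitlines():
        match = SIDEBAND_RE.search(line)
        if match:
            band, payload = match.groups()
            messages.append((BAND_NAMES[band], payload.strip()))
    return messages


def server_errors(trace_text):
    """Return only the payloads the server sent on the error band (band 3)."""
    return [payload for name, payload in sideband_messages(trace_text)
            if name == "error"]
```

Feeding the trace above through `server_errors` leaves just the one message sent by the server, which matches the exception logged on the Gerrit side.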

Mentioned in SAL (#wikimedia-releng) [2018-03-30T09:56:10Z] <hashar> Nuking /srv/zuul/git/labs/tools/stewardbots on zuul-merger hosts (contint1001 and contint2001). Fetch fails with org.eclipse.jgit.transport.UploadPackInternalServerErrorException | T191077

hashar closed this task as Resolved. Mar 30 2018, 9:57 AM
hashar claimed this task.

I really have no idea what was going on :( After nuking the local git repositories on the Zuul mergers and rechecking https://gerrit.wikimedia.org/r/#/c/423010/ , it works:

2018-03-30 09:56:36,810 DEBUG zuul.MergeServer: Got merge job: 72b0b60dd36d42cdabef775ee6023e85
2018-03-30 09:56:36,811 DEBUG zuul.Merger: Merging for change 423010,1.
2018-03-30 09:56:36,811 DEBUG zuul.Merger: Processing refspec refs/changes/10/423010/1 for project labs/tools/stewardbots / master ref Z46203bec30174dc78243c5516426c67f
2018-03-30 09:56:36,811 DEBUG zuul.Repo: Cloning from ssh://jenkins-bot@gerrit.wikimedia.org:29418/labs/tools/stewardbots to /srv/zuul/git/labs/tools/stewardbots
2018-03-30 09:56:37,215 DEBUG zuul.Merger: Unable to find commit for ref master/Z46203bec30174dc78243c5516426c67f
2018-03-30 09:56:37,215 DEBUG zuul.Merger: No base commit found for (u'labs/tools/stewardbots', u'master')
2018-03-30 09:56:37,216 DEBUG zuul.Repo: Resetting repository /srv/zuul/git/labs/tools/stewardbots
2018-03-30 09:56:37,217 DEBUG zuul.Repo: Updating repository /srv/zuul/git/labs/tools/stewardbots
2018-03-30 09:56:37,370 DEBUG zuul.Repo: Checking out dce930cc3ebd963c93824500f412425555541127
2018-03-30 09:56:37,505 DEBUG zuul.Repo: Merging refs/changes/10/423010/1 with args ['-s', 'resolve', 'FETCH_HEAD']
2018-03-30 09:56:37,525 DEBUG zuul.Repo: CreateZuulRef master/Z46203bec30174dc78243c5516426c67f at a73a7ed07eceb3768b9b8e30fac5f83671bf2eee on <git.Repo "/srv/zuul/git/labs/tools/stewardbots/.git">
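
The workaround used here (delete the merger's local clone, then let the next merge job clone it again) can be sketched in Python as below. This is only an illustration of the manual steps, not how Zuul does it: in this task the directory was removed by hand and Zuul re-cloned on its own, as the "Cloning from" log line shows. The function name and the injectable `run` parameter are assumptions made so the sketch can be exercised without a real remote.

```python
import shutil
import subprocess


def fetch_or_reclone(repo_dir, remote_url, run=subprocess.run):
    """Fetch into repo_dir; if the fetch fails, delete the clone and re-clone.

    Returns "fetched" or "recloned". With the default runner, a failing
    fresh clone raises subprocess.CalledProcessError.
    """
    fetch = run(["git", "-C", repo_dir, "fetch", "--tags", "-v", "origin"])
    if fetch.returncode == 0:
        return "fetched"
    # Discards everything in the local clone, including unpushed refs.
    shutil.rmtree(repo_dir, ignore_errors=True)
    run(["git", "clone", remote_url, repo_dir], check=True)
    return "recloned"
```

For this incident the call would have been along the lines of `fetch_or_reclone("/srv/zuul/git/labs/tools/stewardbots", "ssh://jenkins-bot@gerrit.wikimedia.org:29418/labs/tools/stewardbots")`. Why a fresh clone succeeded where a fetch into the old clone failed is not established in this task.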