Page MenuHomePhabricator

jenkins-bot ran gate-and-submit and voted V+2, but did not submit (merge) the change
Closed, ResolvedPublic

Description

jenkins-bot ran gate-and-submit and voted V+2, but did not submit (merge) the change https://gerrit.wikimedia.org/r/#/c/332551/. I complained on IRC, @hashar said to file this here.

[22:37]	<MatmaRex> this is interesting: https://gerrit.wikimedia.org/r/#/c/332551/ – jenkins ran the gate-and-submit jobs, but did not actually submit (merge) the change after voting V+2? i just did that manually
[22:40]	<paladox> hashar ^^
[22:40]	<paladox> have no idea if that's gerrit or zuul
[22:40]	<paladox> ostriches ^^
[22:41]	<paladox> MatmaRex it is showing as jenking merged it here https://phabricator.wikimedia.org/rMEXTddc32ffb17b5880e9e4ec4302bbce047192afabe
[22:43]	<hashar> MatmaRex: somehow Zuul failed reporting to gerrit
[22:43]	<hashar> Exception: Gerrit error executing gerrit review --project mediawiki/extensions/ProofreadPage --message "Gate pipeline build succeeded.
[22:44]	<hashar>   UPDATE: c198e8aae99d1fc8ad3a831eb4d4b893f3b080f5 22efd18434056b57e816e6f7cac1f06d5b261c73 refs/heads/master  (LOCK_FAILURE)
[22:44]	<hashar> MatmaRex: can you fill a task for it in phabricator against  #gerrit please ?
[22:45]	<MatmaRex> can do
[22:45]	<hashar> Gerrit failed to submit/merge the change
[22:46]	<hashar> some other thread was apparently trying to deal with the ref

When Zuul asked Gerrit to submit Gerrit bailed out:

Zuul view:

2017-01-17 21:19:59,668 DEBUG zuul.DependentPipelineManager: Reporting change <Change 0x7fa8a83ddfd0 332551,2>
2017-01-17 21:19:59,668 DEBUG zuul.DependentPipelineManager: success [<GerritReporter connection: gerrit://gerrit>]
2017-01-17 21:19:59,668 INFO zuul.DependentPipelineManager: Reporting item <QueueItem 0x7fa8a8d58850 for <Change 0x7fa8a83ddfd0 332551,2> in gate-and-submit>, actions: [<GerritReporter connection: gerrit://gerrit>]
2017-01-17 21:19:59,669 DEBUG zuul.reporter.gerrit.Reporter: Report change <Change 0x7fa8a83ddfd0 332551,2>, params {'verified': 2, 'submit': True}, message: Gate pipeline build succeeded.

- composer-php55-trusty https://integration.wikimedia.org/ci/job/composer-php55-trusty/9596/console : SUCCESS in 32s
- composer-hhvm-trusty https://integration.wikimedia.org/ci/job/composer-hhvm-trusty/9523/console : SUCCESS in 42s
- php55lint https://integration.wikimedia.org/ci/job/php55lint/36280/console : SUCCESS in 3s
- mwext-testextension-php55 https://integration.wikimedia.org/ci/job/mwext-testextension-php55/33013/console : SUCCESS in 1m 22s
- mwext-testextension-hhvm https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm/34732/console : SUCCESS in 1m 38s
- npm-node-6-jessie https://integration.wikimedia.org/ci/job/npm-node-6-jessie/158/console : SUCCESS in 1m 11s

2017-01-17 21:20:00,042 ERROR zuul.DependentPipelineManager: Exception while reporting:
Traceback (most recent call last):
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/scheduler.py", line 1764, in _reportItem
    ret = self.sendReport(actions, self.pipeline.source, item)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/scheduler.py", line 1291, in sendReport
    ret = reporter.report(source, self.pipeline, item)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/reporter/gerrit.py", line 39, in report
    message, self.reporter_config)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/connection/gerrit.py", line 282, in review
    out, err = self._ssh(cmd)
  File "/usr/share/python/zuul/local/lib/python2.7/site-packages/zuul/connection/gerrit.py", line 387, in _ssh
    raise Exception("Gerrit error executing %s" % command)
Exception: Gerrit error executing gerrit review --project mediawiki/extensions/ProofreadPage --message "Gate pipeline build succeeded.

- composer-php55-trusty https://integration.wikimedia.org/ci/job/composer-php55-trusty/9596/console : SUCCESS in 32s
- composer-hhvm-trusty https://integration.wikimedia.org/ci/job/composer-hhvm-trusty/9523/console : SUCCESS in 42s
- php55lint https://integration.wikimedia.org/ci/job/php55lint/36280/console : SUCCESS in 3s
- mwext-testextension-php55 https://integration.wikimedia.org/ci/job/mwext-testextension-php55/33013/console : SUCCESS in 1m 22s
- mwext-testextension-hhvm https://integration.wikimedia.org/ci/job/mwext-testextension-hhvm/34732/console : SUCCESS in 1m 38s
- npm-node-6-jessie https://integration.wikimedia.org/ci/job/npm-node-6-jessie/158/console : SUCCESS in 1m 11s
" --verified 2 --submit 332551,2

Gerrit view:

2017-01-17 21:19:52,152] WARN  com.google.gerrit.server.extensions.events.EventUtil : Error in listener com.google.gerrit.server.events.StreamEventsApiListener for event com.google.gerrit.server.extensions.events.GitReferenceUpdated: null
[2017-01-17 21:19:54,076] WARN  com.google.gerrit.server.extensions.events.EventUtil : Error in listener com.google.gerrit.server.events.StreamEventsApiListener for event com.google.gerrit.server.extensions.events.GitReferenceUpdated: null
[2017-01-17 21:19:56,086] WARN  com.google.gerrit.server.extensions.events.EventUtil : Error in listener com.google.gerrit.server.events.StreamEventsApiListener for event com.google.gerrit.server.extensions.events.GitReferenceUpdated: null
[2017-01-17 21:19:58,212] WARN  com.google.gerrit.server.extensions.events.EventUtil : Error in listener com.google.gerrit.server.events.StreamEventsApiListener for event com.google.gerrit.server.extensions.events.GitReferenceUpdated: null
[2017-01-17 21:20:00,040] ERROR com.google.gerrit.server.git.MergeOp : [332551-1484687999829-36149524]Error from integrateIntoHistory
com.google.gerrit.server.git.IntegrationException: Error submitting change: 
BatchRefUpdate failed: BatchRefUpdate[
  UPDATE: c198e8aae99d1fc8ad3a831eb4d4b893f3b080f5 22efd18434056b57e816e6f7cac1f06d5b261c73 refs/heads/master  (LOCK_FAILURE)
]
        at com.google.gerrit.server.git.MergeOp.integrateIntoHistory(MergeOp.java:497)
        at com.google.gerrit.server.git.MergeOp.merge(MergeOp.java:433)
        at com.google.gerrit.server.change.Submit.apply(Submit.java:225)
        at com.google.gerrit.server.api.changes.RevisionApiImpl.submit(RevisionApiImpl.java:183)
        at com.google.gerrit.server.api.changes.RevisionApiImpl.submit(RevisionApiImpl.java:177)
        at com.google.gerrit.sshd.commands.ReviewCommand.reviewPatchSet(ReviewCommand.java:333)
        at com.google.gerrit.sshd.commands.ReviewCommand.run(ReviewCommand.java:241)
        at com.google.gerrit.sshd.SshCommand$1.run(SshCommand.java:35)
        at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:442)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:417)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.gerrit.server.git.UpdateException: BatchRefUpdate failed: BatchRefUpdate[
  UPDATE: c198e8aae99d1fc8ad3a831eb4d4b893f3b080f5 22efd18434056b57e816e6f7cac1f06d5b261c73 refs/heads/master  (LOCK_FAILURE)
]
        at com.google.gerrit.server.git.BatchUpdate.executeRefUpdates(BatchUpdate.java:699)
        at com.google.gerrit.server.git.BatchUpdate.execute(BatchUpdate.java:404)
        at com.google.gerrit.server.git.MergeOp.integrateIntoHistory(MergeOp.java:478)
        ... 16 more

Event Timeline

matmarex created this task.Jan 17 2017, 9:47 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 17 2017, 9:47 PM
hashar updated the task description. (Show Details)Jan 17 2017, 9:49 PM
Paladox added a comment.EditedJan 17 2017, 9:49 PM

Could this be a bug because we doint rebase before merging?

http://stackoverflow.com/questions/11228436/failed-to-lock-refs-heads-master

Gerrit had a spam of:

WARN  com.google.gerrit.server.extensions.events.EventUtil :
Error in listener com.google.gerrit.server.events.StreamEventsApiListener
for event com.google.gerrit.server.extensions.events.GitReferenceUpdated: null

Which I guess was holding the lock for the repository refs/heads/master then the submit got rejected due to the lock being held.

Maybe Gerrit/Jgit should retry locking?

It is most probably a once in a while failure.

Possibly the same problem from 24 January: https://gerrit.wikimedia.org/r/#/c/332359/

@matmarex when you press the submit button, do you get any errors?

@Paladox It seemed to be something else, patch above being dependent on older patch set elsewhere.

This looks like a zuul problem rather then a gerrit problem as it seemed you could press the submit button to merge. Has this problem happened since reporting this task?

Paladox it is definitely an issue in Gerrit, due to a lock on the branch tip which prevents Gerrit from updating it and thus failing the submit:

Error submitting change: 
BatchRefUpdate failed: BatchRefUpdate[
  UPDATE: c198e8aae99d1fc8ad3a831eb4d4b893f3b080f5 22efd18434056b57e816e6f7cac1f06d5b261c73 refs/heads/master  (LOCK_FAILURE)

From the last 10 days or so of Gerrit logs, it happened only once:

Caused by: com.google.gerrit.server.git.UpdateException: BatchRefUpdate failed: BatchRefUpdate[
  UPDATE: ddc01e67cb7faa85289ac3264bbfa45d71614347 d86085cc1b8bd050cc996b276055cee9d9704438 refs/heads/production  (LOCK_FAILURE)
]
        at com.google.gerrit.server.git.BatchUpdate.executeRefUpdates(BatchUpdate.java:699)
        at com.google.gerrit.server.git.BatchUpdate.execute(BatchUpdate.java:404)
        at com.google.gerrit.server.git.MergeOp.integrateIntoHistory(MergeOp.java:478)

That was https://gerrit.wikimedia.org/r/#/c/338762/

No clue what is happening nor how to reproduce it though.

Oh, it may be fixed in gerrit 2.14 as I did see some commits related to performance improvements and others to do with fixing duplicating opening the repo.

I found someone else reported this against gerrit 2.13 https://groups.google.com/forum/m/#!topic/repo-discuss/lvLRH8bmvl4

I see this https://gerrit-review.googlesource.com/#/c/87512/ change upstream which turns the error into a 500. But that's on the master branch.

This may be fixed by https://gerrit-review.googlesource.com/#/c/98530/ but it is really only a guess fix.

Upstream have merged the backported commits I did. They may not fix the problem but it will throw a RestApiException instead of UpdateException.

Also this https://gerrit.googlesource.com/gerrit/+/085a3089c508338ff6f6ae1f9733b203571618f8%5E%21/#F0 commit looks like it may do something as it prevents closing the repo so many times when it is already closed.

It also has BatchRefUpdate in there too.

Change 395475 had a related patch set uploaded (by Nikerabbit; owner: Nikerabbit):
[translatewiki@master] RepoNG: Run gerrit merges serially

https://gerrit.wikimedia.org/r/395475

Nikerabbit added a subscriber: Nikerabbit.

Sorry for the noise. I meant to put the bug into commit body.

I think this is mostly resolved? Haven’t seen any more reports.

matmarex closed this task as Resolved.Jan 26 2018, 4:15 PM

Hmm, probably. I haven't seen this issue in a year.