
pywikibot get merge rejections due to zuul-merger not being able to update tags
Closed, ResolvedPublic

Description

Since https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/595192/ Jenkins sometimes fails to merge up-to-date (rebased) patch sets, and the jobs then do not run. These jobs do not even appear in the Jenkins job view (https://integration.wikimedia.org/ci/view/Pywikibot/job/pywikibot-core-tox-publish/). After a few rechecks, everything usually goes back to normal. This occurs for all types of actions (patch set, merge, post-merge). There must be some issue with the trigger.

Explanation

Whenever tags are updated in pywikibot/core (the stable and python2 tags), zuul-merger rejects any further merge attempts because it cannot update the tags. This is a change of behavior in git 2.20, shipped with Buster, which now rejects tag updates unless --force is used or the reference update is allowed by prefixing the refspec with a +.
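A minimal Python sketch of the two options above. It only builds the `git fetch` command lines and does not run git. The helper name `fetch_tags_argv` is hypothetical; `--force`, `--tags` and the `+refs/tags/*:refs/tags/*` refspec are standard git syntax. Note that with an explicit refspec, only what that refspec names (here, the tags) is fetched.

```python
from typing import List

# Refspec that maps every remote tag onto the local tag of the same name.
TAG_REFSPEC = "refs/tags/*:refs/tags/*"


def fetch_tags_argv(remote: str = "origin", use_plus_refspec: bool = False) -> List[str]:
    """Build a git fetch command line that may move existing local tags.

    Hypothetical helper: it only returns the argv list, it does not run git.
    """
    if use_plus_refspec:
        # A leading '+' forces updates for this refspec only.
        return ["git", "fetch", "-v", remote, "+" + TAG_REFSPEC]
    # --force applies to every ref being fetched; --tags also fetches all tags.
    return ["git", "fetch", "--force", "--tags", "-v", remote]


if __name__ == "__main__":
    print(" ".join(fetch_tags_argv()))
    print(" ".join(fetch_tags_argv(use_plus_refspec=True)))
```

The first form is what the workaround below uses; the second limits the forced update to tags only.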

Workaround

On each of the servers hosting zuul-merger, manually force update the tags:

ssh contint1001.wikimedia.org sudo -u zuul git -C /srv/zuul/git/pywikibot/core fetch --force --tags -v origin
ssh contint2001.wikimedia.org sudo -u zuul git -C /srv/zuul/git/pywikibot/core fetch --force --tags -v origin
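The same workaround as a small Python sketch that loops over both hosts. The host names, repository path and git options are taken from the commands above; the function names are hypothetical. By default it only prints the commands, and it runs them over ssh only when `dry_run=False` is passed.

```python
import subprocess
from typing import List, Sequence

# Hosts and repository path taken from the workaround commands above.
ZUUL_MERGER_HOSTS = ["contint1001.wikimedia.org", "contint2001.wikimedia.org"]
REPO_PATH = "/srv/zuul/git/pywikibot/core"


def workaround_commands(hosts: Sequence[str] = ZUUL_MERGER_HOSTS,
                        repo: str = REPO_PATH) -> List[List[str]]:
    """Return one ssh command per host that force-updates tags as the zuul user."""
    return [
        ["ssh", host, "sudo", "-u", "zuul", "git", "-C", repo,
         "fetch", "--force", "--tags", "-v", "origin"]
        for host in hosts
    ]


def run_workaround(dry_run: bool = True) -> List[List[str]]:
    """Print each command; execute it too unless dry_run is True."""
    commands = workaround_commands()
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError if ssh/git exits non-zero.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    run_workaround(dry_run=True)
```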

Event Timeline

Dzahn lowered the priority of this task from Unbreak Now! to High. May 11 2020, 1:31 PM
Dzahn added a subscriber: Dzahn.

A planned migration of the CI system is in progress.

We have reverted to the old system because of issues with Jenkins after the switch to a newer server.

I see, I didn't know. I hope the upgrade works out in the end.

@Dvorapa Sorry, we should have mailed more public venues like wikitech-l about this. We will do so next time. We also had to revert.

Things should work for you again as before. Sorry for the inconvenience.

hashar added a subscriber: hashar.

CI was under maintenance (T224591), which I announced on Thursday on wikitech-l: https://lists.wikimedia.org/pipermail/wikitech-l/2020-May/093356.html (it was rather short notice, I will do better next time).

The next change that merges for pywikibot should trigger the post-merge job. If not, please reopen and I will investigate :)

I see. I thought the wikitech list was for that wiki, and it seemed odd to me that there is no WMCI, WM-deploy, WM-releng, or WM-devel list. Anyway, thank you for the explanation.

Kizule added a subscriber: Kizule.

(per previous comment)

There was no ongoing work today that I am aware of, so this must have a different cause.

It seems to happen randomly, for random patch set, merge and post-merge events.

I added "recheck" on https://gerrit.wikimedia.org/r/c/pywikibot/core/+/595213/3 and got a jenkins-bot response, so I can't reproduce it right now.

That example also seems to show it working and the result being added 2 minutes after uploading the latest patch set. Are the previous failures not just actual rebase issues?


It showed "Merge if Necessary" with no issue. It would also show "Cannot Merge" if it were a rebase issue.


But it does say "was unable to be automatically merged" and I have seen that many times in other repos when it needed a manual rebase. I may not understand the issue though. Let's see what @hashar says.

I would wait for the next occurrence. I didn't want this to be reopened, as there is currently nothing to examine. That odd one, with no "Cannot Merge" badge but with the "was unable to be automatically merged" message, was overwritten almost immediately.

Dzahn lowered the priority of this task from High to Medium. May 15 2020, 1:40 PM

OK, I propose we keep it open but lower the priority from High to Medium.

Dvorapa added a subscriber: Jdforrester-WMF.

@Jdforrester-WMF This is not a Pywikibot issue, so at least one releng tag needs to stay here (not sure which one).

If your code isn't up to date with master, CI tries to rebase it for you (inside zuul-merger). If it can't, it tells you so with the message "was unable to be automatically merged" and then exits. FWICT from the above, except for a few minutes when CI was intentionally offline for maintenance, this is operating as expected.

What is the bug/concern/issue here?

In the last patch linked here, the merge fails, yet after a "recheck" the very same patch set merges perfectly well. From my point of view this is definitely wrong. Please explain why this happens if it is not wrong behavior.

And again: the change is up to date with master and cannot be rebased any further, but the merge still fails: https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/598157/

The third attempt (using the recheck keyword) finally worked.

Dvorapa renamed this task from Pywikibot post-merge job fails to start up to Jenkins can't merge patchsets sometimes. May 23 2020, 12:55 PM
Dvorapa updated the task description. (Show Details)
Dvorapa moved this task from Backlog to Wikimedia prod/Cloud Services issues on the Pywikibot board.

Now verify always fails if the parent is not master but another patch set which has not been merged yet. For example,
https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/598977/ is based on
https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/598876/ which is based on origin/master.

A recheck is successful once the parent has been merged, as in this example:
https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/560057/

This is a regression from the previous behaviour, when a chain of patch sets could be published.


For this specific one:

contint2001:/srv/zuul/git/pywikibot/core$ sudo -u zuul git fetch --tags -v origin
From ssh://gerrit.wikimedia.org:29418/pywikibot/core
 = [up to date]        master       -> origin/master
 = [up to date]        3.0.20170403 -> 3.0.20170403
 = [up to date]        3.0.20170521 -> 3.0.20170521
 = [up to date]        3.0.20170713 -> 3.0.20170713
 = [up to date]        3.0.20170801 -> 3.0.20170801
 = [up to date]        3.0.20171212 -> 3.0.20171212
 = [up to date]        3.0.20180108 -> 3.0.20180108
 = [up to date]        3.0.20180204 -> 3.0.20180204
 = [up to date]        3.0.20180302 -> 3.0.20180302
 = [up to date]        3.0.20180304 -> 3.0.20180304
 = [up to date]        3.0.20180403 -> 3.0.20180403
 = [up to date]        3.0.20180505 -> 3.0.20180505
 = [up to date]        3.0.20180603 -> 3.0.20180603
 = [up to date]        3.0.20180710 -> 3.0.20180710
 = [up to date]        3.0.20180823 -> 3.0.20180823
 = [up to date]        3.0.20180922 -> 3.0.20180922
 = [up to date]        3.0.20181203 -> 3.0.20181203
 = [up to date]        3.0.20190106 -> 3.0.20190106
 = [up to date]        3.0.20190204 -> 3.0.20190204
 = [up to date]        3.0.20190301 -> 3.0.20190301
 = [up to date]        3.0.20190430 -> 3.0.20190430
 = [up to date]        3.0.20190722 -> 3.0.20190722
 = [up to date]        3.0.20200111 -> 3.0.20200111
 = [up to date]        3.0.20200306 -> 3.0.20200306
 = [up to date]        3.0.20200326 -> 3.0.20200326
 = [up to date]        3.0.20200405 -> 3.0.20200405
 = [up to date]        3.0.20200508 -> 3.0.20200508
 ! [rejected]          python2      -> python2  (would clobber existing tag)
 ! [rejected]          stable       -> stable  (would clobber existing tag)
$ echo $?
1
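To spot this condition automatically, a Python sketch can scan the verbose fetch output for rejection lines like the ones above (git prints these per-ref status lines on stderr). The function name is hypothetical, and the regular expression is an assumption based only on the lines quoted in this task; other git versions or locales may format them differently.

```python
import re
from typing import List

# Matches lines such as:
#  ! [rejected]          python2      -> python2  (would clobber existing tag)
REJECTED_TAG = re.compile(
    r"^\s*!\s+\[rejected\]\s+(\S+)\s+->\s+(\S+)\s+\(would clobber existing tag\)"
)


def clobbered_tags(fetch_output: str) -> List[str]:
    """Return the local tag names that `git fetch` refused to move."""
    tags = []
    for line in fetch_output.splitlines():
        match = REJECTED_TAG.match(line)
        if match:
            tags.append(match.group(2))  # local side of "remote -> local"
    return tags


# Excerpt of the fetch output quoted above.
SAMPLE = """From ssh://gerrit.wikimedia.org:29418/pywikibot/core
 = [up to date]        master       -> origin/master
 = [up to date]        3.0.20200508 -> 3.0.20200508
 ! [rejected]          python2      -> python2  (would clobber existing tag)
 ! [rejected]          stable       -> stable  (would clobber existing tag)
"""

if __name__ == "__main__":
    print(clobbered_tags(SAMPLE))  # ['python2', 'stable']
```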

I have no idea what that would clobber existing tag error is about :(

So there was some tag issue? Do we now just have to reintroduce them and everything will be fine?

Works again and the stable/python2 tags are available: https://gerrit.wikimedia.org/r/#/admin/projects/pywikibot/core,tags?skip=25. I think this can be closed then.

Thanks hashar!

Xqt claimed this task.
Xqt removed Xqt as the assignee of this task.
Dvorapa reopened this task as Open. Jun 11 2020, 1:26 PM

It occurs again. Possibly it has something to do with git tags, because today two tags were changed and one was added.

@hashar Could you once again do whatever fixed the issue the last time?

If I remove the two tags stable and python2, which were changed previously (i.e. removed upstream and synchronized with the local repository), the problem does not occur. If we add them again, we get the message "This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset." and Zuul Status shows Skipped. We need these two tags to point to the latest stable release, which is currently 3.0.20200609.

After 3.0.2020070x we will no longer need to update python2.

Several people have suggested making stable a branch instead. This is quite a disadvantage of git.


Doesn't this need a patch to be merged twice? I have bad memories of that from the Pywikibot 2 branch era.

In a standard git environment this would just mean merging the master branch into the stable branch after every release, but I'm not familiar with how this works with Gerrit (@Urbanecm?)

I would be fine with that proposal.

With git it should work like this:

$ git tag -a 3.0.20200707
$ python3 setup.py sdist
... possibly cleanup or clone a clean repository
$ git checkout stable
$ git merge master -m "3.0.20200707"
... creates a commit merging master into stable
$ git review stable

I'm unsure if this is the correct procedure, perhaps @zhuyifei1999 could help too?

It should be much simpler, like:

$ python3 setup.py sdist
... possibly cleanup or clone a clean repository
$ git review stable
... pushes master into stable directly

But I feel like this would require squashing all commits since the last version into one commit, as described in https://www.mediawiki.org/wiki/Gerrit/Tutorial#Squash_several_commits_into_one_single_commit_via_rebase

StackOverflow suggests this:

$ git reset 3.0.20200609
$ git add .
$ git commit -m "3.0.20200707"
$ git review stable

The stable tag was created on 23 March 2019 (see T217908) and was equal to tag 3.0.20190301. After that, the stable tag was updated with every release: 3.0.20190430, 3.0.20190722, 3.0.20200111, 3.0.20200306, 3.0.20200405. The python2 tag was introduced with release 3.0.20200111. This problem occurred for the first time after 3.0.20200508 and was solved by deleting the stable and python2 tags. The tags were synchronized a few weeks ago, and Gerrit kept working until the tags were moved with the new release 3.0.20200609.

By the way, I can delete these two tags via the Gerrit web interface, but deleting them with a git command fails with an error.

Maybe deleting via the web interface no longer removes these tags completely, as it did before May 2020 when this bug first occurred?

Now I have created the stable branch as proposed above. Gerrit works as expected, but it still fails if I synchronize the remaining python2 tag:

git.exe push -v --progress --tags "origin" master:master
Pushing to ssh://gerrit.wikimedia.org:29418/pywikibot/core
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
remote: Processing changes: done
To ssh://gerrit.wikimedia.org:29418/pywikibot/core
= [up to date]        master -> master
...
= [up to date]        3.0.20200609 -> 3.0.20200609
* [new tag]           python2 -> python2
updating local tracking ref 'refs/remotes/origin/master'

Success (3140 ms @ 13.06.2020 12:49:06)

But see https://gerrit.wikimedia.org/r/#/c/pywikibot/core/+/602298/

Yeah, it's definitely a regression, as this never happened before.


Oh, I guess we had something similar back in 2016, see T134062.

That is the same issue as before: whenever the python2 or stable tags are updated, git plainly refuses to update them, with the message "would clobber existing tag":

contint2001:/srv/zuul/git/pywikibot/core$ sudo -u zuul git fetch --tags -v origin
From ssh://gerrit.wikimedia.org:29418/pywikibot/core
...
 ! [rejected]          python2      -> python2  (would clobber existing tag)
 ! [rejected]          stable       -> stable  (would clobber existing tag)
$ echo $?
1

Which in git is emitted by:

static int update_local_ref( ... )
{
        if (!is_null_oid(&ref->old_oid) &&
            starts_with(ref->name, "refs/tags/")) {
                if (force || ref->force) {
                        int r;
                        r = s_update_ref("updating tag", ref, 0);
                        format_display(display, r ? '!' : 't', _("[tag update]"),
                                       r ? _("unable to update local ref") : NULL,
                                       remote, pretty_ref, summary_width);
                        return r;
                } else {
                        format_display(display, '!', _("[rejected]"), _("would clobber existing tag"),
                                       remote, pretty_ref, summary_width);
                        return 1;
                }
        }

In other words, git fetch no longer silently updates tags; that has to be explicitly allowed by using --force. The change landed in git 2.20.
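A simplified Python model of the tag branch in the C excerpt above, meant only for reasoning about which fetches fail. It is not git's code: the function name and return strings are hypothetical, and it ignores everything in update_local_ref() except the up-to-date check, new refs and the refs/tags/ rule.

```python
from typing import Optional

NULL_OID = "0" * 40  # all-zero SHA-1 object id: the ref does not exist locally yet


def tag_update_status(ref_name: str, old_oid: Optional[str], new_oid: str,
                      force: bool = False, refspec_force: bool = False) -> str:
    """Return what a git >= 2.20 fetch would do with one ref (simplified)."""
    if old_oid == new_oid:
        return "up to date"
    ref_exists = old_oid is not None and old_oid != NULL_OID
    if ref_exists and ref_name.startswith("refs/tags/"):
        # Moving an existing tag now needs --force or a '+' refspec.
        if force or refspec_force:
            return "tag update"
        return "rejected: would clobber existing tag"
    if not ref_exists:
        return "new ref"
    return "not modelled"  # e.g. branch fast-forward checks


if __name__ == "__main__":
    # The moved stable tag, as seen by zuul-merger without and with --force.
    print(tag_update_status("refs/tags/stable", "a" * 40, "b" * 40))
    print(tag_update_status("refs/tags/stable", "a" * 40, "b" * 40, force=True))
```

Per the manpage quoted below, git before 2.20 behaved as if refspec_force were always true for refs/tags/*, which is why moving stable and python2 used to go unnoticed.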

The change to git was made by Ævar Arnfjörð Bjarmason (who has contributed a lot to MediaWiki): https://git.kernel.org/pub/scm/git/git.git/commit/?id=0bc8d71b99e91c9e90b519073b639a5066119591

From the manpage:

git-fetch(1)

Until Git version 2.20, and unlike when pushing with git-push(1), any updates to refs/tags/* would be accepted without + in the refspec (or --force). When fetching, we promiscuously considered all tag updates from a remote to be forced fetches.

Since Git version 2.20, fetching to update refs/tags/* works the same way as when pushing. I.e. any updates will be rejected without + in the refspec (or --force).

The root cause is the upgrade of contint servers to Buster which brings git 2.20: T224591

hashar renamed this task from Jenkins can't merge patchsets sometimes to pywikibot get merge rejections due to zuul-merger not being able to update tags. Jun 15 2020, 7:20 AM
hashar claimed this task.
hashar updated the task description. (Show Details)

I have updated the repositories. The problem will happen again the next time a tag is updated.

Change 605529 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/zuul@patch-queue/debian/jessie-wikimedia] WMF: force update tags when updating repo

https://gerrit.wikimedia.org/r/605529

Repro:

git --version  # 2.20.1
pip2 install --user GitPython==2.1.11 gitdb2==2.0.5 smmap2==2.0.5
cd projects/pywikibot/core
python2
>>> import git
>>> repo = git.Repo('.')
>>> repo.remotes.origin.fetch(tags=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "git/remote.py", line 789, in fetch
    res = self._get_fetch_info_from_stderr(proc, progress)
  File "git/remote.py", line 675, in _get_fetch_info_from_stderr
    proc.wait(stderr=stderr_text)
  File "git/cmd.py", line 415, in wait
    raise GitCommandError(self.args, status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(1)
  cmdline: git fetch --tags -v origin

Passing force=True works:

>>> repo.remotes.origin.fetch(tags=True, force=True)
[<git.remote.FetchInfo object at 0x7f48e75a5c58>, <git.remote.FetchInfo object at 0x7f48e75a5d08>, <git.remote.FetchInfo object at 0x7f48e75a5d60>, <git.remote.FetchInfo object at 0x7f48e75a5db8>, <git.remote.FetchInfo object at 0x7f48e75a5e10>, <git.remote.FetchInfo object at 0x7f48e75a5e68>, <git.remote.FetchInfo object at 0x7f48e75a5ec0>, <git.remote.FetchInfo object at 0x7f48e75a5f18>, <git.remote.FetchInfo object at 0x7f48e75a5f70>, <git.remote.FetchInfo object at 0x7f48e75a5fc8>, <git.remote.FetchInfo object at 0x7f48e75b5050>, <git.remote.FetchInfo object at 0x7f48e75b50a8>, <git.remote.FetchInfo object at 0x7f48e75b5100>, <git.remote.FetchInfo object at 0x7f48e75b5158>, <git.remote.FetchInfo object at 0x7f48e75b51b0>, <git.remote.FetchInfo object at 0x7f48e75b5208>, <git.remote.FetchInfo object at 0x7f48e75b5260>, <git.remote.FetchInfo object at 0x7f48e75b52b8>, <git.remote.FetchInfo object at 0x7f48e75b5310>, <git.remote.FetchInfo object at 0x7f48e75b5368>, <git.remote.FetchInfo object at 0x7f48e75b53c0>, <git.remote.FetchInfo object at 0x7f48e75b5418>, <git.remote.FetchInfo object at 0x7f48e75b5470>, <git.remote.FetchInfo object at 0x7f48e75b54c8>, <git.remote.FetchInfo object at 0x7f48e75b5520>, <git.remote.FetchInfo object at 0x7f48e75b5578>, <git.remote.FetchInfo object at 0x7f48e75b55d0>, <git.remote.FetchInfo object at 0x7f48e75b5628>, <git.remote.FetchInfo object at 0x7f48e75b5680>, <git.remote.FetchInfo object at 0x7f48e75b56d8>]

Change 605529 merged by Hashar:
[integration/zuul@patch-queue/debian/jessie-wikimedia] WMF: force update tags when updating repo

https://gerrit.wikimedia.org/r/605529

Change 605530 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/zuul/deploy@master] Update to force update tags when updating repo

https://gerrit.wikimedia.org/r/605530

I will update Zuul in production when I have time to properly monitor the deployment. It is not convenient for me to do so this morning, for personal reasons. Hopefully this afternoon, otherwise later this evening (Europe time).

Change 605530 merged by Hashar:
[integration/zuul/deploy@master] Update to force update tags when updating repo

https://gerrit.wikimedia.org/r/605530

I have updated the zuul-merger on contint1001.wikimedia.org. I will do contint2001.wikimedia.org later on when CI is less busy.

Mentioned in SAL (#wikimedia-operations) [2020-06-16T06:04:19Z] <hashar> Restarted Zuul scheduler and merger on contint2001 a couple hotfixes # T252310 T255424

I ran git fetch --tags --force on both hosts. Zuul itself now uses --force as well.

Should be good now :]

hashar added a subscriber: elukey.

That is apparently not fully deployed. @elukey had the exact same issue today with analytics/refinery/source. contint2001 runs an outdated version of Zuul. I guess I forgot to deploy it there :\

Mentioned in SAL (#wikimedia-operations) [2020-07-01T13:35:46Z] <hashar> Restarting zuul-merger on contint2001 # T252310

And I have restarted zuul-scheduler.