Page MenuHomePhabricator

CI test failure due to Git error
Closed, ResolvedPublic

Description

Two PHP tests are failing in the Thanks repository with the following error on this patch:

15:07:50 Traceback (most recent call last):
15:07:50   File "/usr/local/bin/quibble", line 8, in <module>
15:07:50     sys.exit(main())
15:07:50   File "/usr/local/lib/python3.7/dist-packages/quibble/cmd.py", line 901, in main
15:07:50     dry_run=args.dry_run,
15:07:50   File "/usr/local/lib/python3.7/dist-packages/quibble/cmd.py", line 554, in execute
15:07:50     quibble.commands.execute_command(command)
15:07:50   File "/usr/local/lib/python3.7/dist-packages/quibble/commands.py", line 46, in execute_command
15:07:50     command.execute()
15:07:50   File "/usr/local/lib/python3.7/dist-packages/quibble/commands.py", line 181, in execute
15:07:50     self.zuul_url,
15:07:50   File "/usr/local/lib/python3.7/dist-packages/quibble/zuul.py", line 110, in clone
15:07:50     future.result()
15:07:50   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 425, in result
15:07:50     return self.__get_result()
15:07:50   File "/usr/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
15:07:50     raise self._exception
15:07:50   File "/usr/lib/python3.7/concurrent/futures/thread.py", line 57, in run
15:07:50     result = self.fn(*self.args, **self.kwargs)
15:07:50   File "/usr/local/lib/python3.7/dist-packages/quibble/zuul.py", line 127, in _clone_worker
15:07:50     raise e
15:07:50   File "/usr/local/lib/python3.7/dist-packages/quibble/zuul.py", line 123, in _clone_worker
15:07:50     project_cloner.prepareRepo(project, dest)
15:07:50   File "/usr/local/lib/python3.7/dist-packages/zuul/lib/cloner.py", line 169, in prepareRepo
15:07:50     repo.reset()
15:07:50   File "/usr/local/lib/python3.7/dist-packages/zuul/merger/merger.py", line 99, in reset
15:07:50     self.update()
15:07:50   File "/usr/local/lib/python3.7/dist-packages/zuul/merger/merger.py", line 205, in update
15:07:50     origin.fetch(tags=True)
15:07:50   File "/usr/local/lib/python3.7/dist-packages/git/remote.py", line 1027, in fetch
15:07:50     res = self._get_fetch_info_from_stderr(proc, progress, kill_after_timeout=kill_after_timeout)
15:07:50   File "/usr/local/lib/python3.7/dist-packages/git/remote.py", line 858, in _get_fetch_info_from_stderr
15:07:50     proc.wait(stderr=stderr_text)
15:07:50   File "/usr/local/lib/python3.7/dist-packages/git/cmd.py", line 657, in wait
15:07:50     raise GitCommandError(remove_password_if_present(self.args), status, errstr)
15:07:50 git.exc.GitCommandError: Cmd('git') failed due to: exit code(1)
15:07:50   cmdline: git fetch -v --tags -- origin
15:07:50 Build step 'Execute shell' marked build as failure
15:07:50 [PostBuildScript] - [INFO] Executing post build scripts.
15:07:50 [wmf-quibble-selenium-php74-docker] $ /bin/bash /tmp/jenkins10548054348145891852.sh
15:07:50 + set -o pipefail
15:07:50 ++ pwd
15:07:50 + exec docker run --entrypoint=/bin/rm --volume /srv/jenkins/workspace/wmf-quibble-selenium-php74-docker/log:/log --security-opt seccomp=unconfined --init --rm --label jenkins.job=wmf-quibble-selenium-php74-docker --label jenkins.build=80059 --env-file /dev/fd/63 docker-registry.wikimedia.org/buster:latest -fR /log/rawSeleniumVideoGrabs
15:07:50 ++ /usr/bin/env
15:07:50 ++ egrep -v '^(HOME|SHELL|PATH|LOGNAME|MAIL)='
15:07:52 [PostBuildScript] - [INFO] Executing post build scripts.
15:07:52 [wmf-quibble-selenium-php74-docker] $ /bin/bash -xe /tmp/jenkins14213485042704722726.sh
15:07:52 + find log/ -name 'mw-debug-*.log' -exec gzip '{}' +

Full log: https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php74-docker/49767/console and https://integration.wikimedia.org/ci/job/wmf-quibble-selenium-php74-docker/80059/console

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Michael triaged this task as Unbreak Now! priority.Jan 31 2024, 9:57 AM

We are seeing this error on several WMDE repositories as well (Wikibase, WikibaseLexeme, EntitySchema). Which, to my understanding, makes this a shared build failure and thus UBN.

Notably, this seems to affect quibble jobs, but not phan or node ones.

I have looked at the Gerrit Apache access log via https://logstash.wikimedia.org/app/dashboards#/view/825c5c80-8aef-11eb-8ab2-63c7f3b019fc for non 200 responses made from either of the WMCS instances and there is nothing showing up. Same for Gerrit: no errors on the server side :-(

Mh. That's strange. Just before the quote stack-trace. we see the success message of it actually having finished cloning the repos:

09:07:49 INFO:quibble.commands:<<< Finish: Zuul clone {"cache_dir": "/srv/git", "projects": [<...>], "workers": 8, "workspace": "/workspace/src", "zuul_branch": "master", "zuul_project": "mediawiki/extensions/Thanks", "zuul_ref": "refs/zuul/master/Z279790f6a2b549e7a03da167359758f8", "zuul_url": "git://contint2002.wikimedia.org"}, in 67.242 s

I did not see an error message before that. That sounds like most (all?) of the fetching actually succeed.
But still at least one of the repos must have had an issue, but maybe somehow on the git-client side? cmdline: git fetch -v --tags -- origin
That being said, I do see some other quibble jobs for extensions succeeding. So maybe it is a specific extension that is somehow "bad"?

/me looks at yesterday's Wikibase logs to maybe see a hint at what would have follow as the very next step after cloning.

09:07:49 INFO:quibble.commands:<<< Finish: Zuul clone

That is Quibble reporting it has finished running that "command", regardless of its status (it is actually failing) or whether the command raised an exception:

@contextmanager
def Chronometer(name, logger):
    start = time.time()
    logger('>>> Start: %s' % name)
    try:
        yield
    finally:
        duration = time.time() - start
        logger('<<< Finish: %s, in %.03f s' % (name, duration))

It is definitely the git fetch on the client side failing. I suspect something odd with a faulty ref or maybe a git tag that got changed on one of the repo and that git fetch --tags would refuse to update. It is certainly failing on single specific repository but since the processing is done in parallel (--git-parallel=8) it is hard to tell which one fails exactly :-\ Maybe because the git mirrors on the WMCS instances still have the old tag.

I guess I should try reproducing.

Ok, comparing a successful CI run for Wikibase: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php74-noselenium-docker/150346/consoleText and a broken one: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php74-noselenium-docker/150346/consoleText we can see that much fewer extensions are being cloned in the broken one before the "finish" message appears and the error.

The last apparently successfully cloned extension seems to be TimedMediaHandler. The first one that followed in the non-broken one would be WikibaseQualityConstraints, but I also notice that having WikibaseMediaInfo which is not in the broken one.

That would confirm that it is probably one of the extensions being bad.

Reproduced integration-agent-docker-1041.integration.eqiad1.wikimedia.cloud using:

sudo docker run --rm -it -v /srv/git:/srv/git:ro -e EXT_DEPENDENCIES='mediawiki/extensions/AbuseFilter\nmediawiki/extensions/AntiSpoof\nmediawiki/extensions/ArticlePlaceholder\nmediawiki/extensions/BetaFeatures\nmediawiki/extensions/CentralAuth\nmediawiki/extensions/CheckUser\nmediawiki/extensions/CirrusSearch\nmediawiki/extensions/Cite\nmediawiki/extensions/CodeEditor\nmediawiki/extensions/ConfirmEdit\nmediawiki/extensions/Echo\nmediawiki/extensions/Elastica\nmediawiki/extensions/EventBus\nmediawiki/extensions/EventLogging\nmediawiki/extensions/EventStreamConfig\nmediawiki/extensions/FlaggedRevs\nmediawiki/extensions/GeoData\nmediawiki/extensions/GuidedTour\nmediawiki/extensions/MobileApp\nmediawiki/extensions/MobileFrontend\nmediawiki/extensions/ParserFunctions\nmediawiki/extensions/PdfHandler\nmediawiki/extensions/PropertySuggester\nmediawiki/extensions/Renameuser\nmediawiki/extensions/Scribunto\nmediawiki/extensions/SecurePoll\nmediawiki/extensions/SiteMatrix\nmediawiki/extensions/SyntaxHighlight_GeSHi\nmediawiki/extensions/TemplateData\nmediawiki/extensions/TimedMediaHandler\nmediawiki/extensions/TorBlock\nmediawiki/extensions/UniversalLanguageSelector\nmediawiki/extensions/UserMerge\nmediawiki/extensions/VisualEditor\nmediawiki/extensions/WikiEditor\nmediawiki/extensions/Wikibase\nmediawiki/extensions/WikibaseCirrusSearch\nmediawiki/extensions/WikibaseLexeme\nmediawiki/extensions/WikibaseLexemeCirrusSearch\nmediawiki/extensions/WikibaseMediaInfo\nmediawiki/extensions/WikibaseQualityConstraints\nmediawiki/extensions/WikimediaBadges\nmediawiki/extensions/WikimediaEvents\nmediawiki/extensions/cldr' docker-registry.wikimedia.org/releng/quibble-buster-php74:1.6.0-s6 --skip-deps --skip=all

I have narrowed it down by using --git-parallel 1 to clone the repositories serially which eventually broke on mediawiki/extensions/UniversalLanguageSelector. Then I can repro using:

sudo docker run --rm -it -v /srv/git:/srv/git:ro docker-registry.wikimedia.org/releng/quibble-buster-php74:1.6.0-s6 --skip-deps --skip=all --git-parallel 1 mediawiki/extensions/UniversalLanguageSelector

And the issue no more happens if I remove the local git mirror from the container (by removing -v /srv/git:/srv/git:ro).

The thing is git fetch -v --tags will fail / exit 1 if a tag is being changed. That is exactly what has happened with the release of MediaWiki Language Extension Bundle 2024.01.

$ git clone /srv/git/mediawiki/extensions/UniversalLanguageSelector.git
$ cd UniversalLanguageSelector
$ git fetch  --dry-run -v --tags https://gerrit.wikimedia.org/r/mediawiki/extensions/UniversalLanguageSelector
...
 * [new tag]           2023.07    -> 2023.07
 = [up to date]        2023.10    -> 2023.10
 * [new tag]           2023.12    -> 2023.12
 ! [rejected]          2024.01    -> 2024.01  (would clobber existing tag)

In the mirror the tag is:

tag 2024.01
Tagger: Abijeet <abijeetpatro@gmail.com>
Date:   Thu Jan 25 18:42:48 2024 +0530

MediaWiki language extension bundle 2024.01

commit cf3eece9e328a1bc25cfad9807882b1ed10c5c8a (tag: 2024.01)
Author: Umherirrender <umherirrender_de.wp@web.de>

Whereas in Gerrit the tag is a signed one:

tag 2024.01
Tagger: Kartik Mistry <kartik.mistry@gmail.com>
Date:   Wed Jan 31 10:16:39 2024 +0300

MediaWiki language extension bundle 2024.01
-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEEa2MbXvVUr2sRlmKSAsHT8ng6pN4FAmW589cACgkQAsHT8ng6
pN7MKBAAufxK79x8FPW71PqhHxwu0A5C5Wu2oRPpwn17D6CsvVZ5lUhh9VQ9GKCU
P6RbhHDiYklbsA9WmFasKbyT/y7DKCOK9+LVdih6YVfOYFZjMgPw/WTZ6aUZIlyF
azBntzWv75ljx+WD/8G8TZ43T8D3apniBeAJuRkroT7EL+tJcXUN8J2nMcnKDQlC
XLlfDEC0LLw6Tw2tf0aqmEV1ItSEZGbumE8mIW2IossObgoWyzAAHx31kpwzAHHi
p3xI/603XS7s6PznuZ/kYd0hepokyolm4+f9729Xbz/SGFdSCRkTBhxMWbbAHBBd
FjJndo1FlDXegX5I/ePC2WwNCEr3nJNKMp9Iz/u6gejHF9TpoV9e9AeW8GAA9aJa
HM46KNyxRWvUe08q4iQGkeBIPqpn9IOFy3oPiK2w9SDxKakbBxsAnXVALzQoUD1u
xV7RVxVMPJ8Lvc3FXmLB8G2sD1ZfKYes0N+DsB3tv1STGSWxLVwtI5iwMZXySjq2
rKV8QTa1a76f1mvQOX+SpUaXRXTvCJOa/SGgn3CKeDEclP0SRtC+K4bYWk8FiATB
EHcMY0Ekob8ZYYHrew3s1LNARDmj2X4CUkYWMsB/sDyhlDKWxuM5ADERYJr2qyM6
gimqwxOBGkB+R191b56J0l2j1A16JKeFPxI9Czu2RwaTwew57Uc=
=GfhW
-----END PGP SIGNATURE-----

commit e9454bf9d1b6ba4b6824031dad6f87d1b562c7f3 (tag: 2024.01)

It is a variation of 2020 bug T252310 which affected the Zuul merger when some tag for pywikibot/core got changed. The analysis at T252310#6222727 is the same and we need git to force fetch. The fix in the Zuul merger was https://gerrit.wikimedia.org/r/c/integration/zuul/+/605529/1/zuul/merger/merger.py:

-        origin.fetch(tags=True)
+        origin.fetch(tags=True, force=True)

The code being used in the Zuul merger got copied to Quibble before that fix as can be seen in the stacktrace:

15:07:50   File "/usr/local/lib/python3.7/dist-packages/zuul/merger/merger.py", line 205, in update
15:07:50     origin.fetch(tags=True)

Also the git mirrors we maintain on the WMCS instance (ci-gitcache-refresh) do a fetch with pruning:

command     => '/usr/bin/find /srv/git -type d -name \'*.git\' -exec git -C {} fetch origin --prune \'+refs/heads/*:refs/heads/*\' \;',

But as I found eventually later is that you need to also explicitly ask for pruning tags (git fetch --prune-tags) / https://gerrit.wikimedia.org/r/c/operations/puppet/+/757944

https://gerrit.wikimedia.org/r/c/operations/puppet/+/757944

Change 994685 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/puppet@production] ci: prune tags when updating git mirrors

https://gerrit.wikimedia.org/r/994685

Mentioned in SAL (#wikimedia-releng) [2024-01-31T12:02:30Z] <hashar> Force updating tags in the CI git mirrors for T356247: sudo cumin --force 'name:docker' "/usr/bin/find /srv/git -type d -name '*.git' -exec git -C {} fetch origin --prune --prune-tags --force \;"

hashar lowered the priority of this task from Unbreak Now! to High.Jan 31 2024, 12:05 PM
hashar added a project: Quibble.

Output of the command:

From https://gerrit.wikimedia.org/r/mediawiki/extensions/cldr
 * [new tag]         2023.07    -> 2023.07
 * [new tag]         2023.12    -> 2023.12
 t [tag update]      2024.01    -> 2024.01
From https://gerrit.wikimedia.org/r/mediawiki/extensions/Translate
 * [new tag]             2023.07    -> 2023.07
 * [new tag]             2023.12    -> 2023.12
 t [tag update]          2024.01    -> 2024.01
From https://gerrit.wikimedia.org/r/mediawiki/extensions/UniversalLanguageSelector
 * [new tag]           2023.07    -> 2023.07
 * [new tag]           2023.12    -> 2023.12
 t [tag update]        2024.01    -> 2024.01

I believe that fixes the issue. The proper fix is to port the patch I have done for the zuul-merger back in 2020 to Quibble in order for it to force fetch tags instead of bailing out due to git. As a side track: the error from git should be reported to the console which would have made debugging WAY easier.

The proper fix is to port the patch I have done for the zuul-merger back in 2020 to Quibble in order for it to force fetch tags instead of bailing out due to git.

IMHO the even more proper fix is to not update tags. The Git manual very strongly advises against it, and we’ve just seen why; the Quibble change fixes the issue for Quibble, but anyone else who saw the old tag will now also have to fix their clone manually (@Michael reported errors from git fetch --tags in their local repo on IRC). I don’t know exactly what happened here, but CCing @abi_ and @KartikMistry based on the tags seen in T356247#9501839.

As a side track: the error from git should be reported to the console which would have made debugging WAY easier.

+1, the last few lines of the stack trace look like stderr was captured somewhere (proc.wait(stderr=stderr_text), raise GitCommandError(remove_password_if_present(self.args), status, errstr)) but apparently it didn’t end up in the console nor in the artifacts.

For the record, I can say that cloning seems to work fine again for our repos. (I checked Wikibase, WikibaseLexeme, EntitySchema.)

Change 994704 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/quibble@master] Force git to fetch tags being updated

https://gerrit.wikimedia.org/r/994704

Change 994704 merged by jenkins-bot:

[integration/quibble@master] Force git to fetch tags being updated

https://gerrit.wikimedia.org/r/994704

The other mystery was the exception not showing the standard error but just the return code and the command line:

15:07:50     raise GitCommandError(remove_password_if_present(self.args), status, errstr)
15:07:50 git.exc.GitCommandError: Cmd('git') failed due to: exit code(1)
15:07:50   cmdline: git fetch -v --tags -- origin

I have reproduced it with GitPython, and the stderr does not show up with:

repo = git.Repo('/tmp/workspace/test/gerrit-ping')
repo.remotes.origin.fetch(tags=True)

But it does when using an other API:

repo.git.fetch(tags=True)

I have looked a bit at the code, they seem to overload subprocess.Popen and the code is from ten+ years ago. I am not going to bother investigating further into it.

IMHO the even more proper fix is to not update tags. The Git manual very strongly advises against it, and we’ve just seen why; the Quibble change fixes the issue for Quibble, but anyone else who saw the old tag will now also have to fix their clone manually (@Michael reported errors from git fetch --tags in their local repo on IRC). I don’t know exactly what happened here, but CCing @abi_ and @KartikMistry based on the tags seen in T356247#9501839.

I agree yes, though I have no idea how to implement the restriction in Gerrit.

I agree yes, though I have no idea how to implement the restriction in Gerrit.

https://gerrit-review.googlesource.com/Documentation/access-control.html#category_create_annotated says "Update of annotated tags is only allowed by granting Push with force option on refs/tags/*."
The mediawiki group allows push with and without force for refs/tags/*

Maybe remove the force? => https://gerrit.wikimedia.org/r/admin/repos/mediawiki,access
I have no idea about site effects for the release process etc.

Change 994685 merged by Dzahn:

[operations/puppet@production] ci: fetch tags for git mirrors

https://gerrit.wikimedia.org/r/994685

hashar changed the task status from Open to Stalled.Feb 2 2024, 11:45 AM

Stalled pending a release of Quibble to be cut which would ship the git fetch fix and then refreshing the CI images. That will happen eventually, I don't think, we need to do it immediately given the failure is unlikely to reoccurs anytime soon.

Change #1014014 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/quibble@master] release: Quibble 1.7.0

https://gerrit.wikimedia.org/r/1014014

Change #1014014 merged by jenkins-bot:

[integration/quibble@master] release: Quibble 1.7.0

https://gerrit.wikimedia.org/r/1014014

Change #1014039 had a related patch set uploaded (by Hashar; author: Hashar):

[integration/config@master] dockerfiles: update Quibble to 1.7.0

https://gerrit.wikimedia.org/r/1014039

Change #1014039 merged by jenkins-bot:

[integration/config@master] dockerfiles: update Quibble to 1.7.0

https://gerrit.wikimedia.org/r/1014039

Stalled pending a release of Quibble to be cut which would ship the git fetch fix and then refreshing the CI images.

I have released Quibble 1.7.0 and switched the Jenkins jobs to use it.