
Quibble should fatal out on clone/fetch failure "ERROR:zuul.Repo:Unable to initialize repo for npm-test.git"
Closed, ResolvedPublic

Description

For example at https://integration.wikimedia.org/ci/job/mediawiki-quibble-vendor-mysql-php72-docker/5393

15:21:40 INFO:zuul.Cloner.mediawiki/skins/Vector:Updating origin remote in repo mediawiki/skins/Vector to https://gerrit.wikimedia.org/r/mediawiki/skins/Vector
15:21:40 ERROR:zuul.Repo:Unable to initialize repo for https://gerrit.wikimedia.org/r/npm-test
15:21:40 Traceback (most recent call last):
15:21:40   File "/usr/local/lib/python3.5/dist-packages/zuul/merger/merger.py", line 51, in __init__
15:21:40     self._ensure_cloned()
15:21:40   File "/usr/local/lib/python3.5/dist-packages/zuul/merger/merger.py", line 63, in _ensure_cloned
15:21:40     git.Repo.clone_from(self.remote_url, self.local_path)
15:21:40   File "/usr/lib/python3/dist-packages/git/repo/base.py", line 925, in clone_from
15:21:40     return cls._clone(git, url, to_path, GitCmdObjectDB, progress, **kwargs)
15:21:40   File "/usr/lib/python3/dist-packages/git/repo/base.py", line 880, in _clone
15:21:40     finalize_process(proc, stderr=stderr)
15:21:40   File "/usr/lib/python3/dist-packages/git/util.py", line 341, in finalize_process
15:21:40     proc.wait(**kwargs)
15:21:40   File "/usr/lib/python3/dist-packages/git/cmd.py", line 291, in wait
15:21:40     raise GitCommandError(self.args, status, errstr)
15:21:40 git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
15:21:40   cmdline: git clone -v https://gerrit.wikimedia.org/r/npm-test /workspace/src/npm-test
15:21:40   stderr: 'Cloning into '/workspace/src/npm-test'...
15:21:40 fatal: remote error: npm-test unavailable
15:21:40 '
15:21:41 INFO:zuul.Cloner.mediawiki/vendor:Updating origin remote in repo mediawiki/vendor to https://gerrit.wikimedia.org/r/mediawiki/vendor

I don't know what this repo is supposed to be, but it looks like it is not supposed to be in the list of repos for Zuul-cloner to clone?

Also, while it is a good thing that this particular case is a non-fatal error, at the same time it is worrying that the job is not marked as failure when Zuul-cloner had a fatal error in cloning one of the specified repositories.

Given how much automation we have for detecting what to install and do, it seems plausible that this could hide errors in the future. For example, if an extension can't be cloned for some reason, the job would run fewer tests and still report that all is good.

Event Timeline

(This task is from before the 0.0.35 quibble upgrade.)

Is it possible that quibble_args: '--skip composer-test npm-test' is being misinterpreted as quibble_args: '--skip composer-test (--target) npm-test'?

Re-confirmed on 0.0.35 (build).

integration/config is the only repo we have with the string npm-test in it, so I'm assuming it's a config error.

The build was for https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/537445/2 and there are no extension dependencies injected into it.

The mediawiki-quibble-* jobs are an optimization for mediawiki that skips composer-test and npm-test. I am not sure whether the optimization should be kept, but that is a different topic.

The relevant parts in the build output:

docker run docker-registry.wikimedia.org/releng/quibble-stretch-php72:0.0.34-1 --skip composer-test npm-test

DEBUG:quibble.cmd:ZUUL_PROJECT=mediawiki/core


INFO:quibble.cmd:Projects: mediawiki/core, mediawiki/skins/Vector, mediawiki/vendor, npm-test

The issue is that several stages are passed to the --skip parameter while other arguments are supposed to be repositories to clone.

The long story is in Quibble commit summary c4c02f9edd9f4288283cf7cc1932198d7d4b5d21 for T218357, which is to differentiate between extra values passed to an option versus arguments to the quibble command. Python argparse supports separating them by simply using -- to stop options parsing. Eg:

--skip composer-test npm-test --
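To illustrate the -- separator, here is a minimal sketch with a hypothetical argparse parser (an assumption for illustration, not Quibble's actual code) in which --skip accepts several space-separated values and projects are positional arguments:

```python
import argparse

# Hypothetical parser for illustration only; not Quibble's actual code.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--skip", nargs="*", default=[])
parser.add_argument("projects", nargs="*")

# Without "--", the greedy --skip also swallows the repository name.
greedy = parser.parse_args(
    ["--skip", "composer-test", "npm-test", "mediawiki/core"]
)

# With "--", option parsing stops and what follows is positional.
separated = parser.parse_args(
    ["--skip", "composer-test", "npm-test", "--", "mediawiki/core"]
)

print(greedy.skip, greedy.projects)
print(separated.skip, separated.projects)
```

In the first call every value ends up in skip and no project is left; in the second, the repository after -- is parsed as a project.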

So I am not sure why only that build would have failed? Maybe the job got fixed, or it never worked??

Anyway the stages to skip or run should now be comma separated:

--run STAGE[,STAGE ...]
                      Tests to run. Comma separated. (default: all).
--skip STAGE[,STAGE ...]
                      Stages to skip. Comma separated. Set to "all" to skip
                      all stages. (default: none).
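A minimal sketch of why the old job syntax ends up with npm-test as a repository, assuming a parser shaped like the help text above (the parser and the comma_list helper are hypothetical, not Quibble's code):

```python
import argparse


def comma_list(value):
    """Split 'a,b,c' into ['a', 'b', 'c'], dropping empty items."""
    return [item for item in value.split(",") if item]


# Hypothetical parser mimicking the documented usage; not Quibble's code.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--skip", type=comma_list, default=[])
parser.add_argument("projects", nargs="*")

# Space separated: --skip takes one value, the rest becomes a project.
old_style = parser.parse_args(["--skip", "composer-test", "npm-test"])

# Comma separated: both stages reach --skip, no stray project.
new_style = parser.parse_args(["--skip=composer-test,npm-test"])

print(old_style.skip, old_style.projects)
print(new_style.skip, new_style.projects)
```

With the space-separated form, only composer-test is skipped and npm-test lands in the list of projects to clone, which is consistent with the "Projects: ..., npm-test" line in the build output.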

I guess that answers my concern from T225248#5485061. Glad to hear we're fine with skipping npm-test in Quibble (makes sense, already its own job). I suspect this never worked indeed, or has otherwise stopped working a number of months ago.

00:00:31.608 ERROR:zuul.Repo:Unable to initialize repo for https://gerrit.wikimedia.org/r/npm-test
00:00:31.609 Traceback (most recent call last):
00:00:31.609   File "/usr/local/lib/python3.5/dist-packages/zuul/merger/merger.py", line 51, in __init__
00:00:31.609     self._ensure_cloned()
00:00:31.610   File "/usr/local/lib/python3.5/dist-packages/zuul/merger/merger.py", line 63, in _ensure_cloned
00:00:31.610     git.Repo.clone_from(self.remote_url, self.local_path)
00:00:31.610   File "/usr/lib/python3/dist-packages/git/repo/base.py", line 925, in clone_from
00:00:31.611     return cls._clone(git, url, to_path, GitCmdObjectDB, progress, **kwargs)
00:00:31.611   File "/usr/lib/python3/dist-packages/git/repo/base.py", line 880, in _clone
00:00:31.611     finalize_process(proc, stderr=stderr)
00:00:31.611   File "/usr/lib/python3/dist-packages/git/util.py", line 341, in finalize_process
00:00:31.612     proc.wait(**kwargs)
00:00:31.612   File "/usr/lib/python3/dist-packages/git/cmd.py", line 291, in wait
00:00:31.612     raise GitCommandError(self.args, status, errstr)
00:00:31.612 git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
00:00:31.613   cmdline: git clone -v https://gerrit.wikimedia.org/r/npm-test /workspace/src/npm-test
00:00:31.613   stderr: 'Cloning into '/workspace/src/npm-test'...
00:00:31.613 fatal: remote error: npm-test unavailable
00:00:31.613 '

But the clone failure is entirely ignored!!!! That should be fixed :-\

Also that job is passing: --skip composer-test npm-test, but npm-test is considered a repository and thus it is NOT skipped (but the composer test is effectively skipped).

So that optimized job was meant to prevent running npm test on mediawiki/core for each PHP version, but it never worked for that case :-\

Change 537743 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] jjb: [quibble] Fix syntax for --skip command

https://gerrit.wikimedia.org/r/537743

@Legoktm did the optimizations for mediawiki/core. I guess we can revisit what should be run for mediawiki/core and maybe drop some of the optimizations that have been made.

At least the error is harmless for now, since Quibble ignores the failure to clone the nonexistent npm-test.git repository. But we have the job running npm-test when it should not.

The fix is to update the job to use: --skip=composer-test,npm-test

One can test the effect by using --dry-run, which builds the execution plan but exits before executing it:

$ colordiff -U0 \
  <(quibble --dry-run --skip=composer-test npm-test 2>&1) \
  <(quibble --dry-run --skip=composer-test,npm-test 2>&1)

--- /dev/fd/63	2019-09-18 21:47:44.298438737 +0200
+++ /dev/fd/62	2019-09-18 21:47:44.298438737 +0200
@@ -1 +1 @@
-DEBUG:quibble.cmd:Running stages: phpunit-unit, phpunit, npm-test, qunit, selenium
+DEBUG:quibble.cmd:Running stages: phpunit-unit, phpunit, qunit, selenium
@@ -4 +4 @@
-INFO:quibble.cmd:Projects: mediawiki/core, mediawiki/skins/Vector, mediawiki/vendor, npm-test
+INFO:quibble.cmd:Projects: mediawiki/core, mediawiki/skins/Vector, mediawiki/vendor
@@ -6 +6 @@
-DEBUG:quibble.cmd:Zuul clone with parameters {"cache_dir": "ref", "projects": ["mediawiki/core", "mediawiki/skins/Vector", "mediawiki/vendor", "npm-test"], "workers": 4, "workspace": "/home/hashar/projects/integration/quibble/src"}
+DEBUG:quibble.cmd:Zuul clone with parameters {"cache_dir": "ref", "projects": ["mediawiki/core", "mediawiki/skins/Vector", "mediawiki/vendor"], "workers": 4, "workspace": "/home/hashar/projects/integration/quibble/src"}
@@ -13 +13 @@
-DEBUG:quibble.cmd:Run tests in mediawiki/core: npm
+DEBUG:quibble.cmd:Run tests in mediawiki/core: 
@@ -15 +15 @@
-DEBUG:quibble.cmd:Browser tests using DISPLAY=:0, for projects mediawiki/core, mediawiki/skins/Vector, mediawiki/vendor, npm-test
+DEBUG:quibble.cmd:Browser tests using DISPLAY=:0, for projects mediawiki/core, mediawiki/skins/Vector, mediawiki/vendor

Note: Quibble does not support multiple --skip options; only the last one is honored (--skip=composer-test --skip=npm-test would only skip npm-test). The stages must be comma separated.
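The "last one wins" behaviour matches argparse's default "store" action, which overwrites the value on each occurrence. A short sketch, assuming (not confirmed here) that --skip is declared with that default action; the parser below is hypothetical:

```python
import argparse

# Hypothetical parser; --skip uses argparse's default "store" action.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--skip", default="")

# Repeating the option: the second value replaces the first.
repeated = parser.parse_args(["--skip=composer-test", "--skip=npm-test"])

# A single comma-separated value keeps both stages.
combined = parser.parse_args(["--skip=composer-test,npm-test"])

print(repeated.skip.split(","))
print(combined.skip.split(","))
```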

Change 537743 merged by jenkins-bot:
[integration/config@master] jjb: [quibble] Fix syntax for --skip command

https://gerrit.wikimedia.org/r/537743

The mis-configuration that caused this error is fixed, but hashar pointed out that this should actually fatal, not just warn.

hashar renamed this task from Quibble jobs error (non-fatal) "ERROR:zuul.Repo:Unable to initialize repo for npm-test.git" to Quibble should fatal out on clone/fetch failure "ERROR:zuul.Repo:Unable to initialize repo for npm-test.git". Sep 19 2019, 10:05 AM

I think that is due to "Support to clone repositories in parallel" (5f58fd252e499a37f19da753c064b7e34fc35028), released with 0.0.30. Passing --git-parallel 1 makes the clone failure fatal:

$ quibble --git-parallel 1 RepoDoesNotExit; echo "Exit code: $?"
ERROR:zuul.Repo:Unable to initialize repo for https://gerrit.wikimedia.org/r/RepoDoesNotExit
Traceback (most recent call last):
  File "/home/hashar/projects/integration/quibble/zuul/merger/merger.py", line 51, in __init__
    self._ensure_cloned()
  File "/home/hashar/projects/integration/quibble/zuul/merger/merger.py", line 63, in _ensure_cloned
    git.Repo.clone_from(self.remote_url, self.local_path)
  File "/home/hashar/.local/lib/python3.7/site-packages/git/repo/base.py", line 1021, in clone_from
    return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
  File "/home/hashar/.local/lib/python3.7/site-packages/git/repo/base.py", line 967, in _clone
    finalize_process(proc, stderr=stderr)
  File "/home/hashar/.local/lib/python3.7/site-packages/git/util.py", line 333, in finalize_process
    proc.wait(**kwargs)
  File "/home/hashar/.local/lib/python3.7/site-packages/git/cmd.py", line 412, in wait
    raise GitCommandError(self.args, status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git clone -v https://gerrit.wikimedia.org/r/RepoDoesNotExit /home/hashar/workspace/src/RepoDoesNotExit
  stderr: 'Cloning into '/home/hashar/workspace/src/RepoDoesNotExit'...
fatal: remote error: RepoDoesNotExit unavailable
'
Traceback (most recent call last):
  File "/home/hashar/.local/bin/quibble", line 11, in <module>
    load_entry_point('quibble', 'console_scripts', 'quibble')()
  File "/home/hashar/projects/integration/quibble/quibble/cmd.py", line 475, in main
    cmd.execute(plan)
  File "/home/hashar/projects/integration/quibble/quibble/cmd.py", line 448, in execute
    command.execute()
  File "/home/hashar/projects/integration/quibble/quibble/commands.py", line 39, in execute
    self.zuul_project, self.zuul_ref, self.zuul_url)
  File "/home/hashar/projects/integration/quibble/quibble/zuul.py", line 74, in clone
    return zuul_cloner.execute()
  File "/home/hashar/projects/integration/quibble/zuul/lib/cloner.py", line 75, in execute
    self.prepareRepo(project, dest)
  File "/home/hashar/projects/integration/quibble/zuul/lib/cloner.py", line 160, in prepareRepo
    repo = self.cloneUpstream(project, dest)
  File "/home/hashar/projects/integration/quibble/zuul/lib/cloner.py", line 116, in cloneUpstream
    raise Exception("Error cloning %s to %s" % (git_upstream, dest))
Exception: Error cloning https://gerrit.wikimedia.org/r/RepoDoesNotExit to /home/hashar/workspace/src/RepoDoesNotExit
Exit code: 1

But with --git-parallel, the exception is logged and the process continues.

quibble/zuul.py
with ThreadPoolExecutor(max_workers=workers) as executor:
    for project, dest in dests.items():
        # Copy and hijack the logger
        project_cloner = copy.copy(zuul_cloner)
        project_cloner.log = project_cloner.log.getChild(project)

        executor.submit(project_cloner.prepareRepo, project, dest)

log.info("Prepared all repositories")

Which does not catch the issue :-\
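The underlying behaviour is that executor.submit() returns a Future, and an exception raised in the worker is stored on that Future until result() is called. A minimal sketch of the problem and of one possible way to surface the failure; prepare_repo and both clone_all_* functions are hypothetical stand-ins, and this is not necessarily how the eventual Quibble patch does it:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def prepare_repo(project):
    """Stand-in for a clone step; fails for one hypothetical project."""
    if project == "npm-test":
        raise RuntimeError("Error cloning %s" % project)
    return project


def clone_all_ignoring_errors(projects, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as executor:
        for project in projects:
            # The returned Future is discarded, so its exception is lost.
            executor.submit(prepare_repo, project)
    return "Prepared all repositories"


def clone_all_checked(projects, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(prepare_repo, p) for p in projects]
        for future in as_completed(futures):
            # result() re-raises any exception from the worker thread.
            future.result()
    return "Prepared all repositories"


projects = ["mediawiki/core", "mediawiki/vendor", "npm-test"]
print(clone_all_ignoring_errors(projects))  # failure goes unnoticed
try:
    clone_all_checked(projects)
except RuntimeError as error:
    print("Fatal:", error)
```

Keeping the futures and calling result() on each is the smallest change that makes a single failed clone abort the run instead of being silently dropped.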

Change 538020 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/quibble@master] Exit on cloning failure

https://gerrit.wikimedia.org/r/538020

The jobs have been corrected. Quibble still needs to fatal out as soon as a repository cannot be cloned or fetched.

hashar triaged this task as Medium priority. Sep 19 2019, 1:19 PM

Change 538020 merged by jenkins-bot:
[integration/quibble@master] Exit on cloning failure

https://gerrit.wikimedia.org/r/538020