Page MenuHomePhabricator

Quibble initialize step should only clone the target repository
Open, Stalled, Needs TriagePublic

Description

The first thing Quibble does is to clone all the repositories (mediawiki/core, potentially vendor and extension dependencies). It then cd to the repo that triggered the job and runs composer test and npm test. If one fails, that means we clone all the other repositories for nothing.

Quibble should just clone the ZUUL_PROJECT repo, run the linter then clone the rest.

Event Timeline

I wanted to give this a try, as a way to work through some questions that I've been stewing on regarding my Command object refactor. How to represent the higher-level stages? My suspicion is that stages are just a special case of dependency relationships between the fine-grained steps, and there should actually be several stages. One especially quirky point is that I don't want to plan the entire job until after cloning repos, because we can't analyze whether certain steps (e.g. npm run selenium-test) make any sense until the code is present. Here are some thoughts about the stages generally (bigger scope than this task):

  • Clone ZUUL_PROJECT
  • Analyze what steps can be taken to test this repo (look for composer.json, package.json...) and plan first stage.
  • First test stage (T211702), all steps in parallel
    • composer test
    • npm install && npm test
  • Clone all dependencies
  • Analyze what can be tested (e.g. look for tests/selenium in each repo) and plan second stage.
  • Second test stage, some steps can be in parallel

I'm imagining we might specify the process dependency graph in terms of abstract job milestones, so rather than couple tasks directly e.g. (repo_npm_test && repo_composer test -> clone dependencies), we would say (repo_npm_test && repo_composer_test -> REPO_TESTS_DONE) and (REPO_TESTS_DONE -> clone dependencies -> ALL_TESTS_READY) then (ALL_TESTS_READY -> ext_skin_composer_test), etc.

hashar moved this task from marble to Backlog on the Quibble board.

Change 588146 had a related patch set uploaded (by Awight; owner: Awight):
[integration/quibble@master] Clone only the target project at first

https://gerrit.wikimedia.org/r/588146

I've proposed a patch which doesn't rely on a big pile of refactoring :-)

The only glitch I ran into is that we have to remove non-core projects before cloning mediawiki/core, which reintroduces a small redundancy. See the commit message for a longer overview.

Change 588146 merged by jenkins-bot:
[integration/quibble@master] Clone only the target project at first

https://gerrit.wikimedia.org/r/588146

Change 589448 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/quibble@master] Release Quibble 0.0.42

https://gerrit.wikimedia.org/r/589448

Change 589448 merged by jenkins-bot:
[integration/quibble@master] Release Quibble 0.0.42

https://gerrit.wikimedia.org/r/589448

Change 589838 had a related patch set uploaded (by Awight; owner: Awight):
[integration/quibble@master] Wipe repo with non-git commands

https://gerrit.wikimedia.org/r/589838

Change 589838 merged by jenkins-bot:
[integration/quibble@master] Wipe repo with non-git commands

https://gerrit.wikimedia.org/r/589838

Change 589838 merged by jenkins-bot:
[integration/quibble@master] Wipe repo with non-git commands

https://gerrit.wikimedia.org/r/589838

@awight that commit is broken for me when the workspace/src is already populated by MediaWiki:

rmdir: failed to remove '/home/hashar/workspace/src/extensions': Directory not empty

Running:

$ rm -f src/LocalSettings.php; ZUUL_PROJECT=mediawiki/extensions/FileImporter quibble --git-cache=/home/hashar/projects --skip=all
INFO:quibble.commands:<<< Finish: Report package versions, in 3.289 s
INFO:quibble.commands:>>> Start: Zuul clone with parameters {"cache_dir": "/home/hashar/projects", "projects": ["mediawiki/extensions/FileImporter"], "workers": 4, "workspace": "/home/hashar/workspace/src", "zuul_project": "mediawiki/extensions/FileImporter"}
INFO:zuul.CloneMapper:Workspace path set to: /home/hashar/workspace/src
INFO:zuul.CloneMapper:Mapping projects to workspace...
INFO:zuul.CloneMapper:  mediawiki/extensions/FileImporter -> /home/hashar/workspace/src/extensions/FileImporter
INFO:zuul.CloneMapper:Expansion completed.
INFO:quibble.zuul.clone:Preparing 1 repositories with 4 workers
INFO:zuul.Cloner.mediawiki/extensions/FileImporter:Creating repo mediawiki/extensions/FileImporter from cache /home/hashar/projects/mediawiki/extensions/FileImporter
INFO:zuul.Cloner.mediawiki/extensions/FileImporter:Updating origin remote in repo mediawiki/extensions/FileImporter to https://gerrit.wikimedia.org/r/mediawiki/extensions/FileImporter
INFO:zuul.Cloner.mediawiki/extensions/FileImporter:Falling back to branch master
INFO:zuul.Cloner.mediawiki/extensions/FileImporter:Prepared mediawiki/extensions/FileImporter repo with branch master at commit 466acf9cae1862fe180b1ccff0ef3ca34bb73d5e
INFO:quibble.zuul.clone:Prepared all repositories
INFO:quibble.commands:<<< Finish: Zuul clone with parameters {"cache_dir": "/home/hashar/projects", "projects": ["mediawiki/extensions/FileImporter"], "workers": 4, "workspace": "/home/hashar/workspace/src", "zuul_project": "mediawiki/extensions/FileImporter"}, in 3.378 s
INFO:quibble.commands:>>> Start: Remove directory /home/hashar/workspace/src/extensions/FileImporter and any empty parents until /home/hashar/workspace/src
rmdir: failed to remove '/home/hashar/workspace/src/extensions': Directory not empty
INFO:quibble.commands:<<< Finish: Remove directory /home/hashar/workspace/src/extensions/FileImporter and any empty parents until /home/hashar/workspace/src, in 0.027 s
Traceback (most recent call last):
  File "/home/hashar/.local/bin/quibble", line 11, in <module>
    load_entry_point('quibble', 'console_scripts', 'quibble')()
  File "/home/hashar/projects/integration/quibble/quibble/cmd.py", line 494, in main
    cmd.execute(plan)
  File "/home/hashar/projects/integration/quibble/quibble/cmd.py", line 316, in execute
    quibble.commands.execute_command(command)
  File "/home/hashar/projects/integration/quibble/quibble/commands.py", line 24, in execute_command
    command.execute()
  File "/home/hashar/projects/integration/quibble/quibble/commands.py", line 776, in execute
    subprocess.check_call(['rmdir', path])
  File "/usr/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['rmdir', '/home/hashar/workspace/src/extensions']' returned non-zero exit status 1.

There is also 888218fb47748c50383b45714423fabce0f6ad0b which moved the git clean.

Before those patches the workflow was to:

clone all repositories

if triggered for an extension/skin:

  • get to the directory
  • run npm/composer install and npm/composer test
  • use git clean to remove the installed node_modules / vendor

That second patch also changed to git clean -xqdff which when run from mediawiki/core would wipe the existing extensions/skins.

I am tempted to revert those patches and get back to the paper board.

I am tempted to revert those patches and get back to the paper board.

That's fine with me, of course. There are edge cases I didn't understand when writing this naive implementation... (and still don't understand fully!)

It's only possible to use this single-repo workflow when starting with a completely empty workspace, which makes it difficult to use quibble locally as you point out. I also noticed a glitch with this task, where git refuses to clone mediawiki-core if any subdirectories already exist. Therefore, it's necessary to completely wipe the single-repo checkout after running lint jobs. It might make sense to run the single-repo tests as its own job, since there are no dependencies between the single- and integration- tests, no benefit from leaving artifacts behind... If resources were infinite, we would even run these jobs in parallel, which would speed up the "happy path", but we probably wouldn't do this since a large proportion of jobs fail during single-repo linting.

How to proceed? We can revert and reconsider the requirements for this task, or I'd be happy to try to fix the remaining issues. I can imagine a bugfix for the current thing you're running into: remember which parent directories existed before cloning the single repo, and only delete empty parents if we've created them. For your example run, that would mean we rm -rf FileImporter but not src/ or src/extensions.

Guess I would first need to write some kind of high level test to list out which steps are expected in which case. That would help, that is surely doable by creating various build plans and assert they structures.

For the single repo patches you made, I will try to dig into it tomorrow and see whether there is some easy fix. Else revert and we try again :]

There are indeed a few different use cases (empty workspace, repositories present, skipping zuul) which would each require a slightly different chain of commands. But it is fixable!

Change 594791 had a related patch set uploaded (by Awight; owner: Awight):
[integration/quibble@master] Don't force remove additional repos

https://gerrit.wikimedia.org/r/594791

Change 594800 had a related patch set uploaded (by Awight; owner: Awight):
[integration/quibble@master] Don't force removing parent directories

https://gerrit.wikimedia.org/r/594800

Hopefully those last two patches get us over the finish line! The second -f to git clean is reverted, and non-empty parent directories will no longer cause an exception.

Change 602402 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/quibble@master] Revert "Wipe repo with non-git commands"

https://gerrit.wikimedia.org/r/602402

Change 602403 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/quibble@master] Revert "Clone only the target project at first"

https://gerrit.wikimedia.org/r/602403

Change 602402 merged by jenkins-bot:
[integration/quibble@master] Revert "Wipe repo with non-git commands"

https://gerrit.wikimedia.org/r/602402

Change 602403 merged by jenkins-bot:
[integration/quibble@master] Revert "Clone only the target project at first"

https://gerrit.wikimedia.org/r/602403

@awight I have reviewed the changes as well as some pending open hotfixes, but ended up reverting a couple changes. That seems the easiest to restore Quibble working state and let us cut a new release.

I also noticed that mediawiki/core would be passed to Zuul cloner twice, which probably means two fetches of the reference and that is reasonably slow for such a big repository (Wikibase or any high traffic repo would surely be affected as well).

The feature is definitely needed, but maybe we need to spend some time on defining various sequences of commands to executed for the various use cases we have (ZUUL_PROJECT=mediawiki/extensions/Foobar, use of --skip-deps and/or --skip-zuul. I guess we can use this task description as a white board to identify the plans to build in each cases?

Change 602776 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/quibble@master] High level testing for the execution plan

https://gerrit.wikimedia.org/r/602776

Change 602776 merged by jenkins-bot:
[integration/quibble@master] High level testing for the execution plan

https://gerrit.wikimedia.org/r/602776

Change 594791 abandoned by Awight:
Don't force remove additional repos

Reason:
The rest of this feature is currently reverted, so I'm abandoning until we figure out a better implementation.

https://gerrit.wikimedia.org/r/594791

Change 594800 abandoned by Hashar:
Don't force removing parent directories

Reason:
Obsolete since the feature got reverted

https://gerrit.wikimedia.org/r/594800

awight changed the task status from Open to Stalled.Jan 26 2022, 11:09 AM

Deploying a naive patch for this caused problems described in the revert, e19c2a96e998d. The implementation needs to be planned better before trying again. Maybe we can tweak the zuul cloner to stash and reuse the main project checkout.

Causes several issues:

[quibble/commands.py]

* ExtSkinComposerNpmTest runs npm/composer installs and definitely needs
  to clean the mess that got installed before we continue. Notably:
  * the composer dependencies must be injected by mediawiki/core via
    composer.local.json and the composer merge plugin
  * our ecosystem does not support providing Javascript libraries
    through npm. They have to be committed in the extension/skin.
  One can argue that ExtSkinComposerNpmTest should support --skip-deps
  to skip running the composer/npm install. But that is a different
  issue.

* GitClean went from passing `-f` to `-ff` which thus drop the
  extension/skins submodules. That would break for example
  mediawiki/extensions/VisualEditor

[quibble/cmd.py]
* The target repository (ZUUL_PROJECT) is cloned twice, which for
  mediawiki/core takes a while (git protocol 1 has to fetch all
  references when negotiating which objects to transfer).

It's still a motivating problem: cloning all dependencies takes 50 seconds of early startup, which must finish before any tests can be run.