
Quibble: Run PHPUnit databaseless and database stages in parallel
Open, Needs Triage, Public

Description

After looking at test execution times for PHPUnit's databaseless + database tests in T225730#5570377, I thought again about T50217 but in a different context. T50217 is probably still worth investigating on its own, but a different take on this would have Quibble do some of the work for us using Python's multiprocessing library. The rough idea is that when the execution plan contains both databaseless and database PHPUnit runs, rather than the current pattern of

for command in plan:
    with quibble.Chronometer(str(command), self.log.info):
        command.execute()

We would instead use multiprocessing to start two processes, one for databaseless tests and one for database tests, then collect the error code and output for both processes.
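
To make the idea concrete, here is a rough sketch of starting both stages at once and collecting the exit code and output of each. It is not Quibble code. The description proposes multiprocessing; since each PHPUnit run is already an external process, this sketch substitutes a thread per stage that waits on a subprocess. The names `run_stage`, `run_parallel` and `DEMO_STAGES` are made up for illustration, and the commands are placeholders rather than real PHPUnit invocations.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_stage(argv):
    """Run one stage to completion; return (exit code, combined output)."""
    proc = subprocess.run(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, proc.stdout


def run_parallel(stages):
    """Run every stage concurrently; map stage name to (exit code, output)."""
    with ThreadPoolExecutor(max_workers=max(1, len(stages))) as pool:
        futures = {
            name: pool.submit(run_stage, argv) for name, argv in stages.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Placeholder commands standing in for the two PHPUnit runs (assumption).
DEMO_STAGES = {
    "databaseless": [sys.executable, "-c", "print('databaseless ok')"],
    "database": [
        sys.executable,
        "-c",
        "import sys; print('database failed'); sys.exit(3)",
    ],
}

if __name__ == "__main__":
    results = run_parallel(DEMO_STAGES)
    for name, (code, output) in results.items():
        print(f"{name}: exit={code} output={output.strip()!r}")
    # The build as a whole fails if any stage returned non-zero.
    print("overall exit code:", max(code for code, _ in results.values()))
```

Note that this version buffers all output until each stage exits, which is exactly the trade-off raised in the comments below.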

Thoughts?

Event Timeline

The following is entirely optional, but would it be possible to have the "db-less" stage stream its output normally (not buffered), and then, once it has completed, flush the buffer of the "db" stage and stream that as normal too? This way the Jenkins job isn't silent for 2-5 minutes, which helps distinguish a CI node that is stalled or stuck from one that is just executing as normal (I think we have separate timeouts for the job as a whole and for no output in N seconds/minutes, but I'm not entirely sure). It might also help the few people who tail the logs directly, rather than waiting for Zuul, to get useful output quicker.

If this isn't easily feasible though, by no means required for MVP :)
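
One way the streaming behaviour described above could look, as a sketch only: the foreground ("db-less") stage writes through immediately, while lines from the background ("db") stage are held until the foreground finishes, then flushed and streamed live from that point on. `StagedOutput`, `pump` and `run_staged` are hypothetical names, not part of Quibble, and the sketch assumes line-oriented output from both children.

```python
import subprocess
import sys
import threading


class StagedOutput:
    """Write foreground lines at once; hold background lines until released."""

    def __init__(self, write):
        self._write = write
        self._lock = threading.Lock()
        self._held = []
        self._foreground_done = False

    def foreground(self, line):
        with self._lock:
            self._write(line)

    def background(self, line):
        with self._lock:
            if self._foreground_done:
                self._write(line)
            else:
                self._held.append(line)

    def finish_foreground(self):
        # Flush everything held so far, then let the background stream live.
        with self._lock:
            for line in self._held:
                self._write(line)
            self._held.clear()
            self._foreground_done = True


def pump(argv, emit):
    """Run argv, hand each output line to emit, return the exit code."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in proc.stdout:
        emit(line)
    return proc.wait()


def run_staged(fg_argv, bg_argv, write=sys.stdout.write):
    """Run both commands concurrently with staged output; return exit codes."""
    out = StagedOutput(write)
    codes = {}

    def run_background():
        codes["background"] = pump(bg_argv, out.background)

    thread = threading.Thread(target=run_background)
    thread.start()
    codes["foreground"] = pump(fg_argv, out.foreground)
    out.finish_foreground()
    thread.join()
    return codes
```

The log never goes quiet while the foreground stage is producing output, and the two stages are never interleaved before the hand-over. After the hand-over only one stage is still running, so nothing is left to interleave.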

Great idea!

Here's an example of how I'd like to handle parallelism in Quibble. Note that it requires a few patches still pending review: https://gerrit.wikimedia.org/r/#/c/integration/quibble/+/545661/1/quibble/commands.py

Krinkle's point about threaded output is important, and I don't know of any nice solutions. Python logging handlers deal with multithreading by locking per message, so lines from the two children will come out jumbled and interleaved if both are unbuffered. Personally, I agree and would prefer to err on the side of interleaved output rather than showing nothing until after the child exits.

As a stopgap, the log format can be configured to include thread IDs. It will still be hard to read but allows a determined reader to disentangle the outputs.
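
For reference, a small sketch of that stopgap using only the standard `logging` module. `%(threadName)s` and `%(thread)d` are standard LogRecord attributes; the logger name, thread names and helper functions here are invented for the example and say nothing about how Quibble actually sets up its logging.

```python
import io
import logging
import threading

LOG_FORMAT = "%(asctime)s [%(threadName)s:%(thread)d] %(levelname)s %(message)s"


def configure_logging(stream=None):
    """Return a logger whose records carry the emitting thread's name and ID."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger = logging.getLogger("parallel-demo")  # hypothetical logger name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def run_demo(logger):
    """Log from two named threads at once, as two parallel stages would."""

    def work(messages):
        for message in messages:
            logger.info(message)

    threads = [
        threading.Thread(
            target=work, name="phpunit-databaseless", args=(["nodb-1", "nodb-2"],)
        ),
        threading.Thread(
            target=work, name="phpunit-database", args=(["db-1", "db-2"],)
        ),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def disentangle(log_text, thread_name):
    """Keep only the lines emitted by one thread."""
    tag = f"[{thread_name}:"
    return [line for line in log_text.splitlines() if tag in line]


if __name__ == "__main__":
    buffer = io.StringIO()
    run_demo(configure_logging(buffer))
    print(buffer.getvalue())
    print("\n".join(disentangle(buffer.getvalue(), "phpunit-database")))
```

With the thread name in every line, something as simple as `grep` on the thread name recovers one stage's output from the interleaved log.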

> As a stopgap, the log format can be configured to include thread IDs. It will still be hard to read but allows a determined reader to disentangle the outputs.

That sounds like a good temporary measure at least.

I don't plan to take on this task myself anytime soon but hopefully someone else will :)

Krinkle renamed this task from "Quibble + PHPUnit: Run databaseless and database tests in parallel" to "Quibble: Run PHPUnit databaseless and database stages in parallel". Apr 9 2020, 1:09 AM

Change 587887 had a related patch set uploaded (by Awight; owner: Awight):
[integration/quibble@master] [WIP][POC] Parallelize phpunit-unit -databaseless, and -standalone

https://gerrit.wikimedia.org/r/587887

Change 587887 abandoned by Awight:
Parallelize phpunit-unit -databaseless, and -standalone

Reason:
Squashed into Ib2dc728980c as an example of how to use the parallelism wrapper.

https://gerrit.wikimedia.org/r/587887

Change 587885 had a related patch set uploaded (by Awight; owner: Awight):
[integration/quibble@master] Parallelism as a command object

https://gerrit.wikimedia.org/r/587885

I've parallelized the three databaseless PHPUnit suites in a patch for review, but didn't go as far as parallelizing phpunit-database yet, because it's not obvious to me how this should look. Here's the workflow at the moment:

  • Report package versions
  • Single-repository linters:
    • Zuul clone with parameters {"cache_dir": "ref", "projects": ["mediawiki/extensions/TwoColConflict"], "workers": 8, "workspace": "src", "zuul_project": "mediawiki/extensions/TwoColConflict"}
    • Run steps in parallel (concurrency=2):
      • composer test in src/extensions/TwoColConflict
      • npm test in src/extensions/TwoColConflict
  • Revert to git clean -xqdff in src
  • Integrated tests:
    • Zuul clone with parameters {"cache_dir": "ref", "projects": ["mediawiki/core", "mediawiki/extensions/TwoColConflict", "mediawiki/skins/Vector", "mediawiki/vendor"], "workers": 8, "workspace": "src", "zuul_project": "mediawiki/extensions/TwoColConflict"}
    • Extension and skin submodule update under MediaWiki root src
    • Install MediaWiki, db=mysql db_dir=None vendor=True
    • Install composer dev-requires for vendor.git
    • npm install in src
    • PHPUnit short tests (concurrency=3):
      • PHPUnit unit tests
      • PHPUnit extensions suite (without database or standalone)
      • PHPUnit default standalone suite on extensions/TwoColConflict
    • Run Qunit tests
    • Browser tests using DISPLAY=:0, for projects mediawiki/extensions/TwoColConflict, mediawiki/core, mediawiki/skins/Vector, mediawiki/vendor
    • Run API-Testing
    • PHPUnit extensions suite (with database)
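
The "Run steps in parallel (concurrency=N)" entries in the plan above could be modelled roughly like this. This is a simplified illustration of parallelism as a command object, not the code from the Gerrit change. `Parallel` and `Echo` are hypothetical classes, and the only thing assumed about real commands is that they expose `execute()` and `__str__`, as the existing plan loop suggests.

```python
from concurrent.futures import ThreadPoolExecutor


class Echo:
    """Hypothetical stand-in for a real command: records that it ran."""

    def __init__(self, label, log):
        self.label = label
        self.log = log

    def execute(self):
        self.log.append(self.label)

    def __str__(self):
        return self.label


class Parallel:
    """Composite command that runs its child steps concurrently."""

    def __init__(self, name, steps, concurrency=None):
        self.name = name
        self.steps = list(steps)
        self.concurrency = concurrency or max(1, len(self.steps))

    def execute(self):
        with ThreadPoolExecutor(max_workers=self.concurrency) as pool:
            futures = [pool.submit(step.execute) for step in self.steps]
        # Leaving the with-block waits for every step to finish;
        # result() then re-raises the first failure, if any.
        for future in futures:
            future.result()

    def __str__(self):
        header = f"{self.name} (concurrency={self.concurrency}):"
        return "\n".join([header] + [f"  * {step}" for step in self.steps])


if __name__ == "__main__":
    log = []
    plan = [
        Echo("Install MediaWiki", log),
        Parallel(
            "PHPUnit short tests",
            [Echo("unit", log), Echo("extensions", log), Echo("standalone", log)],
            concurrency=3,
        ),
        Echo("Run Qunit tests", log),
    ]
    # The plan loop stays unchanged: a Parallel is just another command.
    for command in plan:
        print(command)
        command.execute()
```

Because the wrapper has the same interface as any other command, the existing `for command in plan` loop does not need to know which entries run their children concurrently.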

The way I see it, we can run all of the following in parallel. Maybe we expand this task's requirements?

  • phpunit-nodatabase-*
  • qunit
  • browser tests
  • api-testing
  • phpunit-database