
mediawiki-core-code-coverage & mediawiki-core-code-coverage-php7 jobs lock up labs integration slaves
Closed, Resolved, Public

Description

<addshore> !log killed https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/3382/ and https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-php7/143/ to unblock integration slaves
3:44 PM <stashbot> https://tools.wmflabs.org/stashbot/ Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL

These jobs take over an hour to run, and apparently two of them now run (an extra one for PHP 7).
Combined with the other types of jobs that run on these slaves (such as browser tests), they can completely lock up the slaves and block simple jobs like phplint, which take only seconds.

We could:

  • Add more integration slaves here
  • Move some of this stuff to the docker slaves
  • Offset the coverage jobs so they run at different times, or limit them to running one at a time?

Event Timeline

I didn't get a chance to finish investigating, but I think we can drop the php56 job and rename the php7 job without its suffix, making it the "official" coverage job.

Yesterday I moved the phpunit-coverage-patch jobs to Nodepool (Gerrit 419361). They were filling up the workspaces of the permanent Jessie slaves, so that frees at least a bit of room on them.

We should move mediawiki-core-code-coverage and mediawiki-core-code-coverage-php7 to Docker containers and let them roam on nodes matching DebianJessieDocker && m4executor. Those are 4 GB RAM instances with a single executor. We only have four of them, but it is easy to add a couple more.
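To illustrate what restricting a job to DebianJessieDocker && m4executor means, here is a minimal Python sketch of label matching for expressions built only from && conjunctions: a node is eligible only if it carries every listed label. The node names and their label sets are hypothetical, and real Jenkins label expressions also support ||, ! and parentheses, which this sketch ignores.

```python
# Minimal sketch: select nodes for a Jenkins-style label expression made
# only of "&&" conjunctions. Node names and label sets are hypothetical.


def parse_conjunction(expr: str) -> set[str]:
    """Split an expression like 'A && B' into the set of required labels."""
    return {part.strip() for part in expr.split("&&") if part.strip()}


def eligible_nodes(expr: str, nodes: dict[str, set[str]]) -> list[str]:
    """Return, sorted by name, the nodes that carry every required label."""
    required = parse_conjunction(expr)
    return sorted(name for name, labels in nodes.items() if required <= labels)


# Hypothetical pool: four single-executor Docker hosts plus one permanent
# Jessie slave that lacks the Docker labels.
NODES = {
    "docker-node-1": {"DebianJessieDocker", "m4executor"},
    "docker-node-2": {"DebianJessieDocker", "m4executor"},
    "docker-node-3": {"DebianJessieDocker", "m4executor"},
    "docker-node-4": {"DebianJessieDocker", "m4executor"},
    "jessie-node-1": {"DebianJessie"},
}

if __name__ == "__main__":
    print(eligible_nodes("DebianJessieDocker && m4executor", NODES))
```

Since each of these nodes has a single executor, the length of the returned list is also the maximum number of such builds that can run at the same time.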

There is also T186489: Move debian-glue jobs to Docker using Stretch as a base image, though the debian-glue jobs do not account for many builds.

Eventually, I would like to phase out integration-slave-jessie slaves. But that is going to take a while.

@Addshore wrote:

Offset the coverage jobs so they run at different times / limit them to running 1 at a time?

mediawiki-core-code-coverage triggers at:

triggers:
 - timed: '0 3,15 * * *'

So that is 3am and 3pm UTC. Maybe we could run it only once per night (3am) and drop the 3pm run.
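The timed trigger uses cron syntax (minute, hour, day of month, month, day of week). A minimal Python sketch that handles only "*", plain numbers and comma lists in a daily schedule shows which times of day a spec fires at. The spec '0 3 * * *' is an assumption about what the once-nightly trigger would look like; Jenkins-specific extensions such as H are not handled.

```python
# Minimal sketch: expand the minute and hour fields of a five-field cron
# spec into the times of day it fires. Only "*", plain numbers and comma
# lists are supported; day, month and weekday must all be "*".


def expand_field(field: str, upper: int) -> list[int]:
    """Expand one cron field into the sorted values it matches in [0, upper)."""
    if field == "*":
        return list(range(upper))
    return sorted(int(value) for value in field.split(","))


def daily_runs(spec: str) -> list[str]:
    """Return HH:MM strings for every time of day the spec fires."""
    minute, hour, day, month, weekday = spec.split()
    if (day, month, weekday) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily schedules")
    return [
        f"{h:02d}:{m:02d}"
        for h in expand_field(hour, 24)
        for m in expand_field(minute, 60)
    ]


if __name__ == "__main__":
    print(daily_runs("0 3,15 * * *"))  # current trigger: 03:00 and 15:00 UTC
    print(daily_runs("0 3 * * *"))     # assumed nightly-only trigger
```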

hashar claimed this task.

I have added a mutex so that only one build of any of the following jobs runs at a time:

  • mwext-phpunit-coverage-patch
  • mwext-phpunit-coverage-publish
  • mediawiki-phpunit-coverage-patch

Done March 15th 2018 via d26ab287d7ea205e8c892d7897616552a21900a8