
quibble-vendor-mysql-hhvm-docker for WikibaseCirrusSearch takes over 40 minutes
Open, Needs Triage · Public

Description

I recently looked at how long it takes for a patch to pass CI, and noticed that it takes between 44 and 46 minutes for a WikibaseCirrusSearch patch to get through quibble-vendor-mysql-hhvm-docker. I think this is way too long a wait to validate a patch. This cost is paid each time a patch is submitted (which sometimes requires several iterations, until all bugs, reviews and phpcs complaints are fixed), and waiting this long for each validation is IMHO inefficient. We should look at reducing these times.

Event Timeline

Smalyshev created this task. May 7 2019, 7:34 PM
Restricted Application added a subscriber: Aklapper. May 7 2019, 7:34 PM
greg added a subscriber: greg. May 8 2019, 6:42 PM

See also: T221434, which this might be a dupe of.

The general issue is: running our tests is taking a long time for a number of reasons (eg: no clear "integration" vs "unit" test delineations uniformly enforced) and we don't want that.

debt added a subscriber: debt.

Moving this to our #watching column for now :)

hashar added a subscriber: hashar. May 21 2019, 6:15 PM

Hey @Smalyshev, sorry I have delayed my reply to this request. Besides what Greg mentioned (we run every single test from all the dependent extensions), there is another infrastructure-related issue.

At the end of April, I noticed that wmf-quibble-vendor-mysql-hhvm-docker (which is a different job with a different set of repositories) was taking 40 minutes. I filed and closed T222023 myself, assuming it was a faulty WMCS instance, and deleted that instance.

Later Kunal noticed that the Jenkins job that generates MediaWiki code coverage would sometimes time out after 4 hours, when it usually runs in two hours. The TL;DR is that the oldest WMCS servers have bad CPU performance for some reason (T223971). I have disabled the Jenkins instances running on those hosts.

I suspect the slow WikibaseCirrusSearch runs are related.

For the future: when a patchset is sent, the default is to run the HHVM-based job. Then on Code-Review +2 we run the PHP 7.0 - 7.2 jobs. Surely we should nowadays default to 7.2, which gives faster feedback, and move HHVM to just Code-Review +2.

I also think it's probably better to run regular patch sets on 7.2 and do HHVM only on submits, especially as we're migrating to 7.x in production. Would that speed things up, or is the slowness because of the old VMs?

OK, the main task is quibble-vendor-mysql-hhvm-docker; with recent changes, this is currently taking about 25 minutes in WikibaseCirrusSearch, which isn't great, but it's not as bad as when this was filed.

I'm not sure what improvements we can make ahead of the removal of HHVM from production (when this job will be removed entirely). We could switch the default "test" job from HHVM to PHP72 (we'd keep the HHVM job in the "submit" pipeline)?

Ok I am getting multiple builds taking 50+ minutes again for Wikibase, e.g.:

https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php73-docker/2752/
https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php72-docker/16913/
https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/27655/

Overall, the job takes over 2 hours. And it needs to be done twice to submit a patch (and another two times if it needs to be backported), so to do an urgent fix, I'd need to wait for several hours just for CI (and God help us if there's a bug in the patch and it needs to be amended). I don't think it's a good situation.
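
To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes each CI run takes about 2 hours (the lower bound mentioned above), that a patch needs two runs to be submitted, and that a backport needs two more. The function name and parameters are hypothetical, for illustration only.

```python
def total_ci_wait_hours(hours_per_run: float,
                        runs_per_patch: int = 2,
                        backport: bool = False) -> float:
    """Estimate total CI wall-clock wait for one fix.

    hours_per_run:  duration of one full CI run (assumed ~2h here).
    runs_per_patch: runs needed to land a patch (test + submit, per the comment above).
    backport:       if True, the same runs are repeated once more for the backport.
    """
    patches = 2 if backport else 1
    return hours_per_run * runs_per_patch * patches


if __name__ == "__main__":
    print(total_ci_wait_hours(2.0))                 # fix on master only
    print(total_ci_wait_hours(2.0, backport=True))  # fix plus backport
```

This does not account for amended patchsets; each amendment would add at least one more run on top.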