
browsertest jobs should not be allowed to run for 10 hours
Closed, Resolved, Public

Description

This job and a dozen others were stuck from 4am to 2pm (UTC) today.
https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/364/

All jobs should have a hard limit on their run time. By default this is 30 minutes in JJB.
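To illustrate what a hard run-time limit means, here is a minimal Python sketch. It is not how Jenkins or JJB implement timeouts, and the helper name and the SUCCESS/FAILURE/ABORTED labels are illustrative assumptions. It runs a command and kills it once a wall-clock limit is exceeded, so a stuck job ends without anyone aborting it by hand:

```python
import subprocess
import sys


def run_with_hard_limit(cmd, limit_seconds):
    """Run cmd and abort it once limit_seconds of wall-clock time have passed.

    Returns "SUCCESS", "FAILURE" or "ABORTED". This helper and its labels are
    illustrative only; they are not a Jenkins or JJB API.
    """
    try:
        result = subprocess.run(cmd, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising,
        # so the stuck job is cleaned up without manual intervention.
        return "ABORTED"
    return "SUCCESS" if result.returncode == 0 else "FAILURE"


if __name__ == "__main__":
    # Simulated stuck job: it wants 5 seconds but is only allowed 1.
    stuck_job = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_hard_limit(stuck_job, limit_seconds=1))  # ABORTED
```

In Jenkins itself this role is typically played by the Build-timeout plugin, which JJB exposes as its `timeout` wrapper.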

Event Timeline

Krinkle raised the priority of this task to Needs Triage.
Krinkle updated the task description.
Krinkle subscribed.
Krinkle edited subscribers, added: hashar, Cmcmahon, zeljkofilipin; removed: Krinkle.

30 minutes is not enough for some browser test jobs. Either the jobs should be split into smaller ones, or a larger limit should be set for some jobs.

The main Wikidata browsertest suite takes 3h at the moment.
See https://integration.wikimedia.org/ci/view/BrowserTests/view/Wikidata/job/browsertests-Wikidata-WikidataTests-linux-firefox-sauce/
Mostly because Sauce Labs is damn slow. Running the same test suite locally on PhantomJS takes ~30 minutes.

Can we look at the reason for the slowness? Maybe we are doing a lot of useless network roundtrips which could be optimized?

> 30 minutes is not enough for some browser test jobs. Either the jobs should be split into smaller ones, or a larger limit should be set for some jobs.

Any number of hours that is not unlimited (and not higher than 10) is fine by me. I don't mind if it locks up a slave for several hours when it crashes like today, as long as it doesn't require manual intervention to fix (e.g. aborting the build). So if it times out after 9 hours, that would be a good start.

If you're comfortable setting a lower limit of say 3 hours, that's cool too.

The three longest running browser test jobs are:

  • browsertests-Wikidata-WikidataTests-linux-firefox-sauce, 2 hr 45 min
  • browsertests-VisualEditor-production-linux-firefox-sauce, 1 hr 28 min
  • browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce, 1 hr 6 min
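A quick headroom check of these three durations against a 3-hour limit, as a small Python sketch (the `headroom` helper is a hypothetical name; the job names and durations are the ones listed above):

```python
from datetime import timedelta

# Durations of the three longest-running browser test jobs, as listed above.
LONGEST_JOBS = {
    "browsertests-Wikidata-WikidataTests-linux-firefox-sauce":
        timedelta(hours=2, minutes=45),
    "browsertests-VisualEditor-production-linux-firefox-sauce":
        timedelta(hours=1, minutes=28),
    "browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce":
        timedelta(hours=1, minutes=6),
}


def headroom(durations, limit):
    """Map each job to limit - duration; a negative value means it would be aborted."""
    return {job: limit - duration for job, duration in durations.items()}


if __name__ == "__main__":
    limit = timedelta(hours=3)
    spare = headroom(LONGEST_JOBS, limit)
    # Print the tightest job first.
    for job, margin in sorted(spare.items(), key=lambda item: item[1]):
        print(f"{int(margin.total_seconds() // 60):4d} min spare  {job}")
```

With a 3-hour limit, the Wikidata job has only 15 minutes to spare (180 − 165), compared with 92 and 114 minutes for the other two. That leaves little room for a slower-than-usual run.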

I think it is safe to set the limit to 3 hours for all browser test jobs. Anybody opposed?


Do it; we can adjust (up/down) later if it's not working.

> Can we look at the reason for the slowness? Maybe we are doing a lot of useless network roundtrips which could be optimized?

Let's split that off to another task.

> Can we look at the reason for the slowness? Maybe we are doing a lot of useless network roundtrips which could be optimized?

Sauce Labs is slow by design. It is driving a browser in the cloud from a local machine.


I am taking the ticket. I will not have time for it today, but it should be done tomorrow. If anybody wants to work on this, feel free to assign the task to yourself. :)

Change 197008 had a related patch set uploaded (by Zfilipin):
Abort browsertests* jobs if they do not complete in 3 hours

https://gerrit.wikimedia.org/r/197008

Change 197008 merged by jenkins-bot:
Abort browsertests* jobs if they do not complete in 3 hours

https://gerrit.wikimedia.org/r/197008

All browsertests* Jenkins jobs are updated.

Oops, sorry about that. Looking at the build time trend, it is obvious that more than 3 hours are needed. I will bump the limit to 4 hours; that looks like it should be enough for now.

Change 199919 had a related patch set uploaded (by Zfilipin):
Abort browsertests* jobs if they do not complete in 4 hours

https://gerrit.wikimedia.org/r/199919

Change 200129 had a related patch set uploaded (by Hashar):
Support per browsertest job timeout

https://gerrit.wikimedia.org/r/200129

Change 200129 merged by jenkins-bot:
Support per browsertest job timeout

https://gerrit.wikimedia.org/r/200129

Change 199919 merged by jenkins-bot:
Limit WikidataTests browser test to four hours

https://gerrit.wikimedia.org/r/199919
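Here is a hypothetical Python model of the resulting policy: a 3-hour default for all browsertests* jobs (change 197008) plus per-job overrides (made possible by change 200129), with WikidataTests raised to four hours (change 199919). The names and the matching rule are illustrative assumptions. This is not how the integration/config repository actually expresses it, which is in Jenkins Job Builder configuration.

```python
# Hypothetical model of the timeout policy described in this task; the names
# below are illustrative and are not taken from the integration/config repo.
DEFAULT_TIMEOUT_MINUTES = 180  # change 197008: abort browsertests* jobs after 3 hours

# Per-suite overrides, the kind of thing change 200129 makes possible.
TIMEOUT_OVERRIDES_MINUTES = {
    "WikidataTests": 240,  # change 199919: limit WikidataTests to four hours
}


def timeout_minutes(job_name, overrides=TIMEOUT_OVERRIDES_MINUTES,
                    default=DEFAULT_TIMEOUT_MINUTES):
    """Return the hard time limit (in minutes) for a browsertests* job."""
    if not job_name.startswith("browsertests-"):
        raise ValueError(f"not a browsertests* job: {job_name}")
    # Assumed matching rule: an override applies if its key equals any
    # dash-separated component of the job name.
    components = job_name.split("-")
    for suite, minutes in overrides.items():
        if suite in components:
            return minutes
    return default


if __name__ == "__main__":
    print(timeout_minutes("browsertests-Wikidata-WikidataTests-linux-firefox-sauce"))
    print(timeout_minutes("browsertests-VisualEditor-production-linux-firefox-sauce"))
```

The idea behind a default plus overrides is that most jobs keep a tight limit, and only the known long-running suite gets extra time.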