
browsertest jobs should not be allowed to run for 10 hours
Closed, Resolved, Public

Description

This job and a dozen others were stuck from 4am to 2pm (UTC) today.
https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/364/

All jobs should have a hard limit on their run time. By default this is 30 minutes in JJB (Jenkins Job Builder).
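In Jenkins, such a limit is normally enforced by the Build Timeout plugin, which Jenkins Job Builder configures through its `timeout` wrapper. The underlying idea can be sketched in plain Python, assuming a build is just a child process. `run_with_hard_limit` is a hypothetical helper for illustration, not part of Jenkins or JJB:

```python
import subprocess
import sys


def run_with_hard_limit(cmd: list[str], limit_seconds: float) -> str:
    """Run cmd and kill it if it exceeds limit_seconds (hypothetical helper).

    Returns "success", "failure" or "aborted", loosely mirroring Jenkins build results.
    """
    try:
        completed = subprocess.run(cmd, timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here,
        # so nobody has to abort the stuck "build" by hand.
        return "aborted"
    return "success" if completed.returncode == 0 else "failure"


if __name__ == "__main__":
    # A stand-in job that would hang for a minute, capped at 2 seconds.
    stuck_job = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_hard_limit(stuck_job, limit_seconds=2))  # prints "aborted"
```

The point of a hard cap is the last branch: a hung job ends on its own and frees the executor, instead of blocking it until someone notices.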


Event Timeline

Krinkle created this task. Mar 10 2015, 2:52 PM
Krinkle raised the priority of this task to Needs Triage.
Krinkle updated the task description.
Krinkle added a subscriber: Krinkle.
Restricted Application added a subscriber: Aklapper. Mar 10 2015, 2:52 PM
Krinkle set Security to None. Mar 10 2015, 2:52 PM
Krinkle edited subscribers, added: hashar, Cmcmahon, zeljkofilipin; removed: Krinkle.

30 minutes is not enough for some browser test jobs. Either the jobs should be split into smaller ones, or a larger limit should be set for them.

The main Wikidata browser test suite currently takes 3 hours.
See https://integration.wikimedia.org/ci/view/BrowserTests/view/Wikidata/job/browsertests-Wikidata-WikidataTests-linux-firefox-sauce/
That is mostly because Sauce Labs is damn slow. Running the same test suite locally on PhantomJS takes about 30 minutes.

Can we look at the reason for the slowness? Maybe we are doing a lot of useless network round trips that could be optimized?

> 30 minutes is not enough for some browser test jobs. Either the jobs should be split into smaller ones, or a larger limit should be set for them.

Any number of hours that is not unlimited (and not higher than 10) is fine by me. I don't mind if a job locks up a slave for several hours when it crashes like today, as long as it doesn't require manual intervention to fix (e.g. aborting the build). So a timeout after 9 hours would be a good start.

If you're comfortable setting a lower limit of, say, 3 hours, that's cool too.

The three longest running browser test jobs are:

  • browsertests-Wikidata-WikidataTests-linux-firefox-sauce, 2 hr 45 min
  • browsertests-VisualEditor-production-linux-firefox-sauce, 1 hr 28 min
  • browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce, 1 hr 6 min

I think it is safe to set the limit to 3 hours for all browser test jobs. Anybody opposed?
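How much headroom a 3-hour cap leaves for the three jobs listed above can be checked with a few lines of Python. The duration strings are copied from the list; `to_minutes` and `headroom` are hypothetical helpers:

```python
import re

# Durations copied from the list of the three longest-running jobs.
DURATIONS = {
    "browsertests-Wikidata-WikidataTests-linux-firefox-sauce": "2 hr 45 min",
    "browsertests-VisualEditor-production-linux-firefox-sauce": "1 hr 28 min",
    "browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce": "1 hr 6 min",
}

LIMIT_MINUTES = 3 * 60  # proposed cap for all browsertests* jobs


def to_minutes(text: str) -> int:
    """Convert a duration such as '2 hr 45 min' to whole minutes."""
    hours = re.search(r"(\d+)\s*hr", text)
    minutes = re.search(r"(\d+)\s*min", text)
    return (int(hours.group(1)) * 60 if hours else 0) + (int(minutes.group(1)) if minutes else 0)


def headroom(durations: dict[str, str], limit: int) -> dict[str, int]:
    """Minutes left before the cap for each job (negative means it would be aborted)."""
    return {job: limit - to_minutes(d) for job, d in durations.items()}


for job, spare in sorted(headroom(DURATIONS, LIMIT_MINUTES).items(), key=lambda kv: kv[1]):
    print(f"{spare:4d} min spare  {job}")
```

With a 180-minute cap, the Wikidata job keeps only 15 minutes of headroom, against 92 for VisualEditor and 114 for MultimediaViewer, so a modest slowdown in that one job is enough to hit the limit.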

greg added a subscriber: greg. Mar 12 2015, 4:15 PM

> I think it is safe to set the limit to 3 hours for all browser test jobs. Anybody opposed?

Do it; we can adjust (up or down) later if it's not working.

> Can we look at the reason for the slowness? Maybe we are doing a lot of useless network round trips that could be optimized?

Let's split that off to another task.

> Can we look at the reason for the slowness? Maybe we are doing a lot of useless network round trips that could be optimized?

Sauce Labs is slow by design. It is driving a browser in the cloud from a local machine.
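A rough way to see why remote runs are so much slower: with a browser driven over the network, every WebDriver command costs at least one round trip on top of the work the test does locally. The model below is a simplification, and the command count and round-trip time in the example are hypothetical, not measurements from these jobs:

```python
def estimated_remote_seconds(local_seconds: float, commands: int, round_trip_seconds: float) -> float:
    """Rough model: remote run time = local run time + one round trip per WebDriver command.

    Ignores VM start-up, queueing on Sauce Labs and differences between browsers.
    """
    return local_seconds + commands * round_trip_seconds


# Hypothetical inputs: a 30-minute local run issuing 20,000 commands at 0.25 s per round trip.
total = estimated_remote_seconds(local_seconds=30 * 60, commands=20_000, round_trip_seconds=0.25)
print(f"{total / 60:.0f} minutes")  # prints "113 minutes"
```

Under this model, cutting the number of commands (fewer redundant lookups and waits) helps more than anything done on the local machine.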

> I think it is safe to set the limit to 3 hours for all browser test jobs. Anybody opposed?

> Do it; we can adjust (up or down) later if it's not working.

I am taking the ticket. I will not have time for it today, but it should be done tomorrow. If anybody wants to work on this, feel free to assign the task to yourself. :)

zeljkofilipin triaged this task as Medium priority. Mar 16 2015, 8:43 AM

Change 197008 had a related patch set uploaded (by Zfilipin):
Abort browsertests* jobs if they do not complete in 3 hours

https://gerrit.wikimedia.org/r/197008

zeljkofilipin closed this task as Resolved. Mar 16 2015, 9:44 AM

Change 197008 merged by jenkins-bot:
Abort browsertests* jobs if they do not complete in 3 hours

https://gerrit.wikimedia.org/r/197008

All browsertests* Jenkins jobs are updated.

greg moved this task from INBOX to Done on the Release-Engineering-Team board. Mar 16 2015, 3:42 PM
Tobi_WMDE_SW reopened this task as Open. Mar 23 2015, 3:19 PM

Reopening because the limit is too low for the Wikidata build:
https://integration.wikimedia.org/ci/view/BrowserTests/view/Wikidata/job/browsertests-Wikidata-WikidataTests-linux-firefox-sauce/
The builds take more than 3 hours.

Oops, sorry about that. Looking at the build time trend, it is obvious that more than 3 hours are needed. I will bump the limit to 4 hours; that looks like it should be enough for now.
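One simple rule for picking a new limit from the build time trend is to take the slowest recent build, add a safety margin, and round up to whole hours. The 25% margin and the helper name are assumptions for illustration, not what was used for the actual change:

```python
import math


def suggest_limit_minutes(recent_minutes: list[float], margin: float = 0.25, step: int = 60) -> int:
    """Slowest recent build plus a safety margin, rounded up to a multiple of `step` minutes.

    Hypothetical helper: the default margin and step are assumptions.
    """
    if not recent_minutes:
        raise ValueError("need at least one build duration")
    padded = max(recent_minutes) * (1 + margin)
    return math.ceil(padded / step) * step


# Using the 2 hr 45 min (165 minutes) WikidataTests duration quoted earlier in this task:
print(suggest_limit_minutes([165]))  # 165 * 1.25 = 206.25, rounded up to 240 (4 hours)
```

Rounding to whole hours keeps the limits easy to read in job definitions, at the cost of some extra headroom.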

Change 199919 had a related patch set uploaded (by Zfilipin):
Abort browsertests* jobs if they do not complete in 4 hours

https://gerrit.wikimedia.org/r/199919

zeljkofilipin closed this task as Resolved. Mar 26 2015, 4:00 PM

Change 200129 had a related patch set uploaded (by Hashar):
Support per browsertest job timeout

https://gerrit.wikimedia.org/r/200129

Change 200129 merged by jenkins-bot:
Support per browsertest job timeout

https://gerrit.wikimedia.org/r/200129

Change 199919 merged by jenkins-bot:
Limit WikidataTests browser test to four hours

https://gerrit.wikimedia.org/r/199919
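The combination of the two merged changes, a shared default plus a per-job override, can be sketched as follows. The real changes edit Jenkins Job Builder YAML in integration/config; the Python names here are hypothetical, and the only JJB detail assumed is that the `timeout` wrapper takes its `timeout` value in minutes:

```python
# Hypothetical names for illustration; not identifiers from integration/config.
DEFAULT_TIMEOUT_MINUTES = 180  # 3 hours, the limit introduced by change 197008

TIMEOUT_OVERRIDES = {
    # WikidataTests builds exceed 3 hours (change 199919 raises them to four).
    "browsertests-Wikidata-WikidataTests-linux-firefox-sauce": 240,
}


def timeout_minutes(job_name: str) -> int:
    """Per-job timeout, falling back to the shared default."""
    return TIMEOUT_OVERRIDES.get(job_name, DEFAULT_TIMEOUT_MINUTES)


def timeout_wrapper(job_name: str) -> dict:
    """Entry for a job's JJB 'wrappers' list (Build Timeout plugin), timeout in minutes."""
    return {"timeout": {"timeout": timeout_minutes(job_name)}}


if __name__ == "__main__":
    for job in (
        "browsertests-Wikidata-WikidataTests-linux-firefox-sauce",
        "browsertests-VisualEditor-production-linux-firefox-sauce",
    ):
        print(job, timeout_wrapper(job))
```

Keeping one default and listing only the exceptions means a new browsertests* job is capped automatically, while slow suites are raised explicitly and visibly.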

@hashar & @zeljkofilipin, many thanks to you both for fixing this!