
Jenkins: browser test host performance issue for timed builds
Closed, Resolved, Public

Description

When all the browser test builds that use headless Firefox are launched at the same time, they appear to overload the Jenkins host that runs all of those Xvfb sessions.

The symptoms are most visible in a build like this one: https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox/

The builds at the 3:45 and 6:43 marks are the ones started automatically together with all the other browser test builds. Their failures come from errors such as:

unable to bind to locking port 7054 within 45 seconds/undefined method `close' for nil:NilClass (NoMethodError) (Firefox fails to start)

Other errors include:

unable to obtain stable firefox connection in 60 seconds (127.0.0.1:7055)

too many connection resets (due to Timeout::Error - Timeout::Error) after 286 requests on 23958180, last used 60.017253158 seconds ago

Builds started manually do not appear to have these problems launching the browser, connecting to it, or getting responses from it.


Version: wmf-deployment
Severity: normal

Details

Reference
bz66449

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 3:16 AM
bzimport set Reference to bz66449.
bzimport added a subscriber: Unknown Object (MLST).

Chris: Is this still occurring for headless Firefox after the throttling Antoine imposed?

From Chris on IRC:
"we tried headless firefox and brought the Jenkins host to its knees. we are 100% SauceLabs at this point"

My question is moot then.

While working on an unrelated change, I finally found some time to look at Xvfb and at the headless Ruby gem. In short: there is a race condition in the mediawiki_selenium gem which causes the Xvfb server on display 99 to be killed by another job running in parallel.

The fix is to allocate a different display for each job, or to stop killing the Xvfb server :-D That is filed as Bug 71602 - mediawiki_selenium always use the same default xvfb display 99
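A minimal sketch of the first option, assuming each job derives its own X display number from the Jenkins executor it runs on instead of always using :99. `EXECUTOR_NUMBER` is an environment variable Jenkins sets for each build; the base of 100, the screen size, and the helper names are hypothetical and are not what mediawiki_selenium actually does. Since two builds on the same node never occupy the same executor at the same time, they would never share a display, so one job stopping its Xvfb could not kill another job's server.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical base: keeps per-executor displays clear of the shared default :99.
BASE_DISPLAY = 100


def display_for_executor(env: Optional[Mapping[str, str]] = None,
                         base: int = BASE_DISPLAY) -> int:
    """Return an X display number unique to this Jenkins executor.

    Assumes Jenkins sets EXECUTOR_NUMBER (0, 1, 2, ...) for each build on a
    node; falls back to 0 when the variable is missing, e.g. outside Jenkins.
    """
    env = os.environ if env is None else env
    return base + int(env.get("EXECUTOR_NUMBER", "0"))


def xvfb_command(display: int, screen: str = "1280x1024x24") -> List[str]:
    """Build an Xvfb command line for the given display (screen size is illustrative)."""
    return ["Xvfb", f":{display}", "-screen", "0", screen]


if __name__ == "__main__":
    display = display_for_executor()
    # Firefox launched by this job would then run with DISPLAY=:<display>.
    print("DISPLAY=:%d" % display)
    print(" ".join(xvfb_command(display)))
```

Exporting that `DISPLAY` value in the job environment before starting Xvfb and Firefox would keep concurrent builds on separate X servers. Xvfb also accepts `-displayfd`, which lets the server pick a free display itself and report the chosen number back, an alternative to a fixed numbering scheme.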

Krinkle lowered the priority of this task from Medium to Low. Mar 2 2015, 2:54 PM
Krinkle removed a subscriber: Unknown Object (MLST).
hashar closed this task as Resolved. Edited Mar 27 2015, 1:25 PM
hashar claimed this task.

Firefox was being kicked out because mediawiki_selenium killed the shared Xvfb session when a job completed. The gem now supports keeping the Xvfb server running, though we have not adjusted our script yet (T73602).

We have solved the issue by migrating all the browser test jobs to SauceLabs, so Xvfb is no longer a concern.