It seems that when we launch all the browser test builds that use headless Firefox at the same time, we overload the Jenkins host that runs all of those Xvfb sessions.
The symptoms are most visible in builds of this job: https://integration.wikimedia.org/ci/view/BrowserTests/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-firefox/
Note that the builds at the 3:45 and 6:43 marks are the ones started automatically, together with all the other browser test builds. They fail with errors such as:
unable to bind to locking port 7054 within 45 seconds/undefined method `close' for nil:NilClass (NoMethodError) (Firefox fails to start)
Other errors include:
unable to obtain stable firefox connection in 60 seconds (127.0.0.1:7055)
too many connection resets (due to Timeout::Error - Timeout::Error) after 286 requests on 23958180, last used 60.017253158 seconds ago
Builds started manually don't seem to have these problems launching the browser, connecting to it, or getting responses from it.
Version: wmf-deployment
Severity: normal