Page MenuHomePhabricator

mediawiki_selenium always use the same default xvfb display 99
Closed, DeclinedPublic

Description

Our firefox browser tests running on local instances kept complaining of losing connection to the firefox web driver after 45 seconds. We never really managed to figure out the issue and went with throttling the browsertests to ensure that only one run per instance ( https://gerrit.wikimedia.org/r/#/c/155337/ ).

Timo proposed a patch to add chromium and a xvfb server on the contint slaves which propose to maintain an xvfb service on display port 99. That is what triggered the race condition warning to me.

In mediawiki_selenium we use ruby gems headless defaults:

require "headless"
headless = Headless.new
headless.start

Then destroy it:

at_exit do
  headless.destroy
end

Looking at headless code ( https://github.com/leonid-shevtsov/headless/blob/master/lib/headless.rb ) the default are:

  • display port 99
  • auto picking disabled (since display port is set)
  • reuses display

When the first test start, that boot an xvfb on display port 99. A second test starting in parallel will end up reusing the same display. Imho that would cause a bunch of race conditions between the two browsers being run.

mediawiki_selenium should probably be invoked with:

Headless.new(autopick: true, reuse: false)

We could have that tweakable via env variable such as HEADLESS_DISPLAY, HEADLESS_AUTOPICK and HEADLESS_REUSE.


Version: wmf-deployment
Severity: normal

Details

Reference
bz71602

Event Timeline

bzimport raised the priority of this task from to High.Nov 22 2014, 3:59 AM
bzimport set Reference to bz71602.

I can repro the reported behavior consistently in MWV using the following feature/steps.

Feature: Parallel Tests

  Scenario: Process 1
    When I wait 10 seconds
    Then the browser connection should still be alive

  Scenario: Process 2
    When I wait 2 seconds
    Then the browser connection should still be alive


Given(/^I wait (\d+) seconds$/) do |seconds|
  sleep seconds.to_i
end

Then(/^the browser connection should still be alive$/) do
  expect(@browser.execute_script("return true")).to be(true)
end

And by executing the scenarios in parallel.

shell 1$ HEADLESS=true bundle exec cucumber features/test.feature:3
shell 2$ HEADLESS=true bundle exec cucumber features/test.feature:7

Change 164704 had a related patch set uploaded by Dduvall:
Additional headless environment variables

https://gerrit.wikimedia.org/r/164704

I've submitted a patch that will allow you to provide additional environment variables.

Note that simply telling Headless to keep the xvfb process running (destroy_at_exit: false) was enough for my tests to pass. In other words, it appears that the race condition at fault is xvfb shutdown, not a conflict of the display port. However, in case you run into issues screenshot-ing, etc. when using the same display port for multiple test runners, I've also provided an environment variable binding for that.

(In reply to Antoine "hashar" Musso from comment #0)

Timo proposed a patch to add chromium and a xvfb server on the contint
slaves which propose to maintain an xvfb service on display port 99. That
is what triggered the race condition warning to me.

For the record, my patch uses Xvdb server number 94, not 99. This suspicious may have helped tracked it down, but my deployment did not affect browser tests.

Change 164704 merged by jenkins-bot:
Additional headless environment variables

https://gerrit.wikimedia.org/r/164704

The patch has been merged and is included in mediawiki_selenium 0.4.1.

We have to adjust all repositories to require 0.4.1 and update the Jenkins jobs to set the proper variables. Most probably:

export HEADLESS=true
export HEADLESS_DESTROY_AT_EXIT=false
bundle exec cucumber

Amire80 removed a subscriber: Amire80.Mar 25 2015, 1:52 PM

@dduvall, @hashar: is this still the problem?

hashar lowered the priority of this task from High to Low.Mar 27 2015, 1:06 PM

Dan added support to fix the issue with mediawiki_selenium 0.4.1. Per my comment on T73602#774867 we have to update the job templates and adjust the env variables being set.

It is not much of a problem right now since we are using SauceLabs, but will whenever we run tests against a local MediaWiki. So I would prefer we fix the variables before hitting the same wall later on.

Changing priority to low since we are not impacted (jobs are in SauceLabs).

@dduvall: the task is assigned to you, are you working on this?

hashar removed dduvall as the assignee of this task.Mar 30 2015, 4:29 PM

I have removed @dduvall from assignee since he implemented the mediawiki_selenium part but we have yet to adjust all repository to use the new ENV variables.

zeljkofilipin closed this task as Resolved.EditedJun 16 2015, 3:46 PM
zeljkofilipin claimed this task.

This is not a problem any more. If there is something left to do here, please reopen or create a new task.

hashar reopened this task as Open.Jul 1 2015, 9:12 PM

Reopening, that is actually a problem when we have multiple browser tests running on the same instance hitting a local mediawiki (i.e. no sauce labs). That is what @dduvall is currently sprinting on, resurrecting the old project we started and prompted the filling of that bug.

zeljkofilipin removed zeljkofilipin as the assignee of this task.Jul 2 2015, 3:11 PM
zeljkofilipin removed a subscriber: Cmcmahon.
dduvall closed this task as Declined.Jul 7 2015, 3:46 PM
dduvall claimed this task.

There's an outstanding upstream bug which prevents us from reusing a display started by another user, but the maintainer has not responded yet.

The current workaround is to let headless start a new isolated Xvfb display between 70-90 for the current build, which requires more memory but is actually preferable to a shared display if we want to record video of the session, take screenshots, etc. Basically, this issue is now moot.

I looked at your solution and agree assigning the displays ourselves is better than relying on upstream unknown spaghetti code :-) Thanks!