
mediawiki selenium should retry sauce labs requests on timeout
Closed, Declined, Public

Description

I finally had a look at the mediawiki_selenium gem code. I noticed a few jobs failing because of timeouts, and the cause turns out to be the requests made to saucelabs.com.

Relevant code:

lib/mediawiki_selenium/support/env.rb

def sauce_api(json)
  RestClient::Request.execute(
    :method => :put,
    :url => "https://saucelabs.com/rest/v1/#{ENV['SAUCE_ONDEMAND_USERNAME']}/jobs/#{$session_id}",
    :user => ENV["SAUCE_ONDEMAND_USERNAME"],
    :password => ENV["SAUCE_ONDEMAND_ACCESS_KEY"],
    :headers => {:content_type => "application/json"},
    :payload => json
  )
end

Since we often hit timeouts with Sauce Labs, I would catch the RestClient::RequestTimeout exception and retry the request once.
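A minimal sketch of the retry-once idea. The with_retry helper is hypothetical and not part of mediawiki_selenium or rest-client; it takes the exception classes to retry on, so it can be tried without the gem installed.

```ruby
# Hypothetical helper: run the block and retry it when one of the given
# exception classes is raised. After `retries` extra attempts the last
# exception is re-raised unchanged.
def with_retry(*error_classes, retries: 1)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue *error_classes
    retry if attempts <= retries
    raise
  end
end

# Assumed usage inside sauce_api (requires the rest-client gem):
#   with_retry(RestClient::RequestTimeout) do
#     RestClient::Request.execute(:method => :put, :url => url, :payload => json)
#   end
```

Retrying only once keeps a real outage from stalling the build for long, at the cost of still failing when two timeouts happen in a row.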

Note that RestClient::Request supports two separate timeouts ( https://github.com/rest-client/rest-client/blob/master/lib/restclient/request.rb#L7 ):

:timeout and :open_timeout are how long to wait for a response and to open a connection, in seconds. Pass nil to disable the timeout.

I am not sure what the defaults are for Net::HTTP.
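As a sketch of how those two options could be set explicitly rather than relying on the defaults: the helper name sauce_request_options, its parameters, and the timeout values below are all assumptions made for illustration. It only builds the options hash, so it runs without the gem.

```ruby
# Hypothetical helper: builds the options hash that sauce_api hands to
# RestClient::Request.execute, with explicit timeouts added.
# The default values are illustrative assumptions, not recommendations.
def sauce_request_options(json, username, access_key, session_id,
                          timeout: 60, open_timeout: 10)
  {
    :method => :put,
    :url => "https://saucelabs.com/rest/v1/#{username}/jobs/#{session_id}",
    :user => username,
    :password => access_key,
    :headers => { :content_type => "application/json" },
    :payload => json,
    :timeout => timeout,           # seconds to wait for a response
    :open_timeout => open_timeout  # seconds to wait for the connection to open
  }
end
```

sauce_api could then pass the result of sauce_request_options(json, ENV["SAUCE_ONDEMAND_USERNAME"], ENV["SAUCE_ONDEMAND_ACCESS_KEY"], $session_id) to RestClient::Request.execute.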


Version: wmf-deployment
Severity: normal

Details

Reference
bz70179

Event Timeline

bzimport raised the priority of this task to Low. (Nov 22 2014, 3:36 AM)
bzimport set Reference to bz70179.
bzimport added a subscriber: Unknown Object (MLST).

I talked with Chris McMahon: this occurs quite rarely (we found 5 such occurrences over the last 10 days). An example build is https://integration.wikimedia.org/ci/job/browsertests-Echo-test2.wikipedia.org-linux-chrome-sauce/9/console

00:13:16.483 When I am on the "Selenium Echo flyout test page" page # features/step_definitions/common_steps.rb:24
00:13:16.483 Then I have no new notifications # features/step_definitions/notifications_steps.rb:59
00:13:16.483 Request Timeout (RestClient::RequestTimeout)
00:13:16.483 gems/rest-client-1.7.2/lib/restclient/request.rb:427:in `rescue in transmit'
00:13:16.483 gems/rest-client-1.7.2/lib/restclient/request.rb:350:in `transmit'
00:13:16.483 gems/rest-client-1.7.2/lib/restclient/request.rb:176:in `execute'
00:13:16.483 gems/rest-client-1.7.2/lib/restclient/request.rb:41:in `execute'
00:13:16.483 gems/mediawiki_selenium-0.3.2/lib/mediawiki_selenium/support/env.rb:80:in `sauce_api'
00:13:16.483 gems/mediawiki_selenium-0.3.2/lib/mediawiki_selenium/support/hooks.rb:84:in `block in <top (required)>'

Maybe we can at least produce a better error message, stating that the connection to Sauce Labs timed out? The trace above does not make that obvious.
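One way the clearer message could look, as a rough sketch. The SauceLabsTimeout class and the with_sauce_timeout_message helper are hypothetical names, not part of mediawiki_selenium or rest-client; the exception class to translate is passed in so the sketch runs without the gem.

```ruby
# Hypothetical error class used only to carry a clearer message.
class SauceLabsTimeout < StandardError; end

# Hypothetical helper: run the block and, if `error_class` is raised,
# re-raise it as SauceLabsTimeout with a message that names Sauce Labs
# and the URL being contacted.
def with_sauce_timeout_message(error_class, url)
  yield
rescue error_class => e
  raise SauceLabsTimeout,
        "Timed out talking to Sauce Labs at #{url} (#{e.class}: #{e.message})"
end

# Assumed usage inside sauce_api (requires the rest-client gem):
#   with_sauce_timeout_message(RestClient::RequestTimeout, url) do
#     RestClient::Request.execute(:method => :put, :url => url, :payload => json)
#   end
```

On Ruby 2.1 and later the original exception stays reachable through the new exception's cause, so the rest-client backtrace is not lost.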

This looks like a network failure on our side.

The test is attempting to retrieve the report information from SauceLabs but there is no connection.

Just below that the test tries to send an IRC notice but that also fails:

03:19:16 IRC notifier plugin: Sending notification to: #wikimedia-qa
03:19:16 IRC notifier plugin: [ERROR] not connected. Cannot send message to '#wikimedia-qa'