MobileFrontend Chrome browser test job has become unstable
Closed, Resolved · Public

Assigned To

Authored By

	Jdlrobson
	Jun 15 2017, 6:02 PM

Description

There have been lots of false positives/random failures on the MobileFrontend browser test job since 12th June. The tests that fail are not consistent:
https://integration.wikimedia.org/ci/view/Reading-Web/job/selenium-MobileFrontend/

This job is very important to us, so the added noise gives us great concern.

Has anything changed in the stack this week?

Related Objects

Mentioned Here: T152963: Increase in failures caused by Saucelabs

Event Timeline

Jdlrobson created this task. Jun 15 2017, 6:02 PM

Restricted Application added a subscriber: Aklapper. Jun 15 2017, 6:02 PM

Jdlrobson triaged this task as High priority. Jun 15 2017, 6:02 PM

Jdlrobson added a project: Release-Engineering-Team.

@Jdlrobson - lots of the failures are "The Sauce VMs failed to start the browser or device. For more info, please check https://wiki.saucelabs.com/display/DOCS/Common+Error+Messages". This is already tracked in T152963.

zeljkofilipin moved this task from Inbox to Ruby on the Browser-Tests-Infrastructure board. Jun 16 2017, 10:42 AM

As far as I know, nothing has changed recently. @hashar could know more.

I took a look at the last 3 failed builds:

453 has 2 "unexpected HTTP response (500) (MediawikiApi::HttpError)" failures
454 has 2 "Sauce could not start your job. For more information on what happened, please visit https://saucelabs.com/jobs/... (Selenium::WebDriver::Error::UnknownError)" failures
455 has 1 "unexpected HTTP response (503) (MediawikiApi::HttpError)" failure

I do not think anything special is happening. Selenium::WebDriver::Error::UnknownError is tracked as T152963, and MediawikiApi::HttpError might just be a temporary problem with the beta cluster.
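To make this kind of triage repeatable, the grouping above can be sketched as a small script. This is only an illustration: the category labels and the function names classify and summarize are hypothetical, and the patterns are taken solely from the failure messages quoted in this task, not from any Jenkins or mediawiki_selenium API.

```python
import re
from collections import Counter

# Hypothetical categories, derived only from the messages quoted in this task.
PATTERNS = [
    (
        "saucelabs (T152963)",
        re.compile(r"Sauce could not start your job|The Sauce VMs failed to start"),
    ),
    (
        "target wiki 5xx (beta cluster?)",
        re.compile(r"unexpected HTTP response \(5\d\d\) \(MediawikiApi::HttpError\)"),
    ),
]

UNCLASSIFIED = "unclassified (possibly a real failure)"


def classify(message):
    """Return the first matching category label for a failure message."""
    for label, pattern in PATTERNS:
        if pattern.search(message):
            return label
    return UNCLASSIFIED


def summarize(messages):
    """Count failure messages per category."""
    return Counter(classify(m) for m in messages)
```

Anything that lands in the unclassified bucket would still need a human look, since it may be a genuine test failure.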

"unexpected HTTP response (503) (MediawikiApi::HttpError)" means the target wiki returned a 5xx server error. So this is most probably an issue on the beta cluster itself? Unfortunately, mediawiki_selenium / mediawiki_api do not show the URL :(

By looking at the time the error occurred, one can potentially find the error in logstash on https://logstash-beta.wmflabs.org/app/kibana.
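As a rough sketch of that lookup, one could compute a small time window around the failure and use it as the time range when searching logstash. The helper name search_window and the default of 5 minutes are assumptions for illustration; the timestamp in the usage line is a made-up example, not a real failure time.

```python
from datetime import datetime, timedelta, timezone


def search_window(failure_time, minutes=5):
    """Return (start, end) ISO 8601 strings around the failure time."""
    delta = timedelta(minutes=minutes)
    return (
        (failure_time - delta).isoformat(),
        (failure_time + delta).isoformat(),
    )


# Hypothetical failure time, in UTC.
start, end = search_window(datetime(2017, 7, 7, 22, 0, tzinfo=timezone.utc))
```

The resulting start and end values can then be entered by hand as the time range in the Kibana time picker.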

Most probably beta had issues?

Looks like this is better? https://integration.wikimedia.org/ci/view/Reading-Web/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/

The last couple of failures look, at first glance, to be real, i.e. not due to the above-mentioned issues that @zeljkofilipin identified.

greg changed the task status from Open to Stalled. Jul 7 2017, 10:55 PM

It's definitely improved. However, the error at https://integration.wikimedia.org/ci/view/Reading-Web/job/selenium-MobileFrontend/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/479/ was not real; it was probably related to a slowdown on the beta cluster. Has anything happened to improve beta cluster stability?

Jdlrobson moved this task from Incoming to Needs Prioritization on the Web-Team-Backlog board. Jul 7 2017, 11:04 PM

Generally, a single failure due to infrastructure instability for the past week seems pretty decent. Not perfect nor great but it's a virtualized environment :) I wish we could make Beta Cluster 100% stable, but... it can't be and also be a testing environment.

Beta Logstash around that time doesn't show anything obvious: https://logstash-beta.wmflabs.org/goto/c0328da778b655e76c1df8009e3ee82c (for the record, that was the Fatal Monitor view, but I removed the jobrunner noise from ORES...)

Good enough! :)

MBinder_WMF moved this task from Needs Prioritization to 2017-18 Q1 on the Web-Team-Backlog board. Mar 6 2018, 6:53 PM
