
Flaky failure related to skin Minerva selenium tests
Closed, Resolved | Public | Production Error

Description

See https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/47489/console

22:17:27 [chrome #0-17] Session ID: 48eba3ac3c847c1924d218f998a22473
22:17:27 [chrome #0-17] Spec: /workspace/src/skins/MinervaNeue/tests/selenium/specs/search_loggedin.js
22:17:27 [chrome #0-17] Running: chrome
22:17:27 [chrome #0-17]
22:17:27 [chrome #0-17]   Search
22:17:27 [chrome #0-17]
22:17:27 [chrome #0-17]   Search
22:17:27 [chrome #0-17]       1) Clicking on a watchstar toggles the watchstar
22:17:27 [chrome #0-17]
22:17:27 [chrome #0-17]
22:17:27 [chrome #0-17] 1 failing (18s)
22:17:27 [chrome #0-17]
22:17:27 [chrome #0-17] 1) Search Clicking on a watchstar toggles the watchstar:
22:17:27 [chrome #0-17] An element could not be located on the page using the given search parameters (".watch-this-article").
22:17:27 [chrome #0-17] Error: An element could not be located on the page using the given search parameters (".watch-this-article").
22:17:27 [chrome #0-17]     at Context.it (skins/MinervaNeue/tests/selenium/specs/search_loggedin.js:27:3)
22:17:27 [chrome #0-17]     at Promise.F (node_modules/core-js/library/modules/_export.js:36:28)
22:17:27 [chrome #0-17]     at click() - at iClickASearchWatchstar (skins/MinervaNeue/tests/selenium/features/step_definitions/search_steps.js:26:19)
22:17:27 [chrome #0-17]
22:17:34 [21:17:34] [E] [MWBOT] Login failed: WikiAdmin@http://127.0.0.1:9412/
22:17:34 Unhandled rejection Error: Could not login: WrongToken
22:17:34     at request.then.then (/workspace/src/node_modules/mwbot/src/index.js:334:31)
22:17:34     at tryCatcher (/workspace/src/node_modules/bluebird/js/release/util.js:16:23)
22:17:34     at Promise._settlePromiseFromHandler (/workspace/src/node_modules/bluebird/js/release/promise.js:512:31)
22:17:34     at Promise._settlePromise (/workspace/src/node_modules/bluebird/js/release/promise.js:569:18)
22:17:34     at Promise._settlePromise0 (/workspace/src/node_modules/bluebird/js/release/promise.js:614:10)
22:17:34     at Promise._settlePromises (/workspace/src/node_modules/bluebird/js/release/promise.js:694:18)
22:17:34     at _drainQueueStep (/workspace/src/node_modules/bluebird/js/release/async.js:138:12)
22:17:34     at _drainQueue (/workspace/src/node_modules/bluebird/js/release/async.js:131:9)
22:17:34     at Async._drainQueues (/workspace/src/node_modules/bluebird/js/release/async.js:147:5)
22:17:34     at Immediate.Async.drainQueues (/workspace/src/node_modules/bluebird/js/release/async.js:17:14)
22:17:34     at runCallback (timers.js:672:20)
22:17:34     at tryOnImmediate (timers.js:645:5)
22:17:34     at processImmediate [as _immediateCallback] (timers.js:617:5)

Event Timeline

The token error is also happening in T221860. It seems related to the number of tests, and I can only replicate it in quibble, so any help anyone can provide here would be super helpful. I worry it's going to continue to show up in any browser tests making API requests :(

Still affecting commits in core, and other extensions. Also at https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/PageTriage/+/508437/.

Please disable first and ask for help later if there isn't an obvious fix.

Right now I have little confidence that disabling this one test will make the problem go away. It may be better to disable all the browser tests in package.json and let them just run in the daily Jenkins job. Is that possible?

Change 508484 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/skins/MinervaNeue@master] selenium: Disable wdio tests in regular CI

https://gerrit.wikimedia.org/r/508484

Change 508483 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] selenium: Disable Minerva wdio tests

https://gerrit.wikimedia.org/r/508483

Is that possible?

As far as I can tell, it worked fine before Minerva's tests were added, which suggests it is related to something they do. The untracked async calls, which currently cause the before() steps to happen later, in the middle of unrelated test scenarios, stand out as the prime suspect. I do not know if that is the only cause, or how many test suites contain that mistake.

In any event, disabling wholesale is possible if you prefer. I've submitted a patch to that end.

All the errors I've seen are related to the mwbot library and token errors. The existing set of selenium tests does not seem to exercise this library much, so my concern is that these bugs are going to continue as we add new tests, due to a problem with how the mwbot library is used. It seems the methods in https://github.com/wikimedia/mediawiki/blob/135718b90478b94052a2575e60f38406366055e7/tests/selenium/wdio-mediawiki/Api.js were only being used in core before Minerva and MobileFrontend started using them, while other extensions such as Cirrus use the mwbot library directly without indirection.

I will review the options tomorrow and take care of this. I may need to submit some changes to core once I am able to replicate this problem locally.

Sorry I didn't look at this today. It slipped my mind.

I have not yet seen indication of a problem in mwbot or wdio. The problem is foremost that the Minerva tests introduced are failing to use asynchronous JavaScript correctly. As such, many async methods have been called without awaiting their returned promise. This is causing the test steps to be unpredictable and run in unordered fashion, which is expected to result in lots of flaky and confusing problems, like the ones we've found.
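To illustrate the ordering problem described above, here is a minimal sketch in plain Node.js. It does not use wdio, mwbot, or the Minerva step definitions; the `step` helper and the step names are hypothetical stand-ins for any async call, such as an API request or a browser command.

```javascript
// Hypothetical stand-in for an async helper (API call, browser command).
// It records its name in `log` once it finishes, after `ms` milliseconds.
function step(log, name, ms) {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(name);
      resolve();
    }, ms);
  });
}

// Buggy pattern: the returned promises are dropped, so the scenario is
// considered finished before any step has completed, and the steps land
// in whatever order their timers happen to fire.
async function runUnawaited() {
  const log = [];
  step(log, 'login', 30);
  step(log, 'create page', 20);
  step(log, 'click watchstar', 10);
  log.push('scenario finished');
  // Wait so the stray steps become visible in the log for this demo.
  await new Promise((resolve) => setTimeout(resolve, 100));
  return log;
}

// Fixed pattern: each promise is awaited, so steps complete in source order.
async function runAwaited() {
  const log = [];
  await step(log, 'login', 30);
  await step(log, 'create page', 20);
  await step(log, 'click watchstar', 10);
  log.push('scenario finished');
  return log;
}

async function main() {
  console.log('unawaited:', await runUnawaited());
  console.log('awaited:  ', await runAwaited());
}

main();
```

In the unawaited run, 'scenario finished' is recorded before any step completes, which is how setup work can leak into a later, unrelated scenario. In the awaited run the steps are recorded in the order they were written.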

Change 508612 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] Skip the flaking watchstar test

https://gerrit.wikimedia.org/r/508612

Hmm, this one is strange. The test that is failing ("Clicking on a watchstar toggles the watchstar") looks like it should actually pass, as the watchstar it is looking for is clearly in the screenshot:

image.png (480×396 px, 15 KB)

So token errors do not seem related to this particular failure. Skipping the Minerva tests in the gated job (https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/508484/, provided my understanding of the commit message is correct) or skipping this single test might be the best short-term solution for now to prevent further failures.

Once that's done, I will focus on trying to get to the bottom of the token errors and then finally restore this test.

Change 508484 abandoned by Krinkle:
selenium: Disable wdio tests in regular CI

https://gerrit.wikimedia.org/r/508484

Change 508483 merged by jenkins-bot:
[mediawiki/core@master] selenium: Disable Minerva wdio tests

https://gerrit.wikimedia.org/r/508483

Jdlrobson lowered the priority of this task from Unbreak Now! to High. May 7 2019, 8:15 PM

Change 508612 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] Skip the flaking watchstar test

https://gerrit.wikimedia.org/r/508612

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:07 PM