
Flaky wdio test: MinervaNeue/history_steps "Page diff Added and removed content"
Closed, Resolved | Public | PRODUCTION ERROR

Description

https://integration.wikimedia.org/ci/job/wmf-quibble-core-vendor-mysql-hhvm-docker/20209/console

18:33:16 1) Page diff Added and removed content:
18:33:16 Input A expected to strictly equal input B:
18:33:16 + expected - actual
18:33:16 
18:33:16 - false
18:33:16 + true
18:33:16 running chrome
18:33:16 AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B:
18:33:16 + expected - actual
18:33:16 
18:33:16 - false
18:33:16 + true
18:33:16     at iOpenTheLatestDiff (/workspace/src/skins/MinervaNeue/tests/selenium/features/step_definitions/history_steps.js:14:9)
18:33:16     at Context.it (/workspace/src/skins/MinervaNeue/tests/selenium/specs/diff.js:25:3)
18:33:16     at new Promise (<anonymous>)
18:33:16     at new F (/workspace/src/skins/MinervaNeue/node_modules/core-js/library/modules/_export.js:36:28)
Patches affected

Event Timeline

xSavitar updated the task description.
Krinkle renamed this task from AssertionError [ERR_ASSERTION]: Input A expected to strictly equal input B to Flaky wdio test: MinervaNeue/history_steps "Page diff Added and removed content" (Jul 25 2019, 5:37 PM).
Krinkle triaged this task as Unbreak Now! priority.

(This also affects all other patch sets uploaded for mediawiki/core in the last 20 minutes.)

Change 525611 had a related patch set uploaded (by Niedzielski; owner: Stephen Niedzielski):
[mediawiki/skins/MinervaNeue@master] [test] remove "Page diff Added and removed content" test

https://gerrit.wikimedia.org/r/525611

Change 525618 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] Disable diff test due to issue with RunJobs

https://gerrit.wikimedia.org/r/525618

It looks like this is getting stuck in the RunJobs step:

10:31:24 RunJobs through requests to the front page (run 1).
10:31:24 RunJobs detected 54 more queued job(s).
10:31:24 RunJobs through requests to the front page (run 2).
10:31:24 RunJobs detected 54 more queued job(s).
10:31:24 RunJobs through requests to the front page (run 3).
10:31:24 RunJobs detected 54 more queued job(s).
10:31:24 RunJobs through requests to the front page (run 4).
10:31:25 RunJobs detected 54 more queued job(s).
10:31:25 RunJobs through requests to the front page (run 5).
10:31:25 RunJobs detected 54 more queued job(s).
10:31:25 RunJobs through requests to the front page (run 6).
10:31:25 RunJobs detected 54 more queued job(s).
10:31:25 RunJobs through requests to the front page (run 7).
10:31:25 RunJobs detected 54 more queued job(s).
10:31:25 RunJobs through requests to the front page (run 8).
10:31:25 RunJobs detected 54 more queued job(s).
10:31:25 RunJobs through requests to the front page (run 9).
10:31:25 RunJobs detected 54 more queued job(s).
10:31:25 RunJobs through requests to the front page (run 10).
10:31:25 RunJobs detected 54 more queued job(s).
10:31:25 RunJobs stopping requests to the front page due to reached limit.

That output comes from this step in the test:

	browser.call( () => RunJobs.run() );

As a result this call never happens:

	ArticlePage.open( pageTitle );

which breaks the rest of the test.

I don't know the internals of RunJobs, but that seems to be the problem here. We can disable the test for the time being, but unless this is addressed, the issue is likely to resurface elsewhere and further discourage people from writing Selenium tests, so I recommend keeping this task open and debugging further.

Who maintains wdio-mediawiki/RunJobs?

Change 525611 abandoned by Niedzielski:
[test] remove "Page diff Added and removed content" test

https://gerrit.wikimedia.org/r/525611

wdio-mediawiki/RunJobs is a wrapper around refreshing the wiki's main page (introduced by WMDE for rollback tests). It hasn't changed recently. The bug is more likely to do with the way actual jobs are run in MW core and/or deferred updates queued by the feature under test.
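Below is a hypothetical model of a bounded "refresh the main page until the job queue is empty" loop. It is written only to match the log output quoted earlier. It is not the wdio-mediawiki source, and `runJobsSketch`, `requestMainPage` and `countQueuedJobs` are assumed names. If the queue stays at 54 jobs, as in the log, the loop exhausts its limit and reports `drained: false`.

```javascript
'use strict';

// Hypothetical sketch, not the wdio-mediawiki implementation.
// requestMainPage: assumed async callback that loads the wiki's main page.
// countQueuedJobs: assumed async callback that returns the number of queued jobs.
async function runJobsSketch( requestMainPage, countQueuedJobs, limit = 10 ) {
	let remaining = null;
	for ( let run = 1; run <= limit; run++ ) {
		// Each page view gives MediaWiki a chance to run some queued jobs
		// (assumes the test wiki runs jobs on web requests, i.e. $wgJobRunRate > 0).
		await requestMainPage();
		remaining = await countQueuedJobs();
		if ( remaining === 0 ) {
			return { drained: true, runs: run, remaining };
		}
	}
	// Corresponds to "stopping requests to the front page due to reached limit":
	// the caller has to decide whether an undrained queue is fatal.
	return { drained: false, runs: limit, remaining };
}
```

Because the log shows the same count (54) on every run, a variant that gives up as soon as the count stops decreasing would fail faster. In either case, the caller could treat `drained: false` as a setup failure instead of continuing with the test.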

What part of the mobile diff depends on an async job having completed?

Change 525618 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] Disable diff test due to issue with RunJobs

https://gerrit.wikimedia.org/r/525618

Jdlrobson lowered the priority of this task from Unbreak Now! to High (Jul 25 2019, 7:33 PM).

> What part of the mobile diff depends on an async job having completed?

The mobile diff page browser test makes three consecutive edits to the page via the API. I don't think this needs RunJobs (so I have tried removing it), but there is another place where we use RunJobs to wait for an edit that adds categories to the page.
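The three consecutive edits described above can be sketched as a loop that waits for each edit to finish before starting the next, so the revisions are created in order. `editPage( title, text, summary )` is a hypothetical callback (for example, a thin wrapper around whatever API client the test uses) and not a specific wdio-mediawiki function. The edit summaries are illustrative.

```javascript
'use strict';

// Hypothetical sketch: editPage is an assumed async callback, not a real library API.
async function makeSequentialEdits( editPage, title, texts ) {
	const results = [];
	for ( const [ i, text ] of texts.entries() ) {
		// Awaiting inside the loop (rather than using Promise.all) guarantees each
		// edit is saved before the next begins, so revision order matches `texts`.
		results.push( await editPage( title, text, `Selenium diff test edit ${ i + 1 }` ) );
	}
	return results;
}
```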

Annoyingly, when a test fails we only get a screenshot, not a video, which makes debugging nearly impossible for me. However, what I'm seeing is that the test fails because it is on the Main Page, which is not the page the user should be on at this point; the page should match Selenium_diff_test_<random number>.
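The failing run ended up on the Main Page instead of the test page. A hypothetical guard like the one below could make that failure explicit instead of letting it show up as `false !== true`. It assumes the title follows the Selenium_diff_test_<random number> pattern mentioned above, with the random part starting with a digit. `isOnTestPage` and `describeLocation` are illustrative names. In a wdio step, the URL could come from `browser.getUrl()`.

```javascript
'use strict';

// Assumed title pattern, based on "Selenium_diff_test_<random number>".
const TEST_PAGE_PATTERN = /Selenium_diff_test_\d+/;

function isOnTestPage( url ) {
	// Decode first so a percent-encoded title still matches.
	return TEST_PAGE_PATTERN.test( decodeURIComponent( url ) );
}

// Builds a human-readable description for use in an assertion message.
function describeLocation( url ) {
	return isOnTestPage( url ) ?
		'on the test page' :
		`expected a Selenium_diff_test_* page, but the browser is at ${ url }`;
}
```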

https://gerrit.wikimedia.org/r/#/c/mediawiki/skins/MinervaNeue/+/525633 Revert "Disable diff test due to issue with RunJobs"

Hi, what is the action you'd like us to take here?

Just FYI, @zeljkofilipin is on vacation until Aug 9th.

Change 525814 had a related patch set uploaded (by Krinkle; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@wmf/1.34.0-wmf.15] Disable diff test due to issue with RunJobs

https://gerrit.wikimedia.org/r/525814

Change 525814 merged by Krinkle:
[mediawiki/skins/MinervaNeue@wmf/1.34.0-wmf.15] Disable diff test due to issue with RunJobs

https://gerrit.wikimedia.org/r/525814

Is this resolved by Krinkle's last merged patch? If not, could someone please answer T229031#5367511 ("what is the action you'd like RelEng to take here")? TIA

The test has been disabled, and we don't know how to fix it, so this task is now about restoring that test.

We can't work out why this is not working, so we could use some help from a Selenium expert. We've hit many issues with tests passing locally but failing in CI builds since porting them to Node.js, and these have put my team off writing browser tests, so I'm keen to work with RelEng to find the root cause and make these tests more reliable.

> Annoyingly when a test fails we only get a screenshot not a video

As far as I remember, the framework does not record videos in before and after hooks, only in the tests themselves. If there's a failure in a before hook, wdio will automatically take a screenshot, but there will be no video.

mmodell changed the subtype of this task from "Task" to "Production Error" (Aug 28 2019, 11:06 PM).

Haven't seen this flake in a long time.