Page MenuHomePhabricator

[4 hrs] MinervaNeue browser test are flaking (waiting for {:class=>"mw-notification", :tag_name=>"div"} to become present )
Closed, ResolvedPublic

Description

Browser tests e.g.[[ https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/13/ | job 13 ]]are failing occasionally:

timed out after 10 seconds, waiting for {:class=>"mw-notification", :tag_name=>"div"} to become present (Watir::Wait::TimeoutError)
/mnt/home/jenkins-deploy/.gem/2.1.0/gems/watir-webdriver-0.9.1/lib/watir-webdriver/wait.rb:44:in `until'
/mnt/home/jenkins-deploy/.gem/2.1.0/gems/watir-webdriver-0.9.1/lib/watir-webdriver/wait.rb:223:in `wait_until_present'
/mnt/home/jenkins-deploy/.gem/2.1.0/gems/page-object-1.1.0/lib/page-object/platforms/watir_webdriver/element.rb:160:in `when_present'
/srv/jenkins-workspace/workspace/selenium-MinervaNeue/BROWSER/chrome/MEDIAWIKI_ENVIRONMENT/beta/PLATFORM/Linux/label/BrowserTests/tests/browser/features/step_definitions/common_article_steps.rb:30:in `block in <top (required)>'

Browser tests tend to fail on the step I should see a toast notification
When you watch the associated videos the toast is visible but for some reason not matching the step.
It's possible the toast disappears before it can be checked or the selector is wrong.
This didn't used to be an issue. :/

Event Timeline

Jdlrobson triaged this task as Medium priority.Jul 17 2017, 9:41 PM
Jdlrobson moved this task from Backlog to Bugs on the MinervaNeue board.
Jdlrobson added a project: Web-Team-Backlog.
Jdlrobson moved this task from Incoming to Needs Prioritization on the Web-Team-Backlog board.

Need to debug what's going on here before committing to this.

Jdlrobson renamed this task from MinervaNeue browser test is flaking to MinervaNeue browser test are flaking (waiting for {:class=>"mw-notification", :tag_name=>"div"} to become present ).Jul 20 2017, 10:17 PM
Jdlrobson raised the priority of this task from Medium to High.

This is happening more and more often...

Change 368114 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] Hygiene: Toast selector should be more flexible

https://gerrit.wikimedia.org/r/368114

I have a theory that I'm exploring:
I'm interested if Jenkins fails for https://gerrit.wikimedia.org/r/#/c/366482/ when a default toast has another class...
If so, I think this should be fixed by https://gerrit.wikimedia.org/r/368114

Change 368114 abandoned by Jdlrobson:
Hygiene: Toast selector should be more flexible

Reason:
Well that proves it is not cause..

https://gerrit.wikimedia.org/r/368114

Summarising what I see:

In #36

  • tests/browser/features/editor_wikitext_saving.feature:28:in `Then I should see a toast notification' fails.
    • I do not see a toast in the browser test so this is probably the result of a failed edit. Language button seems obscured by something...

In build 35

  • tests/browser/features/editor_wikitext_saving.feature:16:in `Then I should see a toast notification' fails.
    • At 55s I see a toast notification..

In build 34 there are 3 failures:

  • tests/browser/features/language.feature:30:in `Then I should see a toast with message "This page is not available in other languages."'
    • at 14s in the first video play I can clearly see the toast. It seems to be testing the text of a hidden version.
  • tests/browser/features/editor_wikitext_saving.feature:41:in `Then the text of the first heading should be "Selenium wikitext editor test"'
    • The save doesn't seem to work and this is probably unrelated to this problem. "I say OK in the confirm dialog" step doesn't seem to be working here.
  • tests/browser/features/watchstar.feature:16:in `Then I should see a toast with message "Removed Selenium mobile watch test from your watchlist"'
    • but I see the toast at 26s

https://gerrit.wikimedia.org/r/368942 QA: Minor tweaks to how we check toast notifications might help this as it seems the toast message is being tested too early.

#40 and #41 failed with:
expected "" to match "This page is not available in other languages." (RSpec::Expectations::ExpectationNotMetError)
expected "" to match "Removed Selenium mobile watch test from your watchlist" (RSpec::Expectations::ExpectationNotMetError)

#42 failed with:
expected "" to match "This page is not available in other languages." (RSpec::Expectations::ExpectationNotMetError)

  • it's there

timed out after 5 seconds, Element was not visible in 5 seconds (Watir::Wait::TimeoutError)

  • it's not there.

timed out after 5 seconds, waiting for {:class=>"mw-notification", :tag_name=>"div"} to become present (Watir::Wait::TimeoutError)

  • The watch star nor the toast appear.

The expected "" to match match can be updated to ignore empty strings. That will hopefully leave us with the legit failures.. although why that is happening I'm not sure!

Change 369450 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] QA: Make sure toast is not empty before checking text value

https://gerrit.wikimedia.org/r/369450

Change 369450 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] QA: Make sure toast is not empty before checking text value

https://gerrit.wikimedia.org/r/369450

Change 369561 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] QA: Use css rather than class for finding toast

https://gerrit.wikimedia.org/r/369561

Jdlrobson added a subscriber: zeljkofilipin.

Not having much luck getting to bottom of this but we should find out what's going on as T172236 almost sneaked into a release.
@zeljkofilipin any chance of your fresh eyes on the problem here and how we can improve the Ruby tests?

@Jdlrobson sorry, I am going on vacation tomorrow. I can take a look when I get back, in a week.

phuedx renamed this task from MinervaNeue browser test are flaking (waiting for {:class=>"mw-notification", :tag_name=>"div"} to become present ) to [4 hrs] MinervaNeue browser test are flaking (waiting for {:class=>"mw-notification", :tag_name=>"div"} to become present ).Aug 7 2017, 5:14 PM
phuedx added a project: Spike.

I've tried this locally and using the beta labs URL with different Chrome versions (including the one in screenshot, version 48), but no luck re-producing the issue.

I can replicate this on a very slow connection.

Replication steps:

Reason:

  • Check mw.loader.getState('mediawiki.notification') in console - at time of clicking the watchstar it can be registered meaning the display of the toast is delayed. Inspecting the mediawiki.notify module - that asynchronously loads mediawiki.notification

@Jdlrobson does your analysis suggest that the integration server is also on a 2G speed?

Change 369561 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] QA: Adjustments to account for slow loading of mediawiki.notifications

https://gerrit.wikimedia.org/r/369561

@Jdlrobson does your analysis suggest that the integration server is also on a 2G speed?

No, my point was that if the server is slow/unreliable (as the beta cluster is) - then shipping assets will be slower. I was seeing this reflected in the videos - the page hadn't finished loading when steps like clicking the watchstar are carried out. Using 2G was a reliable way to reproduce this issue.

I setup this job to test my change before merging - https://integration.wikimedia.org/ci/job/selenium-MinervaNeue-BugFix/ - and it was passing consistently with https://gerrit.wikimedia.org/r/369561. After merging you can see the job is green again. I've documented how I troubleshooted this for future reference - https://www.mediawiki.org/wiki/Reading/Web/QA#Troubleshooting_.22flakey_tests.22 .

The Firefox job of https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/ is now passing, but the Chrome browser test is still flakey (see 55 and 56). The issue with that is unrelated to this task and I'm debugging separately.

I've opened T172835 to explain the remaining problem

This task doesn't have a signerer offerer. Who's gonna do that?

Code looks ok, I don't see new failed messages in CI. I would like to keep this task in ready for sign off column for one extra day, if tomorrow build is still green I'll sign it off.

There were some stability issues on the beta cluster weekend runs (signified by the Uh-oh errors).

Last two builds are green again.
There's definitely not passing 100% of the time, but they are definitely being a bit more reliable.
Hopefully @zeljkofilipin will be able to take a look and see if there's anyway to improve things here. The root of the problem seems to be the timeout. The toast notification only shows for 5s and sometimes the step I should see a toast notification is running after it has been displayed rather than while it is displayed. I'm not sure if there's a way to rewrite our existing Ruby rules to more reliably check for this.

Change 373085 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] QA: Do not run various toast notification browser tests on BC

https://gerrit.wikimedia.org/r/373085

Change 373085 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] QA: Do not run various toast notification browser tests on BC

https://gerrit.wikimedia.org/r/373085

Change 473939 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] QA: Remove hacks introduced for T170890

https://gerrit.wikimedia.org/r/473939

Change 473939 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] QA: Remove hacks introduced for T170890

https://gerrit.wikimedia.org/r/473939