selenium test for Wikibase is unstable
Closed, Resolved | Public | Production Error

Description

I am getting a lot of intermittent failures on mwext-mw-selenium-composer-jessie for Wikibase, on patches that do not change anything relevant to it. E.g.: https://gerrit.wikimedia.org/r/c/417180/

Failures:
https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer-jessie/8933/console
https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer-jessie/8934/console
https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer-jessie/9145/console

and several more. If I recheck, it usually fixes itself after one or two runs, only to pop up again on the next check.

Event Timeline

The same happens with mediawiki-extensions-qunit-jessie.

greg subscribed.

It appears that one specific Cucumber test is the flaky one ("Feature: Using time properties in statement"), at least based on the few failed runs I looked at randomly.

I would ask the test author to attempt debugging first.

Looking at a recent one: https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer-jessie/9457/console I see failures in various tests, all with timeouts. Looks like something more generic than a single faulty test... I'll try to see if I can notice anything in common between those failures.

Looks like most of the failures are login timeouts, but there is no specific test causing them, e.g. the latest one is for "And Statement string value of claim 1 in group 1 should be 14 May 1985".

Change 425025 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[mediawiki/extensions/Wikibase@master] Browser tests: Raise the timeout from 10s to 15s

https://gerrit.wikimedia.org/r/425025

Change 425025 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Browser tests: Raise the timeout from 10s to 15s

https://gerrit.wikimedia.org/r/425025

After raising the timeout from 10s to 15s, there was another failure:

https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer-jessie/9702/console
11:23:11 timed out after 15 seconds, Element was not visible in 15 seconds (Watir::Wait::TimeoutError)

So either 15s is still not enough, or we have some other issue here.

Change 425031 had a related patch set uploaded (by Hoo man; owner: Hoo man):
[mediawiki/extensions/Wikibase@master] [DNM] Test very high browser test timeout

https://gerrit.wikimedia.org/r/425031

With the (very high) 90s timeout, it took me quite some tries, but I managed to also hit this:

https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer-jessie/9709/console
12:59:03 timed out after 90 seconds, Element was not visible in 90 seconds (Watir::Wait::TimeoutError)

Change 425031 abandoned by Hoo man:
[DNM] Test very high browser test timeout

Reason:
So this didn't work (but now we at least know)

https://gerrit.wikimedia.org/r/425031

I've watched the video for the latest failure and something interesting is going on there - while the login is entered and no error message appears, the screen stays "not logged in" as before. So the issue here is not timeouts - the issue is that login does not happen, or the post-login page does not load correctly. Could it be some kind of caching issue?
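To illustrate why raising the timeout from 10s to 15s to 90s could not help here, below is a minimal Python sketch of a polling wait. It is not Watir's implementation; `wait_until`, `element_visible_after`, `never_visible` and `outcome` are hypothetical names, and the timings are scaled down so it runs quickly. It assumes the element never renders once the login is lost.

```python
import time


class WaitTimeoutError(Exception):
    """Raised when the condition did not become true within the timeout."""


def wait_until(condition, timeout, interval=0.01):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            raise WaitTimeoutError(f"timed out after {timeout} seconds")
        time.sleep(interval)


def element_visible_after(delay):
    """Hypothetical condition: the element shows up `delay` seconds from now."""
    start = time.monotonic()
    return lambda: time.monotonic() - start >= delay


def never_visible():
    """Hypothetical condition: the login was lost, so the element never renders."""
    return False


def outcome(condition, timeout):
    """Run one wait and report the result as a string."""
    try:
        wait_until(condition, timeout)
        return "passed"
    except WaitTimeoutError:
        return "timed out"
```

A merely slow page passes once the timeout is large enough, e.g. `outcome(element_visible_after(0.05), 0.5)`, whereas `outcome(never_visible, t)` times out for every `t`. That second pattern matches the failures seen at 15 and 90 seconds.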

It's nearly been a month now - can we make this non-voting until it's fixed? Or move it to post-merge?

Let's.

Change 425929 had a related patch set uploaded (by Jforrester; owner: Jforrester):
[integration/config@master] [Wikibase] Temporarily remove flaky mwext-mw-selenium-composer-jessie from test

https://gerrit.wikimedia.org/r/425929

Change 425929 merged by jenkins-bot:
[integration/config@master] [Wikibase] Temporarily remove flaky mwext-mw-selenium-composer-jessie from test

https://gerrit.wikimedia.org/r/425929

Ok I +2'd the change to integration config. Someone with more knowledge of the problem will have to address the broken test.

The Ruby Selenium framework has been deprecated (see Blog Post: Selenium Ruby framework deprecated). I will not have the time to take a look at this because I am working on Node.js Selenium framework improvements (see T190994: Q4 Selenium framework improvements). I would be glad to help you move tests from Ruby to Node.js (see T183160: Sample code in Node.js for repositories that still have Selenium+Ruby tests, T190687: Pair on writing Selenium tests in JavaScript/Node.js, T190046: Write Selenium tests in JavaScript/Node.js workshop).

In the video recording, we can see the test log in as WikiAdmin and get redirected to the page, but the login session is gone. That prevents the test from completing. That specific issue is most certainly a duplicate of T191537: a MediaWiki deferred update sends a cookie to the browser that comes from a different session/user, which causes the session to be invalidated.

Issues such as searching for an element that does not exist possibly have the same root cause: the session gets logged out and thus the page is missing elements. Watching the videos for all such cases might link them to T191537 as well.

EDIT

Previously @Smalyshev noticed the log out as well:

I've watched the video for the latest failure and something interesting is going on there - while the login is entered and no error message appears, the screen stays "not logged in" as before. So the issue here is not timeouts - the issue is that login does not happen, or the post-login page does not load correctly. Could it be some kind of caching issue?

So yeah, most probably T191537 will fix most of the flappiness :]

hashar assigned this task to Anomie.
hashar added a subscriber: Anomie.

As @Smalyshev noted, the test ends up being logged out at some point. I am pretty sure that is the same issue we encountered in T191537: cookies from a different session were sent by a background job executed when logging in. That invalidates the session and causes the logged-out page issue.
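As a rough illustration of that failure mode, here is a toy Python model of a session cookie being clobbered. It is not MediaWiki's session handling; `SessionStore`, `Browser`, `render_page` and `simulate` are hypothetical, and the assumption is simply that the browser keeps whichever `session` cookie it received last.

```python
import secrets


class SessionStore:
    """Hypothetical server-side session table: session id -> user name."""

    def __init__(self):
        self._sessions = {}

    def create(self, user):
        session_id = secrets.token_hex(8)
        self._sessions[session_id] = user
        return session_id

    def user_for(self, session_id):
        # Unknown or missing ids are treated as anonymous (None).
        return self._sessions.get(session_id)


class Browser:
    """Minimal cookie jar: the last Set-Cookie for a given name wins."""

    def __init__(self):
        self.cookies = {}

    def receive_set_cookie(self, name, value):
        self.cookies[name] = value


def render_page(user):
    # Assumption: logged-out pages lack the elements the test waits for.
    return ["content", "edit-button"] if user else ["content"]


def simulate():
    store = SessionStore()
    browser = Browser()

    # Normal login: the response carries the cookie for WikiAdmin's session.
    browser.receive_set_cookie("session", store.create("WikiAdmin"))
    before = store.user_for(browser.cookies.get("session"))

    # Assumed bug: a background job running under a different (anonymous)
    # session emits its own cookie, replacing the one just set.
    browser.receive_set_cookie("session", store.create(None))
    after = store.user_for(browser.cookies.get("session"))

    return before, after
```

Calling `simulate()` gives `"WikiAdmin"` before the stray cookie and `None` after it, and `render_page(None)` has no `edit-button`. That is consistent with the "Element was not visible" timeouts, regardless of how long the test waits.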

Fixed by @Anomie in https://gerrit.wikimedia.org/r/439289. He absolutely aced that nerd snipe. Thank you Brad!

Change 457675 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Add back Selenium tests for Wikibase

https://gerrit.wikimedia.org/r/457675

Change 457675 merged by jenkins-bot:
[integration/config@master] Add back Selenium tests for Wikibase

https://gerrit.wikimedia.org/r/457675

The selenium tests are triggered again on patchset proposals :]

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:09 PM