Make existing daily selenium nodejs tests for WikibaseLexeme green again
Closed, Resolved · Public

Description

Issue

We run WikibaseLexeme's selenium nodejs browser tests against beta every day[0], and they currently fail every day.

As a result, an email about failing tests is sent out every day and is habitually ignored by everybody. This is harmful because similar emails about newly failing tests get overlooked as well.

[0]: apparently they have been added manually, see: T188742, possibly by T194252

Proposed solution

Try to keep the same level of functionality tested as before but make the tests reliable and green.

If the issue turns out to be significantly complex (e.g. because of issues in the testing infrastructure), the task should probably be tackled using a different approach (make it the subject of a dedicated "project", a "hike" in WMDE vocabulary).

Acceptance criteria

  • Functionality covered by existing tests is still covered
  • Tests are running and green (not just disabled)

Tech Notes

  • Jenkins UI can be used to trigger "rebuilds" of the daily jobs at any time for faster iterations
  • given that the same browser test suite is run on per-patch CI jobs for the WikibaseLexeme extension, it is assumed the issues are (mostly) on the test infrastructure side

Event Timeline

After the release, I manually restarted the build and all passed except these two: https://integration.wikimedia.org/ci/view/Selenium/job/selenium-daily-beta-WikibaseLexeme/256/console

17:29:24 1) Lexeme:Header can edit the language of a Lexeme:
17:29:24 Promise was rejected with the following reason: timeout
17:29:24 running chrome
17:29:24 Error: Promise was rejected with the following reason: timeout
17:29:24     at LexemePage.setLexemeLanguageItem (tests/selenium/pageobjects/lexeme.page.js:151:11)
17:29:24     at Context.it (tests/selenium/specs/header.edit.js:37:14)
17:29:24     at Promise.F (node_modules/core-js/library/modules/_export.js:36:28)
17:29:24     at elementIdAttribute("0.9027693925843041-4", "disabled") - getAttribute.js:43:55
17:29:24 
17:29:24 2) Lexeme:Header can edit the lexical category of a Lexeme:
17:29:24 Promise was rejected with the following reason: timeout
17:29:24 running chrome
17:29:24 Error: Promise was rejected with the following reason: timeout
17:29:24     at LexemePage.setLexicalCategoryItem (tests/selenium/pageobjects/lexeme.page.js:162:11)
17:29:24     at Context.it (tests/selenium/specs/header.edit.js:69:14)
17:29:24     at Promise.F (node_modules/core-js/library/modules/_export.js:36:28)
17:29:24     at elementIdAttribute("0.16885742401078518-4", "disabled") - getAttribute.js:43:55

The images show that the id that should have been created was not found. I will try to figure out what is going on.

Ladsgroup added a comment. Edited May 7 2019, 3:44 PM

It might be the classic issue of elastic not being able to index the newly created item. We had this issue in Wikibase itself a lot.

It's very likely now because I just made a similar edit there: https://wikidata.beta.wmflabs.org/w/index.php?title=Lexeme:L53&diff=1122629&oldid=1083980

But then, why does everything else work? Or was it random chance that it hit the two new tests?

Probably the new tests don't have the "wait time" for elastic to pick up the newly created item. I'll investigate it.
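
A minimal sketch of what such a "wait time" could look like: instead of sleeping for a fixed duration, poll until the search backend reports the new entity or a timeout expires. The names `waitUntil`, `waitForEntityInSearch` and the injected `searchEntities` function are hypothetical and not taken from the WikibaseLexeme test suite; `searchEntities` is assumed to resolve to an array of entity ids.

```javascript
// Hypothetical helper: poll an async check until it returns a truthy value
// or the timeout expires.
async function waitUntil(check, { timeoutMs = 10000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) {
      return;
    }
    // Give up if the next poll would start after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical usage: `searchEntities` stands in for whatever call queries
// the wiki's search index and resolves to an array of entity ids.
async function waitForEntityInSearch(searchEntities, entityId, options) {
  await waitUntil(async () => {
    const ids = await searchEntities(entityId);
    return ids.includes(entityId);
  }, options);
}
```

Polling keeps the fast case fast (the test continues as soon as the index catches up) while still bounding the total wait on a slow beta cluster.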

Change 508617 had a related patch set uploaded (by Michael Große; owner: Michael Große):
[mediawiki/extensions/WikibaseLexeme@master] Give elastic on beta more time to recognize created items

https://gerrit.wikimedia.org/r/508617

Change 508617 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Give elastic on beta more time to recognize created items

https://gerrit.wikimedia.org/r/508617

It still fails in beta cluster because of not being able to find the item.

One of the tests still doesn't find the item, the other one does, but fails to save the change. :/

Restricted Application added a project: User-Michael. May 8 2019, 11:42 AM

Change 508834 had a related patch set uploaded (by Michael Große; owner: Michael Große):
[mediawiki/extensions/Wikibase@master] Add nodejs utility to check if cirrus is up-to-date

https://gerrit.wikimedia.org/r/508834

Change 508836 had a related patch set uploaded (by Michael Große; owner: Michael Große):
[mediawiki/extensions/WikibaseLexeme@master] Use util to wait for cirrus in browser test

https://gerrit.wikimedia.org/r/508836

Change 509100 had a related patch set uploaded (by Michael Große; owner: Michael Große):
[mediawiki/extensions/Wikibase@master] Add nodejs utility to check if new entity is known to search

https://gerrit.wikimedia.org/r/509100

Change 509356 had a related patch set uploaded (by Michael Große; owner: Michael Große):
[mediawiki/extensions/WikibaseLexeme@master] Retry inputs in Lexeme header browser tests

https://gerrit.wikimedia.org/r/509356

Change 509356 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Retry inputs in Lexeme header browser tests

https://gerrit.wikimedia.org/r/509356
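
The idea of retrying an input, as the change title above suggests, could be sketched roughly like this. This is only an illustration under assumptions: `retry` and its options are hypothetical names, and the merged patch may well be implemented differently.

```javascript
// Hypothetical helper: run an async action and retry it when it throws,
// up to `attempts` times, pausing `delayMs` between tries.
async function retry(action, { attempts = 3, delayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      // The attempt number is passed along so callers can log it.
      return await action(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  // All attempts failed: surface the most recent error.
  throw lastError;
}
```

A test would wrap only the flaky step (for example, typing into a lookup field and selecting a suggestion) in `retry`, so that genuine failures elsewhere still surface immediately.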

Yes, I think so. Will do.

Change 509100 abandoned by Michael Große:
Add nodejs utility to check if new entity is known to search

Reason:
The problem was (hopefully) fixed with I7575f2ad34beb0fee1957fab9de6d1ab23dc024b

https://gerrit.wikimedia.org/r/509100

Change 508836 abandoned by Michael Große:
Use util to wait for new item be known in browser test

Reason:
The problem was (hopefully) fixed with I7575f2ad34beb0fee1957fab9de6d1ab23dc024b

https://gerrit.wikimedia.org/r/508836

Change 508834 abandoned by Michael Große:
Add nodejs utility to check if cirrus is up-to-date

Reason:
The problem was (hopefully) fixed with I7575f2ad34beb0fee1957fab9de6d1ab23dc024b

https://gerrit.wikimedia.org/r/508834

Having just checked https://integration.wikimedia.org/ci/view/Selenium/job/selenium-daily-beta-WikibaseLexeme/, tests seem to still be red. Is it the known situation, or did something break in the meantime?

It is at least still the known situation, see https://integration.wikimedia.org/ci/view/Selenium/job/selenium-daily-beta-WikibaseLexeme/284/artifact/log/shows-Forms-header.png
For the other failures it is hard to tell whether there was a 500 error or a genuine test failure.

There seems to be one message that matches the timestamp:

[XO14d6wQBHcAAH120aYAAABS] /w/index.php?title=Special%3AUserLogin   PHP Fatal Error from line 81 of /srv/mediawiki/wmf-config/InitialiseSettings-labs.php: Cannot redeclare wmfLabsSettings() (previously declared in /srv/mediawiki/wmf-config/InitialiseSettings-labs.php:81)

But there are also a couple of other messages within just seconds of it that might also be responsible.
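
Matching log messages against a failure time by hand is error-prone; a small filter along the following lines could list every entry within a few seconds of a failure. This is a sketch only: `entriesNear` and the `{ time, message }` entry shape are assumptions for illustration, not an existing tool or log format.

```javascript
// Hypothetical helper: return the log entries whose timestamp lies within
// `windowMs` milliseconds of `failureTime`. Timestamps are anything that
// `new Date()` can parse, e.g. ISO 8601 strings.
function entriesNear(failureTime, logEntries, windowMs = 5000) {
  const failureMs = new Date(failureTime).getTime();
  return logEntries.filter(
    (entry) => Math.abs(new Date(entry.time).getTime() - failureMs) <= windowMs
  );
}
```

Widening or narrowing `windowMs` makes it easier to see whether one fatal lines up with the failure or whether several candidates fall in the same few seconds.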

A shot in the dark, but I've noticed quite a few fatals reported on beta wikidata, some of them matching the timestamps of failing tests:

PHP Fatal error: Cannot redeclare wmfLabsSettings() (previously declared in /srv/mediawiki/wmf-config/InitialiseSettings-labs.php:81) in /srv/mediawiki/wmf-config/InitialiseSettings-labs.php on line 81

I will investigate it further.

See T224899 for the above finding. It is still not clear whether it is related.

Michael removed Michael as the assignee of this task. Jun 6 2019, 9:45 AM
Michael moved this task from ✅ my work done to 👁️ watching on the User-Michael board.
Addshore moved this task from incoming to in progress on the Wikidata board. Jun 21 2019, 11:28 PM

T230481 might have an effect on this issue, stay tuned

I would suggest disabling tests that regularly fail, until they are fixed. That way the build will become stable and useful immediately. Failing tests can be fixed as there is time.

Some repositories, including mediawiki/core, already do that. Take a look at package.json and page.js for examples. Let me know if you have any questions.

Addshore updated the task description. Sep 29 2020, 9:58 AM
WMDE-leszek updated the task description. Sep 29 2020, 1:37 PM
WMDE-leszek updated the task description.

Change 632464 had a related patch set uploaded (by Rosalie Perside (WMDE); owner: Rosalie Perside (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Make existing daily selenium test pass

https://gerrit.wikimedia.org/r/632464

Having looked at this ticket as well as the latest build failures, here are a few observations:

  • this ticket is more of an umbrella than tracking a specific problem - it has a colorful history of all things "daily selenium tests"
  • failures of the last ~1 month all seem to share the problem of something not being displayed yet (unsuccessfully having waited [implicitly or explicitly], or having failed to wait). Hotspot: form.add.js
  • recent failures (~1 week) are in the vast majority of cases related to waiting for elements on Special:UserLogin. Hotspot: one particular lemma.edit.js preparation hook. As such, I understand the general direction of https://gerrit.wikimedia.org/r/632464 but don't think it can be the silver bullet, given that this behavior is used many times in the test suite but seems to fail consistently in only one spot. At the moment I suspect something like a performance drop, maybe after a certain number of requests, or something comparable

I'm skeptical whether it will magically solve the latest issues, but we decided to make T255051: Upgrade WebdriverIO in the WikibaseLexeme repository happen and at least give it a chance.

Change 632464 abandoned by Rosalie Perside (WMDE):
[mediawiki/extensions/WikibaseLexeme@master] Make existing daily selenium test pass

Reason:

https://gerrit.wikimedia.org/r/632464

Rosalie_WMDE removed Rosalie_WMDE as the assignee of this task. Oct 12 2020, 7:50 AM
Rosalie_WMDE added a subscriber: Rosalie_WMDE.

The software engineering equivalent of a miracle happened. T255051: Upgrade WebdriverIO in the WikibaseLexeme repository seems to have made this (repeatedly) green again (#789, #790, #791).

Big up to @Michael for being the squeaking wheel about this!

Thanks, but I didn't do much for this except keep having a bad conscience for having created a filter to archive away those emails about failing tests every day.
Now I can change that filter again. Thank you all who worked on it :)

The software engineering equivalent of a miracle happened.

This is one of the best Phabricator comments ever. 😆

WMDE-leszek closed this task as Resolved. Oct 14 2020, 12:23 PM
WMDE-leszek updated the task description.