
Find out why Lexeme:Forms test is flaky in CI and fix
Closed, Invalid (Public)

Description

The test "Lexeme:Forms FormId counter is not decremented when old revision is restored" occasionally fails in CI with the error message: element (".lemma-widget_edit") still not visible after 10000ms

Example failure: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/527549/

"stack trace" from the selenium node job log (https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/26829/console):

11:21:27 1) Lexeme:Forms FormId counter is not decremented when old revision is restored:
11:21:27 element (".lemma-widget_edit") still not visible after 10000ms

11:21:27 Error: element (".lemma-widget_edit") still not visible after 10000ms
11:21:27     at elements(".lemma-widget_edit") - isVisible.js:54:17
11:21:27     at isVisible(".lemma-widget_edit") - waitForVisible.js:73:22

Looking at the video recordings of the failed test runs, it looks like the UI JavaScript is not loaded (the UI is not fully "initialized").

Acceptance criteria:

  • The reason for the unrelated test failure is documented here in the task as a comment
  • The cause of the failures has been removed or worked around

Currently under investigation

Event Timeline

Michael triaged this task as High priority. Aug 1 2019, 2:16 PM

Looking at the video recording of the failing test, e.g. https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/26829/artifact/log/Lexeme%253ALemma-can-not-save-lemmas-with-redundant-languages.mp4, it looks as if the JavaScript was not loaded. It might be worthwhile to check the browser console log for any errors (ResourceLoader?).
There seems to be no structured way to log client/browser errors to one of MediaWiki's logging facilities, so a quick, brute-force way to check would be to have the selenium test code look for errors reported by the browser and, if there are any, dump them to the test logs, i.e. something similar to what the Ruby tests do: https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/tests/browser/features/support/env.rb#L46-L58
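A minimal sketch of that idea in JavaScript, assuming the browser log entries have the usual WebDriver shape ({ level, message, timestamp }). The names formatBrowserLogs, dumpBrowserLogsOnFailure, fetchLogs and print are hypothetical and only for illustration; fetchLogs stands in for whatever call the test runner offers to read the browser console, so this is not the actual patch.

```javascript
'use strict';

// Hypothetical helper: keep only warnings and errors from the browser
// console and turn them into printable lines.
// Assumption: entries look like { level, message, timestamp }.
function formatBrowserLogs( entries ) {
	return entries
		.filter( ( entry ) => entry.level === 'SEVERE' || entry.level === 'WARNING' )
		.map( ( entry ) => `[browser ${ entry.level }] ${ entry.message }` );
}

// Hypothetical hook body: on a failed test, read the browser console via
// the injected fetchLogs() and write each formatted line with print().
// Nothing is fetched or printed when the test passed.
function dumpBrowserLogsOnFailure( testPassed, fetchLogs, print ) {
	if ( testPassed ) {
		return [];
	}
	const lines = formatBrowserLogs( fetchLogs() );
	lines.forEach( ( line ) => print( line ) );
	return lines;
}
```

Something like this could be called from the runner's after-test hook, at the same point where screenshots are taken, with print set to console.log so the lines end up in the node job log.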

Change 529042 had a related patch set uploaded (by Pablo Grass (WMDE); owner: Pablo Grass (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] selenium: log browser logs to node console on failure

https://gerrit.wikimedia.org/r/529042

CI now logs the browser console to the node console on failure (at the same moment the screenshots are taken). The output is not necessarily conclusive, though.
e.g. https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/64618/console

Change 529054 had a related patch set uploaded (by Pablo Grass (WMDE); owner: Pablo Grass (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] selenium: consistently use waitForVisible on element

https://gerrit.wikimedia.org/r/529054

Change 533956 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/extensions/Wikibase@master] Disable CirrusSearch in browser tests

https://gerrit.wikimedia.org/r/533956

Change 533956 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Disable CirrusSearch in browser tests

https://gerrit.wikimedia.org/r/533956

The patch should hopefully have fixed the issue. Feel free to reopen this task if the flaky tests keep happening.

The acceptance criteria listed in the task include documenting the reason for the test failure here in the task as a comment.
I don't think the explanation of why/how the change made in https://gerrit.wikimedia.org/r/533956 was related to the issues noticed with the browser tests has been written down yet.
@Ladsgroup, would you mind writing down a couple of sentences here for future generations?

Change 529042 abandoned by Pablo Grass (WMDE):
selenium: log browser logs to node console on failure

https://gerrit.wikimedia.org/r/529042

Change 529054 abandoned by Pablo Grass (WMDE):
selenium: consistently use waitForVisible on element

https://gerrit.wikimedia.org/r/529054

Addshore subscribed.

Seemingly this is still a flaky browser test, per the monitoring at T277205.


Per T277205, it doesn't look like this has failed in the past 3-4 weeks.