Run Wikibase daily browser tests on Jenkins
Closed, Resolved · Public

Description

TODO

๐Ÿƒโ€โ™‚๏ธ selenium-Wikibase-chrome/MEDIAWIKI_ENVIRONMENT=beta run takes 35-65 minutes. 20-57% of tests are failing (42-121 from 212). ๐Ÿ’ฃ Failures should be fixed.

๐Ÿƒโ€โ™‚๏ธ selenium-Wikibase-chrome/MEDIAWIKI_ENVIRONMENT=test needs 9.5 hours to run ๐ŸŒ with just a few failures (4-16 out of 200 tests). If the duration of the job is acceptable, failures need to be fixed. Also, Jenkins job template should be updated so other jobs can specify a shorter timeout, since only this job needs such long timeout. If the duration is not acceptable, tests should be made faster.

Details

Wikibase daily browser tests currently run using SauceLabs, and they randomly fail for reasons related not to the code but to the SauceLabs API. See T152963: Increase in failures caused by Saucelabs for details.

Since randomly failing tests are worse than no tests at all, we (Wikidata team) would like to have another daily job running those tests on Jenkins (without using SauceLabs). Hopefully they will be stable.

PS: If they are stable, we will probably want to kill the SauceLabs job, but not right now.

Important changes to current job:

  • Removed SAUCE_ONDEMAND_ACCESS_KEY.
  • Instead of running on BrowserTests slaves, the job now runs on DebianJessie && contintLabsSlave.

The change in the shell script to take screenshots and record videos:

export SKIP_TMPFS=1
export HEADLESS=true
export HEADLESS_DISPLAY=$((70 + EXECUTOR_NUMBER % 20))
export HEADLESS_DESTROY_AT_EXIT=true
export HEADLESS_CAPTURE_PATH="$WORKSPACE/log"

HEADLESS=true SCREENSHOT_FAILURES=true SCREENSHOT_FAILURES_PATH="$WORKSPACE/log" bundle exec rake selenium
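
The HEADLESS_DISPLAY line above picks an X display number from the Jenkins executor number. A minimal sketch of that arithmetic follows; display_for_executor is a hypothetical helper used only for illustration and is not part of the job's script.

```shell
# Hypothetical helper: the same expression as the HEADLESS_DISPLAY export,
# with the executor number passed as an argument instead of read from Jenkins.
display_for_executor() {
  # % binds tighter than +, so this is 70 + (executor % 20): a value in 70..89.
  echo $((70 + $1 % 20))
}

for n in 0 1 19 20 25; do
  echo "executor $n -> display :$(display_for_executor "$n")"
done
```

Executors 0 and 20 map to the same display, so under this scheme two builds could only collide on a node with more than 20 executors.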

Event Timeline

So far I have focused on getting the job running for beta. I have not looked at the test job yet.

For beta, the original job (selenium-Wikibase) has 1 failure. The new job (selenium-Wikibase-T167432) has no failures.

For test, the original job has 63 failures, the new one 95.

I have not looked at the failures yet, but it looks to me like something is wrong with the test job, both the original and the new one.

I took a look at a few selenium-Wikibase/test failures (example: jenkins, sauce labs) and all of them failed with error No match was found.

0012screenshot.png (768×1 px, 117 KB)

I took a look at selenium-Wikibase-T167432/test failures:

  • 1 RSpec::Expectations::ExpectationNotMetError
  • 26 The save has failed. (failed-save) (MediawikiApi::ApiError)
  • 56 Watir::Wait::TimeoutError
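
A breakdown like the one above can be produced by tallying one failure message per line. This is only a sketch: tally_failures is a hypothetical helper, and the input is a tiny made-up sample that reuses the three error messages listed above, not the real job output.

```shell
# Hypothetical helper: read one failure message per line on stdin and
# print "count message", most frequent first.
tally_failures() {
  # uniq -c needs sorted input; sed strips the padding uniq adds before counts.
  sort | uniq -c | sort -rn | sed 's/^ *//'
}

# Made-up sample input, for illustration only.
printf '%s\n' \
  'Watir::Wait::TimeoutError' \
  'The save has failed. (failed-save) (MediawikiApi::ApiError)' \
  'Watir::Wait::TimeoutError' \
  'RSpec::Expectations::ExpectationNotMetError' \
  'Watir::Wait::TimeoutError' |
  tally_failures
```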

It looks to me like Jenkins' IPs have to be whitelisted for testwiki, as was done for the beta cluster (386343). That should fix "The save has failed. (failed-save) (MediawikiApi::ApiError)".

Thanks @zeljkofilipin! I will try to look at those failures in more detail when I am on a stable network connection.

@Ladsgroup: Regarding whitelisting IPs of Jenkins, is this something you might be able to look into, or should I ask further?

Change 390394 had a related patch set uploaded (by Zfilipin; owner: Zfilipin):
[integration/config@master] WIP Wikibase daily browser test does not use Sauce Labs

https://gerrit.wikimedia.org/r/390394

$ diff output0/selenium-Wikibase output3/selenium-Wikibase-chrome 
13c13
<         <string>BrowserTests</string>
---
>         <string>DebianJessie &amp;&amp; contintLabsSlave</string>
42c42
<   <assignedNode>BrowserTests</assignedNode>
---
>   <assignedNode>DebianJessie &amp;&amp; contintLabsSlave</assignedNode>
145a146,156
> # screenshots
> export SCREENSHOT_FAILURES=true
> export SCREENSHOT_FAILURES_PATH=&quot;$WORKSPACE/log&quot;
> 
> # videos
> export SKIP_TMPFS=1
> export HEADLESS=true
> export HEADLESS_DISPLAY=$((70 + EXECUTOR_NUMBER % 20))
> export HEADLESS_DESTROY_AT_EXIT=true
> export HEADLESS_CAPTURE_PATH=&quot;$WORKSPACE/log&quot;
> 
281,284d291
<         <org.jenkinsci.plugins.credentialsbinding.impl.StringBinding>
<           <variable>SAUCE_ONDEMAND_ACCESS_KEY</variable>
<           <credentialsId>sauce-ondemand-access-key</credentialsId>
<         </org.jenkinsci.plugins.credentialsbinding.impl.StringBinding>

Thanks @zeljkofilipin! This is really looking good. We'll have the Jenkins IPs whitelisted and see what failures are actually there on test. This is most likely going to happen on Monday.

Change 384500 abandoned by Zfilipin:
WIP Run Wikibase daily browser tests on Jenkins

Reason:
just an experiment

https://gerrit.wikimedia.org/r/384500

Change 390975 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/mediawiki-config@master] Whitelist jenkins in test wiki

https://gerrit.wikimedia.org/r/390975

Change 390975 merged by jenkins-bot:
[operations/mediawiki-config@master] Whitelist jenkins in test wiki

https://gerrit.wikimedia.org/r/390975

Mentioned in SAL (#wikimedia-operations) [2017-11-13T14:10:24Z] <ladsgroup@tin> Synchronized wmf-config/InitialiseSettings.php: Whitelist jenkins in test wiki (T167432) (duration: 00m 47s)

hmm, quite a number of odd failures, e.g.

unknown error: Chrome failed to start: exited abnormally

browser window was closed (Watir::Exception::NoMatchingWindowFoundException)

Net::ReadTimeout (Net::ReadTimeout)

Should I assume this was some hiccup and re-run the job?

I do not know what went wrong 😕 I am re-running the job.

Testwiki job now looks much better.

Screen Shot 2017-11-14 at 13.08.15.png (221×513 px, 16 KB)

73 (out of 74) failures are Watir::Wait::TimeoutError

After a quick look at the screenshots, it looks like the problem is the "No match was found" error message.

Adding references to statements_ Add reference with multiple snaks.png (899×1 px, 71 KB)

Let me know if there is anything left for me to do here.

Change 390394 merged by jenkins-bot:
[integration/config@master] Wikibase daily browser test does not use Sauce Labs

https://gerrit.wikimedia.org/r/390394

I've tried to look into why there are so many test failures when targeting testwiki. It seems many of them (I didn't check all failures, so I cannot say this is the only cause) are due to the Wikibase search API saying "No match was found" when searching for a property or item. This is a bit odd, as:

  1. Same tests seem to generally work fine when targeting beta
  2. Things like https://integration.wikimedia.org/ci/job/selenium-Wikibase-chrome/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=DebianJessie%20&&%20contintLabsSlave/lastFailedBuild/artifact/log/Adding%20references%20to%20statements%3A%20Add%20reference%20with%20one%20snak%3A%20%7C%20click%20the%20statement%20save%20button%20%7C.png where the same property has been found seconds before (for statement level), and then API says "No match found" when looking for the same thing in the reference part.

My wild and not-based-on-anything guess is that there might be something in testwiki's config. If I am not mistaken, unlike beta, that wiki uses Elasticsearch via CirrusSearch. Maybe there is some kind of search rate limiting on the CirrusSearch or Elasticsearch side, which makes it return no results when there are too many requests? Those browser tests seem to run pretty fast, so I wouldn't be surprised if there were too many queries.
I know neither CirrusSearch nor Elasticsearch, so I cannot really tell. Who could chime in here and enlighten me? Maybe @Smalyshev knows whether there could be any rate limiting or something similar causing no search matches to be returned where we know there must be some?

Change 399622 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/Wikibase@master] Try out if waiting a bit after creating property make search results appear as expected

https://gerrit.wikimedia.org/r/399622

Change 399622 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Try out if waiting a bit after creating property make search results appear as expected

https://gerrit.wikimedia.org/r/399622

Change 409918 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[mediawiki/extensions/Wikibase@master] Wait for Cirrus index update in browser tests when needed

https://gerrit.wikimedia.org/r/409918

Change 409918 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Wait for Cirrus index update in browser tests when needed

https://gerrit.wikimedia.org/r/409918
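
Going by their titles, the two Wikibase changes above make the tests wait until a newly created entity shows up in search. The general pattern is to poll until a check succeeds or a limit is reached. The sketch below is not the actual change (the suite is Ruby); retry_until and search_finds_property are hypothetical names, and the search check is a stand-in that simply succeeds on its third call.

```shell
# retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds; return 1 if it never does.
# (Plain POSIX sh, so the helper's variables are global.)
retry_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Sleep only if another attempt follows.
    [ "$attempt" -le "$max_attempts" ] && sleep "$delay"
  done
  return 1
}

# Stand-in for "the search API now returns the new property":
# fails twice, then succeeds, imitating an index that lags behind.
calls=0
search_finds_property() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]
}

if retry_until 5 0 search_finds_property; then
  echo "found after $calls attempts"
else
  echo "No match was found"
fi
```

A bounded number of attempts keeps a genuinely missing result from hanging the run, at the cost of making each affected step slower.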

@zeljkofilipin: as discussed today, if you could raise the timeout limit for the job targeting test to, say, 240 minutes, it would hopefully be enough for the job to finish. We could then see if tests are passing after the Cirrus-related adjustments I made weeks ago.
Meanwhile I am going to investigate why the beta job is now so badly red when it used to be green.

Change 434037 had a related patch set uploaded (by WMDE-leszek; owner: WMDE-leszek):
[integration/config@master] Temporarily rise up the tiemout of selenium jobs

https://gerrit.wikimedia.org/r/434037

Change 434037 merged by jenkins-bot:
[integration/config@master] Temporarily increase the timeout of selenium jobs

https://gerrit.wikimedia.org/r/434037

@WMDE-leszek Looks like even 4 hours 🕰 was not enough. Please create a patch that increases the timeout to 5 hours. ⏳

Change 434908 had a related patch set uploaded (by Zfilipin; owner: Zfilipin):
[integration/config@master] Increase timeout of daily Selenium jobs

https://gerrit.wikimedia.org/r/434908

Change 434908 merged by jenkins-bot:
[integration/config@master] Increase timeout of daily Selenium jobs

https://gerrit.wikimedia.org/r/434908

434908 is merged and deployed, I have started the job. We will know more in 5 hours. 😅

I am not sure what happened, selenium-Wikibase-chrome/test now takes 20-30 minutes instead of 4-5 hours. 🤔

selenium-Wikibase-chrome/MEDIAWIKI_ENVIRONMENT=test is timing out even after 5 hours. I will increase the timeout to 10 hours just to see how long it takes. Once it passes, I will reduce it to a reasonable number of hours. 😅

Change 437486 had a related patch set uploaded (by Zfilipin; owner: Zfilipin):
[integration/config@master] WIP Increase timeout of daily Selenium jobs

https://gerrit.wikimedia.org/r/437486

zeljkofilipin updated the task description.

Nothing left for me to do here, so un-assigning. There are a couple of things left to do; I have documented them in the TODO section of the task description.

Change 437486 abandoned by Zfilipin:
WIP Increase timeout of daily Selenium jobs

Reason:
selenium-Wikibase-chrome fails even with 5 hour timeout https://integration.wikimedia.org/ci/job/selenium-Wikibase-chrome/buildTimeTrend

https://gerrit.wikimedia.org/r/437486

Change 482640 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/mediawiki-config@master] Add new WMCS IP range to $wgRateLimitsExcludedIps

https://gerrit.wikimedia.org/r/482640

Change 482640 had a related patch set uploaded (by Zfilipin; owner: Hashar):
[operations/mediawiki-config@master] Add new WMCS IP range to $wgRateLimitsExcludedIps

https://gerrit.wikimedia.org/r/482640

Change 482640 merged by jenkins-bot:
[operations/mediawiki-config@master] Add new WMCS IP range to $wgRateLimitsExcludedIps

https://gerrit.wikimedia.org/r/482640

Mentioned in SAL (#wikimedia-operations) [2019-04-02T23:30:26Z] <jforrester@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT Add new WMCS IP range to wgRateLimitsExcludedIps T167432 (duration: 00m 57s)

For the TODOS...

Beta is currently green https://integration.wikimedia.org/ci/view/Selenium/job/selenium-daily-beta-Wikibase/

I can't find the jobs for test right now? Are these even running any more? Or are we just on beta now?

If so this can probably be closed!

> I can't find the jobs for test right now? Are these even running any more? Or are we just on beta now?

I'm not sure what you mean. This?

Random patch: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/521751

jenkins-bot
Patch Set 1: Verified-1
Main test build failed.
...
quibble-vendor-mysql-hhvm-docker https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/57703/console : SUCCESS in 20m 19s

From https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/57703/consoleFull

------------------------------------------------------------------
[chrome #0-28] Session ID: 84dd0fde5aabdbb3cac108c45ec043ee
[chrome #0-28] Spec: /workspace/src/extensions/Wikibase/repo/tests/selenium/specs/blocked.js
[chrome #0-28] Running: chrome
[chrome #0-28]
[chrome #0-28] blocked user cannot use
[chrome #0-28]   ✓ Special:SetLabel
[chrome #0-28]   ✓ Special:SetDescription
[chrome #0-28]   ✓ Special:SetAliases
[chrome #0-28]   ✓ Special:SetLabelDescriptionAliases
[chrome #0-28]   ✓ Special:SetSiteLink
[chrome #0-28]   ✓ Special:NewItem
[chrome #0-28]   ✓ Special:NewProperty
[chrome #0-28]   ✓ Special:MergeItems
[chrome #0-28]   ✓ Special:RedirectEntity
[chrome #0-28]
[chrome #0-28]
[chrome #0-28] 9 passing (12s)
[chrome #0-28]


	Video location: /workspace/log/item-can-add-a-statement-using-the-keyboard.mp4 

	ffmpeg exited with code 255 /workspace/log/item-can-add-a-statement-using-the-keyboard.mp4
------------------------------------------------------------------
[chrome #0-29] Session ID: b3c0e861d94c8a0adcfb0fb93f965220
[chrome #0-29] Spec: /workspace/src/extensions/Wikibase/repo/tests/selenium/specs/item.js
[chrome #0-29] Running: chrome
[chrome #0-29]
[chrome #0-29] item
[chrome #0-29]   ✓ can add a statement using the keyboard
[chrome #0-29]   ✓ old revisions do not have an edit link
[chrome #0-29]
[chrome #0-29]
[chrome #0-29] 2 passing (19s)
[chrome #0-29]


	Video location: /workspace/log/WikibaseRepoNonExistingItemPage-edit-tab-does-should-not-be-there.mp4 

	ffmpeg exited with code 255 /workspace/log/WikibaseRepoNonExistingItemPage-edit-tab-does-should-not-be-there.mp4
------------------------------------------------------------------
[chrome #0-30] Session ID: 728232a2a25d8b0019b1431c7af05971
[chrome #0-30] Spec: /workspace/src/extensions/Wikibase/repo/tests/selenium/specs/nonexisting.item.js
[chrome #0-30] Running: chrome
[chrome #0-30]
[chrome #0-30] WikibaseRepoNonExistingItemPage
[chrome #0-30]   ✓ edit tab does should not be there
[chrome #0-30]   ✓ the title should match
[chrome #0-30]
[chrome #0-30]
[chrome #0-30] 2 passing (3s)
[chrome #0-30]

[22:29:20] [S] [MWBOT] Login successful: WikiAdmin@http://127.0.0.1:9412/
------------------------------------------------------------------
[chrome #0-31] Session ID: 0fd3e92276a9742e865292983f7bacfa
[chrome #0-31] Spec: /workspace/src/extensions/Wikibase/repo/tests/selenium/specs/readmode.references.js
[chrome #0-31] Running: chrome
[chrome #0-31]
[chrome #0-31] WikibaseReferenceOnProtectedPage
[chrome #0-31]   ✓ can expand collapsed references on a protected page as unprivileged user
[chrome #0-31]
[chrome #0-31]
[chrome #0-31] 1 passing (11s)
[chrome #0-31]



==================================================================

Yes, but this ticket is about daily browser tests, not per-patch browser tests.

The link in the description in the TODO section for test is https://integration.wikimedia.org/ci/job/selenium-Wikibase-chrome/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=test,PLATFORM=Linux,label=DebianJessie%20&&%20contintLabsSlave/ but that goes nowhere.

I'm not sure if these run any more? At least I can't find them. If we don't run these any more then we can probably close this ticket, as the daily beta tests are green.

I see.

The Selenium view lists all jobs that have selenium in their name (actually ^(?=.*selenium)((?!noselenium).)*$). The only Wikibase-related job is selenium-daily-beta-Wikibase and it's green. I guess you can resolve this task then.
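
The regular expression quoted above accepts a name that contains "selenium" but does not contain "noselenium" anywhere. A minimal sketch of the same rule with plain shell patterns follows; in_selenium_view is a hypothetical helper, and example-noselenium-job is a made-up job name used only to exercise the exclusion.

```shell
# Hypothetical helper: succeed if NAME would match
# ^(?=.*selenium)((?!noselenium).)*$  (assuming NAME has no newlines).
in_selenium_view() {
  case "$1" in
    *noselenium*) return 1 ;;  # the negative lookahead rejects these
    *selenium*)   return 0 ;;  # the positive lookahead requires this
    *)            return 1 ;;
  esac
}

for job in selenium-daily-beta-Wikibase quibble-vendor-mysql-hhvm-docker example-noselenium-job; do
  if in_selenium_view "$job"; then
    echo "listed:     $job"
  else
    echo "not listed: $job"
  fi
done
```

The noselenium branch has to come first, because any name containing "noselenium" also contains "selenium".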

Addshore claimed this task.

The selenium-wikibase-chrome job was based on the Ruby mediawiki_selenium framework. This task was to run that test suite on patch submission, but that has never been done since the suite takes so long. The job eventually got deleted a month ago:

commit d26468ed09f87508bffa1762ed3fad529edf2ff9

Author: Amir Sarabadani 
Date:   Wed Jun 5 16:47:49 2019 +0200

Drop daily ruby browser tests for wikibase and wikibase lexeme

They all are moved to nodejs

Bug: T224301
Change-Id: I47922278d49b3387de00a2c8f540aa4087539ad4

Note that the change quoted above was premature: it is not true that all Wikibase browser tests have been migrated to nodejs. We still have quite some work left there.
We intend to restore said daily job soon, but let's track this separately, as the status of this task has clearly been confusing.