
Create a browser test to make sure that the related articles are shown on the beta cluster
Closed, Invalid; Public

Description

T224879 caused the extension to stop calling the morelike API endpoints. These endpoints are currently served by CirrusSearch and Elasticsearch, but those services cannot survive the traffic of our large Wikipedias without our various cache layers (Varnish + MW ObjectCache).

So when the extension is broken for too long, the caches are invalidated, and once the extension is fixed it must be re-activated gradually to avoid causing an incident.

This ticket is about creating a browser test to mitigate the issue by detecting any defects earlier.

Related incident report: https://wikitech.wikimedia.org/wiki/Incident_documentation/20190606-CirrusSearch-MoreLike

Event Timeline

dcausse renamed this task from "Create a browser tests to make sure the related articles are shown on the beta cluster" to "Create a browser test to make sure that the related articles are shown on the beta cluster". Jun 6 2019, 3:30 PM
dcausse added a project: Wikimedia-Incident.
ovasileva triaged this task as Medium priority. Jun 11 2019, 1:41 PM

@Jdlrobson thanks, I should have looked more closely at jenkins before creating this task. Could someone verify that it actually failed between May 28 and June 2 (jenkins history is too short but hopefully a mail was sent to someone)?

Found https://lists.wikimedia.org/pipermail/qa-alerts/2019-May/date.html#start and https://lists.wikimedia.org/pipermail/qa-alerts/2019-June/date.html#start and indeed the browser test properly failed 6 times consecutively.
I'll start monitoring this particular job, but is there something we can do so that we take action when it happens again in the future?

Change 520460 had a related patch set uploaded (by DCausse; owner: DCausse):
[integration/config@master] Add discovery alerts ML to some beta selenium jobs

https://gerrit.wikimedia.org/r/520460


The beta cluster job runs daily, so I guess if anything is being touched that might impact the morelike API, it's best to merge that after a branch cut and verify that those tests continue to work correctly. Our team should be monitoring those too (https://www.mediawiki.org/w/index.php?title=Reading/Web/Chores), although it looks like we missed it on this occasion, possibly due to some noise on our failing Minerva build (cc @phuedx). You can test the morelike API on https://en.m.wikipedia.beta.wmflabs.org/wiki/Baby_chicks as well.
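
For a quick manual check outside the browser test, something like the following might work. It is only a sketch: the `api.php` location on the beta host is an assumption derived from the page URL above, the helper names (`morelike_url`, `related_titles`, `has_related_articles`) are made up for illustration, and it uses the generic `list=search` query with the CirrusSearch `morelike:` keyword rather than whatever exact request the extension sends.

```python
from urllib.parse import urlencode

# Assumption: the beta cluster serves the standard MediaWiki action API at
# this path; the host is taken from the Baby_chicks page URL mentioned above.
API_URL = "https://en.m.wikipedia.beta.wmflabs.org/w/api.php"


def morelike_url(title, limit=3, api_url=API_URL):
    """Build a search URL that uses the CirrusSearch `morelike:` keyword."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": f"morelike:{title}",
        "srlimit": limit,
    }
    return f"{api_url}?{urlencode(params)}"


def related_titles(response):
    """Return the result titles from a decoded `list=search` JSON response."""
    hits = response.get("query", {}).get("search", [])
    return [hit["title"] for hit in hits]


def has_related_articles(response):
    """True if the response contains at least one morelike result."""
    return len(related_titles(response)) > 0
```

To run it against the live API, fetch `morelike_url("Baby chicks")` with `urllib.request.urlopen`, decode the body with `json.load`, and pass the result to `has_related_articles`. The network call is kept out of the helpers so they can be checked offline.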

Change 520460 merged by jenkins-bot:
[integration/config@master] Add discovery alerts ML to some beta selenium jobs

https://gerrit.wikimedia.org/r/520460

Thanks, we will try to be more cautious with this feature in the future. If there is anything else we can collectively do to improve this, I am happy to help; otherwise, please feel free to close this task as Invalid.