occasional failures in TestUserContribs
Open, Needs Triage, Public

jayvdb created this task. Dec 16 2014, 2:44 PM
jayvdb updated the task description.
jayvdb raised the priority of this task from to Needs Triage.
jayvdb added a project: Pywikibot-tests.
jayvdb changed Security from none to None.
jayvdb moved this task from Backlog to Test failures on the Pywikibot-tests board.
jayvdb added a subscriber: jayvdb.

Becoming more frequent ... another one.

https://travis-ci.org/wikimedia/pywikibot-core/jobs/44356093

An Easy task is to split this test method into multiple test methods, one for each distinct function tested. That will hopefully identify which part of this test method is failing, and which API parameter is problematic.
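A minimal sketch of what such a split could look like, using plain unittest. The helpers fetch_contribs and check_range and the canned rows are hypothetical stand-ins so the example runs offline; they are not Pywikibot's API, and the real test methods may check different things.

```python
import unittest

# Hypothetical canned data standing in for an API response.
CANNED_ROWS = [
    {"user": "Timwi", "timestamp": "20081010110000"},
    {"user": "Timotheus", "timestamp": "20081010030000"},
]


def fetch_contribs(userprefix, reverse=False):
    """Hypothetical stand-in for a usercontribs request."""
    rows = [r for r in CANNED_ROWS if r["user"].startswith(userprefix)]
    # Assumption: newest first by default, oldest first when reverse=True.
    return sorted(rows, key=lambda r: r["timestamp"], reverse=not reverse)


def check_range(start, end, reverse=False):
    """Hypothetical validator: reject a start/end pair that runs the wrong way."""
    if not reverse and start < end:
        raise ValueError("start must not be earlier than end")
    if reverse and start > end:
        raise ValueError("start must not be later than end when reversed")


class TestUserContribsSplit(unittest.TestCase):
    """One method per behaviour, so a failure names the parameter involved."""

    def test_user_prefix(self):
        rows = fetch_contribs(userprefix="Tim")
        self.assertEqual(len(rows), 2)
        for row in rows:
            self.assertTrue(row["user"].startswith("Tim"))

    def test_user_prefix_reverse(self):
        rows = fetch_contribs(userprefix="Tim", reverse=True)
        stamps = [r["timestamp"] for r in rows]
        self.assertEqual(stamps, sorted(stamps))

    def test_invalid_range(self):
        with self.assertRaises(ValueError):
            check_range(start="20081010000001", end="20081010115959")
```

With one method per parameter combination, the CI log reports the failing method by name instead of a single failure for the whole combined test.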

The 30-second timeout is having an effect on this; all en.wp builds are failing on this test: https://travis-ci.org/jayvdb/pywikibot-core/builds/67715637

Change 219614 had a related patch set uploaded (by John Vandenberg):
Split testUsercontribs

https://gerrit.wikimedia.org/r/219614

Change 219614 merged by jenkins-bot:
Split testUsercontribs

https://gerrit.wikimedia.org/r/219614

jayvdb renamed this task from occasional failure in testUsercontribs to occasional failure in TestUserContribs test_user_prefix_reverse. Jun 21 2015, 10:46 PM

Three builds on https://travis-ci.org/wikimedia/pywikibot-core/builds/67768772 are all failing on test_user_prefix_reverse

And these ones are all failing on test_invalid_range

https://travis-ci.org/jayvdb/pywikibot-core/builds/68406346

jayvdb renamed this task from occasional failure in TestUserContribs test_user_prefix_reverse to occasional failures in TestUserContribs. Jun 26 2015, 12:42 AM

test_invalid_range and test_user_prefix_reverse failures: https://travis-ci.org/wikimedia/pywikibot-core/builds/71009193 (enwp: py27 both; 3.3 and 3.4 only failed test_invalid_range; ignore unrelated zh.wikisource and wikidata failures)

jayvdb removed a project: Easy.
jayvdb updated the task description. Aug 23 2015, 10:51 PM
MaxSem triaged this task as Unbreak Now! priority. Aug 23 2015, 10:52 PM
MaxSem added a subscriber: MaxSem.

This is causing über slow requests in production right now that time out after 60 seconds:

EXPLAIN SELECT   rev_id,rev_timestamp,page_namespace,page_title,rev_user,rev_user_text,rev_deleted,rev_page,page_latest,rev_comment,rev_minor_edit,rev_parent_id  FROM `page`,`revision` FORCE INDEX (usertext_timestamp)   WHERE (page_id=rev_page) AND ((rev_deleted & 4) != 4) AND (rev_user_text LIKE 'Tim%' ) AND (rev_timestamp<='20081010115959') AND (rev_timestamp>='20081010000001')  ORDER BY rev_user_text DESC,rev_timestamp DESC,rev_id DESC LIMIT 6;
+------+-------------+----------+--------+--------------------+--------------------+---------+--------------------------+---------+-------------+
| id   | select_type | table    | type   | possible_keys      | key                | key_len | ref                      | rows    | Extra       |
+------+-------------+----------+--------+--------------------+--------------------+---------+--------------------------+---------+-------------+
|    1 | SIMPLE      | revision | range  | usertext_timestamp | usertext_timestamp | 273     | NULL                     | 2254294 | Using where |
|    1 | SIMPLE      | page     | eq_ref | PRIMARY            | PRIMARY            | 4       | enwiki.revision.rev_page |       1 |             |
+------+-------------+----------+--------+--------------------+--------------------+---------+--------------------------+---------+-------------+
2 rows in set (0.04 sec)

Please fix that or we will have to disable the API functionality used in miser mode, effectively making it gone forever. And still breaking all your tests ;)

Example fixes could include querying a labs instance or at least testwiki, with requests tailored so that they don't cause outrageous table scans.
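One plausible reading of the EXPLAIN output above: usertext_timestamp is ordered by user name first and timestamp second, so a prefix match on the name (LIKE 'Tim%') forces a scan over every revision by every matching user, and the timestamp bounds only filter rows afterwards. The sketch below simulates that with a sorted list and bisect. The data and function names are invented for illustration, and this is a simplified model, not how MySQL is implemented.

```python
import bisect
from datetime import datetime, timedelta


def build_index(users, days=365):
    """Sorted (user_text, timestamp) pairs, mimicking a two-column index."""
    base = datetime(2008, 1, 1, 6, 0, 0)
    rows = [
        (user, (base + timedelta(days=i)).strftime("%Y%m%d%H%M%S"))
        for user in users
        for i in range(days)
    ]
    return sorted(rows)


def scan_prefix(index, prefix, ts_min, ts_max):
    """Prefix match on the first column: the timestamp can only filter."""
    lo = bisect.bisect_left(index, (prefix, ""))
    hi = bisect.bisect_left(index, (prefix + "\uffff", ""))
    scanned = index[lo:hi]
    matches = [e for e in scanned if ts_min <= e[1] <= ts_max]
    return len(scanned), matches


def scan_exact(index, user, ts_min, ts_max):
    """Exact match on the first column: the timestamp narrows the range."""
    lo = bisect.bisect_left(index, (user, ts_min))
    hi = bisect.bisect_right(index, (user, ts_max))
    scanned = index[lo:hi]
    return len(scanned), scanned


index = build_index(["Tim", "Timbo", "Timothy", "Tina"])
window = ("20081010000001", "20081010115959")

prefix_scanned, prefix_matches = scan_prefix(index, "Tim", *window)
exact_scanned, exact_matches = scan_exact(index, "Tim", *window)
print(f"prefix scan: {prefix_scanned} entries read, {len(prefix_matches)} kept")
print(f"exact scan:  {exact_scanned} entries read, {len(exact_matches)} kept")
```

Under this model, a test request that names an exact user, or that runs against a wiki with few users, reads far fewer index entries than a short prefix on en.wp.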

jayvdb added a comment. Edited Aug 23 2015, 11:02 PM

We run these test methods on all sites in our matrix, which includes beta wiki (en & zh) and test.wp and test.wikidata, and production wikis en.wp, zh.ws, ar.wikt, he.wikivoyage, wikidata (and some non WMF wikis).

The problem only occurs on en.wp.

Here it is running on test.wikipedia and zh.betawp:
https://travis-ci.org/wikimedia/pywikibot-core/jobs/76899714#L1080
https://travis-ci.org/wikimedia/pywikibot-core/jobs/76899710#L844

Our en.wp betawiki tests are having a login problem, so I can't easily confirm whether these tests have worked correctly recently (or at all), but I will dig into that now.

Change 233334 had a related patch set uploaded (by John Vandenberg):
Split TestUserContribs between user and non-user

https://gerrit.wikimedia.org/r/233334

Confirmed that these tests are not being run on beta en.wp, as these tests have been tagged as 'user' tests, and our normal test user isn't set up on the beta cluster (T100797: Set up Pywikibot account on beta sites to run user tests).
However, our oauth test user is active on the beta cluster, which is why these tests are running on zh.wp.

https://gerrit.wikimedia.org/r/233334 will enable these tests on beta en.wp.
Another approach is to use oauth on beta en.wp, and possibly not use oauth on beta zh.wp.
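As a rough illustration of why a 'user' tag can silently disable tests on a site, and how splitting a test class between user and non-user variants helps, here is a sketch in plain unittest. The user attribute, the CONFIGURED_USERS registry, and the site labels are assumptions made for this example; the actual Pywikibot test framework may implement the tagging differently.

```python
import unittest

# Hypothetical registry of sites where a test account is configured.
CONFIGURED_USERS = {"test.wikipedia"}


class SiteTestCase(unittest.TestCase):
    """Base class: skips the whole class when an account is needed but missing."""

    site = "beta.en.wikipedia"  # hypothetical site label
    user = False  # set True on classes that need a logged-in account

    @classmethod
    def setUpClass(cls):
        if cls.user and cls.site not in CONFIGURED_USERS:
            raise unittest.SkipTest(f"no test account on {cls.site}")


class ContribsChecksMixin:
    """Checks shared by both variants (placeholder assertion only)."""

    def test_prefix(self):
        self.assertTrue("Timwi".startswith("Tim"))


class TestUserContribsAsUser(ContribsChecksMixin, SiteTestCase):
    user = True  # skipped on sites without a configured account


class TestUserContribsWithoutUser(ContribsChecksMixin, SiteTestCase):
    user = False  # runs everywhere, including sites with no account
```

In this sketch the anonymous variant still runs on a site with no account, so the behaviour is exercised there even while the user variant is skipped.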

Change 233341 had a related patch set uploaded (by John Vandenberg):
Use oauth on beta en.wp and add beta es.ws

https://gerrit.wikimedia.org/r/233341

Change 233334 merged by jenkins-bot:
Split TestUserContribs between user and non-user

https://gerrit.wikimedia.org/r/233334

All patches have been merged; what's the status of this task?
Can this be closed as resolved, or is more work (what?) needed?

jayvdb lowered the priority of this task from Unbreak Now! to Needs Triage. Aug 31 2015, 11:47 PM

The merged pywikibot change did the following:

  • allowed testing this bug against the beta cluster, specifically against en.wp on the beta cluster. We haven't seen the bug reproduced on the beta cluster yet.
  • halved the number of times that Pywikibot runs this test

The bug in MediaWiki still exists on the production wikis, but it only appears in English Wikipedia, likely due to its very large user list.

I hope that by halving the number of times that Pywikibot runs this test, the priority is no longer "Unbreak Now!". If that doesn't suffice, we can reduce the load spike on English Wikipedia further by setting pywikibot.config.max_retries = 0 for this one test, and only on en.wp. We could also reduce the response timeout for this test, so that it fails sooner, and the API can kill the MySQL query (if it is smart enough to do this).
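A sketch of how a per-test override like that might be scoped so it cannot leak into other tests. It uses a stand-in config object rather than importing Pywikibot; max_retries is the setting named above, while socket_timeout, the default values, and the helper names are assumptions for illustration only.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for pywikibot.config so the sketch runs without Pywikibot installed.
# The values are placeholders, not Pywikibot's real defaults.
config = SimpleNamespace(max_retries=25, socket_timeout=30)


@contextmanager
def override_config(cfg, **overrides):
    """Temporarily set attributes on cfg, restoring them even on failure."""
    saved = {name: getattr(cfg, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(cfg, name, value)
        yield cfg
    finally:
        for name, value in saved.items():
            setattr(cfg, name, value)


def run_heavy_query_test(cfg):
    """Hypothetical test body; returns the settings it ran with."""
    return cfg.max_retries, cfg.socket_timeout


# No retries and a shorter timeout, for this one test only.
with override_config(config, max_retries=0, socket_timeout=10):
    seen = run_heavy_query_test(config)
```

The try/finally restores the previous values even if the test body raises, so a timeout in this test does not change retry behaviour for the rest of the run.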

I haven't seen this problem for a while. Yesterday all of the en.wp jobs of one build were filled with errors in both test_user_prefix_range and test_user_prefix_reverse.

Change 259607 had a related patch set uploaded (by John Vandenberg):
Split TestUserContribs between user and non-user

https://gerrit.wikimedia.org/r/259607

Restricted Application added a subscriber: StudiesWorld. Dec 16 2015, 10:51 PM

Change 259607 merged by jenkins-bot:
Split TestUserContribs between user and non-user

https://gerrit.wikimedia.org/r/259607

Krinkle moved this task from Backlog to Meta on the MediaWiki-Database board. May 8 2017, 1:32 AM