Page MenuHomePhabricator

Loading RC on en.wikipedia with a specific set of filters throws out an exception
Open, LowPublicPRODUCTION ERROR

Description

When loading https://en.wikipedia.org/w/index.php?damaging=verylikelybad&goodfaith=likelybad%3Bverylikelybad&userExpLevel=unregistered%3Bnewcomer%3Blearner&hidemyself=1&hidepreviousrevisions=1&hideWikibase=1&hidelog=1&limit=1000&days=30&damaging__verylikelybad_color=c5&title=Special:RecentChanges&urlversion=2 I get an WMFTimeoutException instead of the normal recent changes page. This doesn't happen on any other page, or recent changes with other filters

[YESs-ymFrvi8w6994LcImgAAAAM] 2021-03-07 10:39:03: Fatal exception of type "WMFTimeoutException"
from /srv/mediawiki/wmf-config/set-time-limit.php(41)
#0 /srv/mediawiki/php-1.36.0-wmf.33/includes/exception/MWExceptionHandler.php(208): {closure}(integer)
#1 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#2 /srv/mediawiki/php-1.36.0-wmf.33/includes/libs/rdbms/database/DatabaseMysqli.php(46): mysqli->query(string)
#3 /srv/mediawiki/php-1.36.0-wmf.33/includes/libs/rdbms/database/Database.php(1380): Wikimedia\Rdbms\DatabaseMysqli->doQuery(string)
#4 /srv/mediawiki/php-1.36.0-wmf.33/includes/libs/rdbms/database/Database.php(1298): Wikimedia\Rdbms\Database->executeQueryAttempt(string, string, boolean, string, integer)
#5 /srv/mediawiki/php-1.36.0-wmf.33/includes/libs/rdbms/database/Database.php(1227): Wikimedia\Rdbms\Database->executeQuery(string, string, integer)

Details

Request ID
YESs-ymFrvi8w6994LcImgAAAAM

Event Timeline

Aklapper changed the subtype of this task from "Task" to "Production Error".
Aklapper set Request ID to YESs-ymFrvi8w6994LcImgAAAAM.Mar 7 2021, 11:31 AM
Aklapper edited Stack Trace. (Show Details)
Aklapper set Phatality ID to 36d06658889351388ef55f9d479e2b5996df8f252eeecfdb32d66e7ad9c23e45.

Smaller limits are working, but feeling slow to load

https://en.wikipedia.org/w/index.php?damaging=verylikelybad&goodfaith=likelybad%3Bverylikelybad&userExpLevel=unregistered%3Bnewcomer%3Blearner&hidemyself=1&hidepreviousrevisions=1&hideWikibase=1&hidelog=1&limit=10&days=3&damaging__verylikelybad_color=c5&title=Special:RecentChanges&urlversion=2

Needing the sql which is running here to get a EXPLAIN for index usage

There are also other timeouts for the recentchanges (T239192, T256937), but that could be related to installed extensions or filters and not a duplicate of this one.

Needing the sql which is running here to get a EXPLAIN for index usage

The closest I can find is

SELECT /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_namespace,rc_title,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,rc_actor,r [TRX#328c5e]

Maybe that trx ID can be looked up somewhere?

Found the query in tendril:

SELECT /* SpecialRecentChanges::doMainQuery */ /*! STRAIGHT_JOIN */ rc_id, rc_timestamp, rc_namespace, rc_title, rc_minor, rc_bot, rc_new, rc_cur_id, rc_this_oldid, rc_last_oldid, rc_type, rc_source, rc_patrolled, rc_ip, rc_old_len, rc_new_len, rc_deleted, rc_logid, rc_log_type, rc_log_action, rc_params, rc_actor, recentchanges_actor.actor_user AS `rc_user`, recentchanges_actor.actor_name AS `rc_user_text`, comment_rc_comment.comment_text AS `rc_comment_text`, comment_rc_comment.comment_data AS `rc_comment_data`, comment_rc_comment.comment_id AS `rc_comment_cid`, wl_user, wl_notificationtimestamp, we_expiry, page_latest, (SELECT GROUP_CONCAT(ctd_name SEPARATOR ', ') FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id)) WHERE ct_rc_id=rc_id ) AS `ts_tags`, fp_stable, fp_pending_since, ores_damaging_cls.oresc_probability AS `ores_damaging_score`, ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score` FROM `recentchanges` JOIN `actor` `recentchanges_actor` ON ((actor_id=rc_actor)) JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id)) LEFT JOIN `user` ON ((actor_user=user_id)) LEFT JOIN `watchlist` ON (wl_user = 20093189 AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT JOIN `watchlist_expiry` ON ((wl_id = we_item)) LEFT JOIN `page` ON ((rc_cur_id=page_id)) LEFT JOIN `flaggedpages` ON ((fp_page_id = rc_cur_id)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = 59 AND (ores_damaging_cls.oresc_rev=rc_this_oldid) AND ores_damaging_cls.oresc_class = 1) LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON (ores_goodfaith_cls.oresc_model = 60 AND (ores_goodfaith_cls.oresc_rev=rc_this_oldid) AND ores_goodfaith_cls.oresc_class = 1) WHERE (actor_user IS NULL OR (NOT ( (user_editcount >= 500) AND ((user_registration IS NULL) OR (user_registration <= '20210816052016')) ))) AND (actor_name<>'Tgr (WMF)') AND ((rc_this_oldid = page_latest) OR rc_type = 3) AND (rc_type != 3) AND (rc_source != 'wb') AND ((ores_damaging_cls.oresc_probability BETWEEN 0.944 AND 1)) AND (rc_type NOT IN (3, 5)) AND ((ores_goodfaith_cls.oresc_probability BETWEEN 0 AND 0.353)) AND (rc_type NOT IN (3, 5)) AND (rc_timestamp >= '20210816052016') AND rc_new IN (0, 1) ORDER BY rc_timestamp DESC LIMIT 1000 /* f1fb32849e5be6a218f195e9644c63bb db1099:3311 enwiki 67s */

EXPLAIN:

+------+--------------------+---------------------+--------+-----------------------------------------------------------+-----------------------+---------+-----------------------------------------------------------------------+---------+-------------+
| id   | select_type        | table               | type   | possible_keys                                             | key                   | key_len | ref                                                                   | rows    | Extra       |
+------+--------------------+---------------------+--------+-----------------------------------------------------------+-----------------------+---------+-----------------------------------------------------------------------+---------+-------------+
|    1 | PRIMARY            | recentchanges       | range  | rc_timestamp,rc_actor,rc_this_oldid,rc_new_name_timestamp | rc_timestamp          | 14      | NULL                                                                  | 4266373 | Using where |
|    1 | PRIMARY            | recentchanges_actor | eq_ref | PRIMARY,actor_name,actor_user                             | PRIMARY               | 8       | enwiki.recentchanges.rc_actor                                         | 1       | Using where |
|    1 | PRIMARY            | comment_rc_comment  | eq_ref | PRIMARY                                                   | PRIMARY               | 8       | enwiki.recentchanges.rc_comment_id                                    | 1       |             |
|    1 | PRIMARY            | user                | eq_ref | PRIMARY                                                   | PRIMARY               | 4       | enwiki.recentchanges_actor.actor_user                                 | 1       | Using where |
|    1 | PRIMARY            | watchlist           | eq_ref | wl_user,wl_user_notificationtimestamp,wl_namespace_title  | wl_user               | 265     | const,enwiki.recentchanges.rc_namespace,enwiki.recentchanges.rc_title | 1       |             |
|    1 | PRIMARY            | watchlist_expiry    | eq_ref | PRIMARY                                                   | PRIMARY               | 4       | enwiki.watchlist.wl_id                                                | 1       | Using where |
|    1 | PRIMARY            | page                | eq_ref | PRIMARY                                                   | PRIMARY               | 4       | enwiki.recentchanges.rc_cur_id                                        | 1       | Using where |
|    1 | PRIMARY            | flaggedpages        | eq_ref | PRIMARY                                                   | PRIMARY               | 4       | enwiki.recentchanges.rc_cur_id                                        | 1       |             |
|    1 | PRIMARY            | ores_damaging_cls   | eq_ref | oresc_rev_model_class                                     | oresc_rev_model_class | 7       | enwiki.recentchanges.rc_this_oldid,const,const                        | 1       | Using where |
|    1 | PRIMARY            | ores_goodfaith_cls  | eq_ref | oresc_rev_model_class                                     | oresc_rev_model_class | 7       | enwiki.recentchanges.rc_this_oldid,const,const                        | 1       | Using where |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | ct_rc_tag_id,ct_tag_id_id                                 | ct_rc_tag_id          | 5       | enwiki.recentchanges.rc_id                                            | 1       | Using index |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                                   | PRIMARY               | 4       | enwiki.change_tag.ct_tag_id                                           | 1       |             |
+------+--------------------+---------------------+--------+-----------------------------------------------------------+-----------------------+---------+-----------------------------------------------------------------------+---------+-------------+

which doesn't seem bad.

There are 60 RecentChanges timeouts in the last 90 days (which might or might not be all this bug) so whatever the specific combination of filters triggering this, it seems to not be used much. (Although that could just be because people found it's not working.)

The Growth team is providing putting-out-things-on-fire level maintenence for RecentChanges, but given the low impact and high complexity I don't think we have the capacity to look into this further.

kostajh subscribed.

There are 60 RecentChanges timeouts in the last 90 days

Moving to low priority accordingly.

As per T297708, recent changes is now limited to 30s instead of 60s.

There are 60 RecentChanges timeouts in the last 90 days.

Note that this timeout appears quite consistently reproducible. It isn't random or intermittent due to factors unrelated to the query.

In addition to that, the query is not from a gadget or custom API query using unreasonable or esoteric parameters.

It is reproducible from the default Recent changes view on English Wikipedia, by simply toggling checkboxes from the filters promoted in the RCFilters interface.

I'm mentioning that to say that the frequency might not be as indiciative as of other production errors from highly visited articles, since this happens every time someone is trying to use it. The frequency might then be indicate of how often change patrol communities are bothering to check in to see if it is still broken, on the assumption that the patrollers remember that their usual confguration is unable to load.

Given the filters in question, I suspect this might make it impossible for some communities to perform change patrol at all as without the filters patrolling would be significantly more difficult or noisy.

https://en.wikipedia.org/wiki/Special:RecentChanges

  1. Quality prediction: Very likely problems.
  2. Intent prediction: Likely bad.
  3. User exp: Unreg, Newcomer, Learner.
  4. Author: Others.
  5. Revision: Latest.

Times out 4 of 5 attempts, near ~30s. And that's still with the default of only 50 results.

https://en.wikipedia.org/wiki/Special:RecentChanges?damaging=verylikelybad&goodfaith=likelybad&userExpLevel=unregistered%3Bnewcomer%3Blearner&hidemyself=1&hidebots=1&hidepreviousrevisions=1&hidecategorization=1&hideWikibase=1&limit=50&days=7&urlversion=2

I suspect the above steps still succeeded upto last month, given that these take around 30s to load which used to be within the limit.

In order to match the link in the task description, which presumably took above 60s, we do one more click. E.g. extend length from 50 to 500 items (which is common among patrollers in my experience, since 50 isn't very much considering the patroller further filters by eye based on their own hueristics and interest. Also you have to consider whether someone will bother patrolling 50 changes if it takes a minute to load each time), or add "Type: Category changes" (as the link in the task description did).

With this, it seems to time out every time, same as the task description basically.

https://en.wikipedia.org/wiki/Special:RecentChanges?damaging=verylikelybad&userExpLevel=unregistered%3Bnewcomer%3Blearner&hidemyself=1&hidebots=1&hidepreviousrevisions=1&hideWikibase=1&hidelog=1&limit=500&days=7&goodfaith=likelybad&urlversion=2

I'm mentioning that to say that the frequency might not be as indiciative as of other production errors from highly visited articles, since this happens every time someone is trying to use it. The frequency might then be indicate of how often change patrol communities are bothering to check in to see if it is still broken, on the assumption that the patrollers remember that their usual confguration is unable to load.

That's a fair point.

Minimal reproduction is using an ORES "very bad filter" + latest revision. I'm guessing the ORES filter only applies to a tiny fraction of the edits, and the latest revision filter (possibly various others too) prevents use of the ORES score index, so the query needs to crawl a large amount of rows. I don't think there's an easy fix for that - ultimately, an RDBMS is just the poor match for the kind of interface Special:RecentChanges wants to be.

Unrelatedly, the Javascript code seems to be doing something very wrong while it's waiting for the result of the query - my browser is slowed down to a crawl, even though there is no reason it should be doing anything CPU-intensive.