Page MenuHomePhabricator

WMFTimeoutException in Wikidata recentchanges
Closed, DeclinedPublicPRODUCTION ERROR

Description

Problem:

If you combine many filters the page will not load.

Note:

Also true for Wikipedia.

Solution:

Set max limit to 100?

Example:

https://www.wikidata.org/wiki/Special:RecentChanges?hidebots=1&reviewStatus=unpatrolled&hidepreviousrevisions=1&hidecategorization=1&namespace=12%3B120%3B146%3B640%3B828&limit=100&days=30&enhanced=1&urlversion=2
(not working)

Original:

Error message
[XdzZwApAICMAAJF8grcAAAAJ] 2019-11-26 07:53:33: Fatal exception of type "WMFTimeoutException"
trace
#0 /srv/mediawiki/php-1.35.0-wmf.5/includes/exception/MWExceptionHandler.php(200): {closure}(integer)
#1 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#2 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/rdbms/database/DatabaseMysqli.php(46): mysqli->query(string)
#3 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/rdbms/database/Database.php(1308): Wikimedia\Rdbms\DatabaseMysqli->doQuery(string)
#4 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/rdbms/database/Database.php(1226): Wikimedia\Rdbms\Database->executeQueryAttempt(string, string, boolean, string, integer)
#5 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/rdbms/database/Database.php(1162): Wikimedia\Rdbms\Database->executeQuery(string, string, integer)
#6 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/rdbms/database/Database.php(1828): Wikimedia\Rdbms\Database->query(string, string)
#7 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
#8 /srv/mediawiki/php-1.35.0-wmf.5/includes/libs/rdbms/database/DBConnRef.php(318): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#9 /srv/mediawiki/php-1.35.0-wmf.5/includes/specials/SpecialRecentChanges.php(349): Wikimedia\Rdbms\DBConnRef->select(array, array, array, string, array, array)
#10 /srv/mediawiki/php-1.35.0-wmf.5/includes/specialpage/ChangesListSpecialPage.php(1024): SpecialRecentChanges->doMainQuery(array, array, array, array, array, FormOptions)
#11 /srv/mediawiki/php-1.35.0-wmf.5/includes/specialpage/ChangesListSpecialPage.php(629): ChangesListSpecialPage->getRows()
#12 /srv/mediawiki/php-1.35.0-wmf.5/includes/specials/SpecialRecentChanges.php(165): ChangesListSpecialPage->execute(NULL)
#13 /srv/mediawiki/php-1.35.0-wmf.5/includes/specialpage/SpecialPage.php(575): SpecialRecentChanges->execute(NULL)
#14 /srv/mediawiki/php-1.35.0-wmf.5/includes/specialpage/SpecialPageFactory.php(607): SpecialPage->run(NULL)
#15 /srv/mediawiki/php-1.35.0-wmf.5/includes/MediaWiki.php(298): MediaWiki\Special\SpecialPageFactory->executePath(Title, RequestContext)
#16 /srv/mediawiki/php-1.35.0-wmf.5/includes/MediaWiki.php(967): MediaWiki->performRequest()
#17 /srv/mediawiki/php-1.35.0-wmf.5/includes/MediaWiki.php(530): MediaWiki->main()
#18 /srv/mediawiki/php-1.35.0-wmf.5/index.php(46): MediaWiki->run()
#19 /srv/mediawiki/w/index.php(3): require(string)
#20 {main}
Impact

Unable to see the recent changes with the filters applied

Details

Request ID
XdzZwApAICMAAJF8grcAAAAJ
Request URL
https://www.wikidata.org/wiki/Special:RecentChanges?hidebots=1&reviewStatus=unpatrolled&hidepreviousrevisions=1&hidecategorization=1&namespace=12%3B120%3B146%3B640%3B828&limit=500&days=30&enhanced=1&urlversion=2

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

The query works with a smaller limit on result and days
https://www.wikidata.org/wiki/Special:RecentChanges?hidebots=1&reviewStatus=unpatrolled&hidepreviousrevisions=1&hidecategorization=1&namespace=12%3B120%3B146%3B640%3B828&limit=1&days=1&enhanced=1&urlversion=2

It works also for a single namespace
https://www.wikidata.org/wiki/Special:RecentChanges?hidebots=1&reviewStatus=unpatrolled&hidepreviousrevisions=1&hidecategorization=1&namespace=12&limit=500&days=30&enhanced=1&urlversion=2

SQL should look like

SELECT rc_id, 
       rc_timestamp, 
       rc_namespace, 
       rc_title, 
       rc_minor, 
       rc_bot, 
       rc_new, 
       rc_cur_id, 
       rc_this_oldid, 
       rc_last_oldid, 
       rc_type, 
       rc_source, 
       rc_patrolled, 
       rc_ip, 
       rc_old_len, 
       rc_new_len, 
       rc_deleted, 
       rc_logid, 
       rc_log_type, 
       rc_log_action, 
       rc_params, 
       comment_rc_comment.comment_text AS `rc_comment_text`, 
       comment_rc_comment.comment_data AS `rc_comment_data`, 
       comment_rc_comment.comment_id   AS `rc_comment_cid`, 
       actor_rc_user.actor_user        AS `rc_user`, 
       actor_rc_user.actor_name        AS `rc_user_text`, 
       rc_actor, 
       page_latest, 
       (SELECT Group_concat(ctd_name SEPARATOR ',') 
        FROM   `change_tag` 
               JOIN `change_tag_def` 
                 ON (( ct_tag_id = ctd_id )) 
        WHERE  ct_rc_id = rc_id)       AS `ts_tags` 
FROM   `recentchanges` 
       JOIN `comment` `comment_rc_comment` 
         ON (( comment_rc_comment.comment_id = rc_comment_id )) 
       JOIN `actor` `actor_rc_user` 
         ON (( actor_rc_user.actor_id = rc_actor )) 
       LEFT JOIN `page` 
              ON (( rc_cur_id = page_id )) 
WHERE  rc_bot = 0 
       AND ( ( rc_this_oldid = page_latest ) 
              OR rc_type = 3 ) 
       AND ( rc_type != 6 ) 
       AND rc_patrolled = 0 
       AND ( rc_namespace IN ( '12', '120', '146', '640', '828' ) ) 
       AND ( rc_timestamp >= '20191030201706' ) 
       AND rc_new IN ( 0, 1 ) 
ORDER  BY rc_timestamp DESC 
LIMIT  500

At it seems the database choose rc_ns_actor here which sounds not bad:

-- Probably intended for Special:NewPages namespace filter
CREATE INDEX /*i*/rc_ns_actor ON /*_*/recentchanges (rc_namespace, rc_actor);

Maybe in production it use another index without the namespace column.
The selected namespaces here have less pages, so a index without namespace may have to read many rows from the recentchanges table

12 - Help
120 - Property
146 - Lexeme
640 - EntitySchema
828 - Module

Addshore subscribed.

Running the query in the comment above on an analytics slave produced a result in 1 min 12.26 sec
Trying the example URL provided the page loaded after 2 mins.

Looks like a different index would be beneficial here...

The query posted above.

+------+--------------------+--------------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------------------------+---------+-------------+
| id   | select_type        | table              | type   | possible_keys                                                                                                                                | key                  | key_len | ref                                      | rows    | Extra       |
+------+--------------------+--------------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------------------------+---------+-------------+
|    1 | PRIMARY            | recentchanges      | range  | rc_timestamp,new_name_timestamp,tmp_2,tmp_3,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_actor,rc_namespace_title_timestamp,rc_this_oldid | rc_timestamp         | 16      | NULL                                     | 9830478 | Using where |
|    1 | PRIMARY            | actor_rc_user      | eq_ref | PRIMARY                                                                                                                                      | PRIMARY              | 8       | wikidatawiki.recentchanges.rc_actor      |       1 |             |
|    1 | PRIMARY            | page               | eq_ref | PRIMARY                                                                                                                                      | PRIMARY              | 4       | wikidatawiki.recentchanges.rc_cur_id     |       1 | Using where |
|    1 | PRIMARY            | comment_rc_comment | eq_ref | PRIMARY                                                                                                                                      | PRIMARY              | 8       | wikidatawiki.recentchanges.rc_comment_id |       1 |             |
|    2 | DEPENDENT SUBQUERY | change_tag         | ref    | change_tag_rc_tag_id,change_tag_tag_id_id                                                                                                    | change_tag_rc_tag_id | 5       | wikidatawiki.recentchanges.rc_id         |       1 | Using index |
|    2 | DEPENDENT SUBQUERY | change_tag_def     | eq_ref | PRIMARY                                                                                                                                      | PRIMARY              | 4       | wikidatawiki.change_tag.ct_tag_id        |       1 |             |
+------+--------------------+--------------------+--------+----------------------------------------------------------------------------------------------------------------------------------------------+----------------------+---------+------------------------------------------+---------+-------------+

With FORCE INDEX (tmp_3) on recentchanges it returns in 3.5 seconds.

+------+--------------------+--------------------+--------+-------------------------------------------+----------------------+---------+------------------------------------------+--------+----------------------------------------------------+
| id   | select_type        | table              | type   | possible_keys                             | key                  | key_len | ref                                      | rows   | Extra
                                     |
+------+--------------------+--------------------+--------+-------------------------------------------+----------------------+---------+------------------------------------------+--------+----------------------------------------------------+
|    1 | PRIMARY            | recentchanges      | range  | tmp_3                                     | tmp_3                | 20      | NULL                                     | 347046 | Using index condition; Using where; Using filesort |
|    1 | PRIMARY            | actor_rc_user      | eq_ref | PRIMARY                                   | PRIMARY              | 8       | wikidatawiki.recentchanges.rc_actor      |      1 |
                                     |
|    1 | PRIMARY            | page               | eq_ref | PRIMARY                                   | PRIMARY              | 4       | wikidatawiki.recentchanges.rc_cur_id     |      1 | Using where
                                     |
|    1 | PRIMARY            | comment_rc_comment | eq_ref | PRIMARY                                   | PRIMARY              | 8       | wikidatawiki.recentchanges.rc_comment_id |      1 |
                                     |
|    2 | DEPENDENT SUBQUERY | change_tag         | ref    | change_tag_rc_tag_id,change_tag_tag_id_id | change_tag_rc_tag_id | 5       | wikidatawiki.recentchanges.rc_id         |      1 | Using index
                                     |
|    2 | DEPENDENT SUBQUERY | change_tag_def     | eq_ref | PRIMARY                                   | PRIMARY              | 4       | wikidatawiki.change_tag.ct_tag_id        |      1 |
                                     |
+------+--------------------+--------------------+--------+-------------------------------------------+----------------------+---------+------------------------------------------+--------+----------------------------------------------------+

Some that are actually provided by mediawiki also work well with USE INDEX:

new_name_timestamp   | 21      | NULL                                     | 369847 | Using index condition; Using where; Using filesort
rc_name_type_patrolled_timestamp          | rc_name_type_patrolled_timestamp | 5       | NULL                                     | 340237 | Using index condition; Using where; Using filesort
rc_namespace_title_timestamp              | rc_namespace_title_timestamp | 4       | NULL                                     | 456781 | Using index condition; Using where; Using filesort

"Using index condition; Using where; Using filesort" that's sad and slow but it works because the range is small.

but I agree timestamp index does make sense for most wikis but not wikidata.

The query I got from tendril is:

SELECT /* SpecialRecentChanges::doMainQuery */ rc_id, rc_timestamp, rc_namespace, rc_title, rc_minor, rc_bot, rc_new, rc_cur_id, rc_this_oldid, rc_last_oldid, rc_type, rc_source, rc_patrolled, rc_ip, rc_old_len, rc_new_len, rc_deleted, rc_logid, rc_log_type, rc_log_action, rc_params, comment_rc_comment.comment_text AS `rc_comment_text`, comment_rc_comment.comment_data AS `rc_comment_data`, comment_rc_comment.comment_id AS `rc_comment_cid`, actor_rc_user.actor_user AS `rc_user`, actor_rc_user.actor_name AS `rc_user_text`, rc_actor, wl_user, wl_notificationtimestamp, page_latest, (SELECT GROUP_CONCAT(ctd_name SEPARATOR ', ') FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id)) WHERE ct_rc_id=rc_id ) AS `ts_tags`, ores_damaging_cls.oresc_probability AS `ores_damaging_score`, ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score` FROM `recentchanges` JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id)) JOIN `actor` `actor_rc_user` ON ((actor_rc_user.actor_id = rc_actor)) LEFT JOIN `watchlist` ON (wl_user = 28859 AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT JOIN `page` ON ((rc_cur_id=page_id)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = 11 AND (ores_damaging_cls.oresc_rev=rc_this_oldid) AND ores_damaging_cls.oresc_class = 1) LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON (ores_goodfaith_cls.oresc_model = 12 AND (ores_goodfaith_cls.oresc_rev=rc_this_oldid) AND ores_goodfaith_cls.oresc_class = 1) WHERE rc_bot = 0 AND ((rc_this_oldid = page_latest) OR rc_type IN (3, 142) ) AND (rc_type != 6) AND rc_patrolled = 0 AND (rc_namespace IN ('12', '120', '146', '640', '828')) AND (rc_timestamp >= '20191216180112') AND (rc_source != 'wb') AND rc_new IN (0, 1) ORDER BY rc_timestamp DESC LIMIT 500;

The explain for that is:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: recentchanges
         type: range
possible_keys: rc_timestamp,new_name_timestamp,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_actor,rc_namespace_title_timestamp,rc_this_oldid
          key: rc_timestamp
      key_len: 16
          ref: NULL
         rows: 10704405
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: watchlist
         type: eq_ref
possible_keys: wl_user,namespace_title,wl_user_notificationtimestamp
          key: wl_user
      key_len: 265
          ref: const,wikidatawiki.recentchanges.rc_namespace,wikidatawiki.recentchanges.rc_title
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: actor_rc_user
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: wikidatawiki.recentchanges.rc_actor
         rows: 1
        Extra: 
*************************** 4. row ***************************
           id: 1
  select_type: PRIMARY
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: wikidatawiki.recentchanges.rc_cur_id
         rows: 1
        Extra: Using where
*************************** 5. row ***************************
           id: 1
  select_type: PRIMARY
        table: ores_damaging_cls
         type: eq_ref
possible_keys: oresc_rev_model_class
          key: oresc_rev_model_class
      key_len: 7
          ref: wikidatawiki.recentchanges.rc_this_oldid,const,const
         rows: 1
        Extra: 
*************************** 6. row ***************************
           id: 1
  select_type: PRIMARY
        table: ores_goodfaith_cls
         type: eq_ref
possible_keys: oresc_rev_model_class
          key: oresc_rev_model_class
      key_len: 7
          ref: wikidatawiki.recentchanges.rc_this_oldid,const,const
         rows: 1
        Extra: 
*************************** 7. row ***************************
           id: 1
  select_type: PRIMARY
        table: comment_rc_comment
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: wikidatawiki.recentchanges.rc_comment_id
         rows: 1
        Extra: 
*************************** 8. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: change_tag
         type: ref
possible_keys: change_tag_rc_tag_id,change_tag_tag_id_id
          key: change_tag_rc_tag_id
      key_len: 5
          ref: wikidatawiki.recentchanges.rc_id
         rows: 1
        Extra: Using index
*************************** 9. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: change_tag_def
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: wikidatawiki.change_tag.ct_tag_id
         rows: 1
        Extra: 
9 rows in set (0.00 sec)
`

I put a formatted version of the query in the comment above in P10177

1SELECT /* SpecialRecentChanges::doMainQuery */
2rc_id,
3rc_timestamp,
4rc_namespace,
5rc_title,
6rc_minor,
7rc_bot,
8rc_new,
9rc_cur_id,
10rc_this_oldid,
11rc_last_oldid,
12rc_type,
13rc_source,
14rc_patrolled,
15rc_ip,
16rc_old_len,
17rc_new_len,
18rc_deleted,
19rc_logid,
20rc_log_type,
21rc_log_action,
22rc_params,
23comment_rc_comment.comment_text AS `rc_comment_text`,
24comment_rc_comment.comment_data AS `rc_comment_data`,
25comment_rc_comment.comment_id AS `rc_comment_cid`,
26actor_rc_user.actor_user AS `rc_user`,
27actor_rc_user.actor_name AS `rc_user_text`,
28rc_actor,
29wl_user,
30wl_notificationtimestamp,
31page_latest,
32(SELECT Group_concat(ctd_name SEPARATOR ', ')
33 FROM `change_tag`
34 JOIN `change_tag_def`
35 ON (( ct_tag_id = ctd_id ))
36 WHERE ct_rc_id = rc_id) AS `ts_tags`,
37ores_damaging_cls.oresc_probability AS `ores_damaging_score`,
38ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`
39FROM `recentchanges`
40 JOIN `comment` `comment_rc_comment`
41 ON (( comment_rc_comment.comment_id = rc_comment_id ))
42 JOIN `actor` `actor_rc_user`
43 ON (( actor_rc_user.actor_id = rc_actor ))
44 LEFT JOIN `watchlist`
45 ON ( wl_user = 28859
46 AND ( wl_title = rc_title )
47 AND ( wl_namespace = rc_namespace ) )
48 LEFT JOIN `page`
49 ON (( rc_cur_id = page_id ))
50 LEFT JOIN `ores_classification` `ores_damaging_cls`
51 ON ( ores_damaging_cls.oresc_model = 11
52 AND ( ores_damaging_cls.oresc_rev = rc_this_oldid )
53 AND ores_damaging_cls.oresc_class = 1 )
54 LEFT JOIN `ores_classification` `ores_goodfaith_cls`
55 ON ( ores_goodfaith_cls.oresc_model = 12
56 AND ( ores_goodfaith_cls.oresc_rev = rc_this_oldid )
57 AND ores_goodfaith_cls.oresc_class = 1 )
58WHERE rc_bot = 0
59 AND ( ( rc_this_oldid = page_latest )
60 OR rc_type IN ( 3, 142 ) )
61 AND ( rc_type != 6 )
62 AND rc_patrolled = 0
63 AND ( rc_namespace IN ( '12', '120', '146', '640', '828' ) )
64 AND ( rc_timestamp >= '20191216180112' )
65 AND ( rc_source != 'wb' )
66 AND rc_new IN ( 0, 1 )
67ORDER BY rc_timestamp DESC
68LIMIT 500;

I guess this is up to product to see if this is important enough to be tackled now that an initial tech investigation has been done!

Recentchanges table in wikidata is growing to a problematic degree. It's not wb_terms level problematic but it's quite big.

This table is the denomarlization table in mediawiki, so I don't think we should normalize it but we can look at other options, mostly improving the size horizontally instead of vertically:

  • Reduce max age for rc table in wikidata to 20 days or two weeks
  • Split it to two tables: "recentchanges" and "recentchanges_bots". While still a lot of edits bot in wikidata are not marked as bot but this still would help a lot.
  • Prepend all of indexes with rc_bot and make sure all queries set the rc_bot. This is a trick explained in depth in High Performance MySQL book. Basically saying if you have the field that you know is being set in all queries, while it might not have great cardinality, it's still better to put it first.
    • Possibly we can just add all the new indexes for wikidata but keep it in mind rc table already has ten indexes.

Wonder what people think her.

Manuel closed this task as Declined.EditedJun 23 2022, 9:05 AM
Manuel subscribed.

There are still some timeouts and we are trying to work on it via T307328: Scalability issues of recentchanges table. The new task supersedes this task, so I am closing this one.