Page MenuHomePhabricator

Database query error occurs when visiting Special:RecentChanges&tagfilter=wikieditor
Closed, ResolvedPublicSecurity

Description

Behavior

  1. Visit en:Special:RecentChanges&tagfilter=wikieditor

Actual

  1. ❗️The following error appears:
[4b6e642b-1b06-4d6d-8444-d22a0051aafa] 2021-12-23 02:57:01: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"

Expected

  1. ✅A list of edits that have been tagged with the wikieditor change tag (T249038) appear

Done

  • Expected behavior is implemented

2021-12-23 02:57:01 [4b6e642b-1b06-4d6d-8444-d22a0051aafa] mw1405 enwiki 1.38.0-wmf.13 exception ERROR: [4b6e642b-1b06-4d6d-8444-d22a0051aafa] /w/index.php?title=Special:RecentChanges&tagfilter=wikieditor   Wikimedia\Rdbms\DBQueryError: Error 1969: Query execution was interrupted (max_statement_time exceeded) (db1099:3311)
Function: SpecialRecentChanges::doMainQuery
Query: SET STATEMENT max_statement_time=30 FOR SELECT  rc_id,rc_timestamp,rc_namespace,rc_title,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,rc_actor,recentchanges_actor.actor_user AS `rc_user`,recentchanges_actor.actor_name AS `rc_user_text`,comment_rc_comment.comment_text AS `rc_comment_text`,comment_rc_comment.comment_data AS `rc_comment_data`,comment_rc_comment.comment_id AS `rc_comment_cid`,wl_user,wl_notificationtimestamp,we_expiry,page_latest,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rc_id=rc_id  ) AS `ts_tags`,fp_stable,fp_pending_since,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`  FROM `recentchanges` JOIN `actor` `recentchanges_actor` ON ((actor_id=rc_actor)) JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id)) LEFT JOIN `watchlist` ON (wl_user = 35724847 AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT JOIN `watchlist_expiry` ON ((wl_id = we_item)) LEFT JOIN `page` ON ((rc_cur_id=page_id)) JOIN `change_tag` ON ((ct_rc_id=rc_id)) LEFT JOIN `flaggedpages` ON ((fp_page_id = rc_cur_id)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = 59 AND (ores_damaging_cls.oresc_rev=rc_this_oldid) AND ores_damaging_cls.oresc_class = 1) LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON (ores_goodfaith_cls.oresc_model = 60 AND (ores_goodfaith_cls.oresc_rev=rc_this_oldid) AND ores_goodfaith_cls.oresc_class = 1)   WHERE rc_bot = 0 AND (rc_type != 6) AND (rc_source != 'wb') AND (rc_timestamp >= '20211123025630') AND ct_tag_id = 616 AND rc_new IN (0,1)   ORDER BY rc_timestamp DESC LIMIT 500  
 {"exception_url":"/w/index.php?title=Special:RecentChanges&tagfilter=wikieditor","reqId":"4b6e642b-1b06-4d6d-8444-d22a0051aafa","caught_by":"entrypoint"} 
[Exception Wikimedia\Rdbms\DBQueryError] (/srv/mediawiki/php-1.38.0-wmf.13/includes/libs/rdbms/database/Database.php:1817) Error 1969: Query execution was interrupted (max_statement_time exceeded) (db1099:3311)
Function: SpecialRecentChanges::doMainQuery
Query: SET STATEMENT max_statement_time=30 FOR SELECT  rc_id,rc_timestamp,rc_namespace,rc_title,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,rc_actor,recentchanges_actor.actor_user AS `rc_user`,recentchanges_actor.actor_name AS `rc_user_text`,comment_rc_comment.comment_text AS `rc_comment_text`,comment_rc_comment.comment_data AS `rc_comment_data`,comment_rc_comment.comment_id AS `rc_comment_cid`,wl_user,wl_notificationtimestamp,we_expiry,page_latest,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rc_id=rc_id  ) AS `ts_tags`,fp_stable,fp_pending_since,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`  FROM `recentchanges` JOIN `actor` `recentchanges_actor` ON ((actor_id=rc_actor)) JOIN `comment` `comment_rc_comment` ON ((comment_rc_comment.comment_id = rc_comment_id)) LEFT JOIN `watchlist` ON (wl_user = 35724847 AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) LEFT JOIN `watchlist_expiry` ON ((wl_id = we_item)) LEFT JOIN `page` ON ((rc_cur_id=page_id)) JOIN `change_tag` ON ((ct_rc_id=rc_id)) LEFT JOIN `flaggedpages` ON ((fp_page_id = rc_cur_id)) LEFT JOIN `ores_classification` `ores_damaging_cls` ON (ores_damaging_cls.oresc_model = 59 AND (ores_damaging_cls.oresc_rev=rc_this_oldid) AND ores_damaging_cls.oresc_class = 1) LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON (ores_goodfaith_cls.oresc_model = 60 AND (ores_goodfaith_cls.oresc_rev=rc_this_oldid) AND ores_goodfaith_cls.oresc_class = 1)   WHERE rc_bot = 0 AND (rc_type != 6) AND (rc_source != 'wb') AND (rc_timestamp >= '20211123025630') AND ct_tag_id = 616 AND rc_new IN (0,1)   ORDER BY rc_timestamp DESC LIMIT 500  

  #0 /srv/mediawiki/php-1.38.0-wmf.13/includes/libs/rdbms/database/Database.php(1801): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
  #1 /srv/mediawiki/php-1.38.0-wmf.13/includes/libs/rdbms/database/Database.php(1776): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
  #2 /srv/mediawiki/php-1.38.0-wmf.13/includes/libs/rdbms/database/Database.php(1310): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
  #3 /srv/mediawiki/php-1.38.0-wmf.13/includes/libs/rdbms/database/Database.php(2020): Wikimedia\Rdbms\Database->query(string, string, integer)
  #4 /srv/mediawiki/php-1.38.0-wmf.13/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
  #5 /srv/mediawiki/php-1.38.0-wmf.13/includes/libs/rdbms/database/DBConnRef.php(307): Wikimedia\Rdbms\DBConnRef->__call(string, array)
  #6 /srv/mediawiki/php-1.38.0-wmf.13/includes/specials/SpecialRecentChanges.php(410): Wikimedia\Rdbms\DBConnRef->select(array, array, array, string, array, array)
  #7 /srv/mediawiki/php-1.38.0-wmf.13/includes/specialpage/ChangesListSpecialPage.php(1018): SpecialRecentChanges->doMainQuery(array, array, array, array, array, FormOptions)
  #8 /srv/mediawiki/php-1.38.0-wmf.13/includes/specialpage/ChangesListSpecialPage.php(617): ChangesListSpecialPage->getRows()
  #9 /srv/mediawiki/php-1.38.0-wmf.13/includes/specials/SpecialRecentChanges.php(199): ChangesListSpecialPage->execute(NULL)
  #10 /srv/mediawiki/php-1.38.0-wmf.13/includes/specialpage/SpecialPage.php(671): SpecialRecentChanges->execute(NULL)
  #11 /srv/mediawiki/php-1.38.0-wmf.13/includes/specialpage/SpecialPageFactory.php(1377): SpecialPage->run(NULL)
  #12 /srv/mediawiki/php-1.38.0-wmf.13/includes/MediaWiki.php(314): MediaWiki\SpecialPage\SpecialPageFactory->executePath(string, RequestContext)
  #13 /srv/mediawiki/php-1.38.0-wmf.13/includes/MediaWiki.php(903): MediaWiki->performRequest()
  #14 /srv/mediawiki/php-1.38.0-wmf.13/includes/MediaWiki.php(563): MediaWiki->main()
  #15 /srv/mediawiki/php-1.38.0-wmf.13/index.php(53): MediaWiki->run()
  #16 /srv/mediawiki/php-1.38.0-wmf.13/index.php(46): wfIndexMain()
  #17 /srv/mediawiki/w/index.php(3): require(string)
  #18 {main}

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

It worked fine for me just now, although the page seemed to load quite slowly, but then I’m on my phone. Maybe the large number of wikieditor tags is resulting in a weird query plan and causing the query to timeout. I’ll have a look when I’m home, adding folks in case they can have a look earlier (sorry).

Reedy changed the subtype of this task from "Task" to "Production Error".
Reedy updated the task description. (Show Details)

The explain of the query that @Reedy recorded in the description from a prod replica:

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
1PRIMARYchange_tagrefct_rc_tag_id,ct_tag_id_idct_tag_id_id4const806340Using where; Using index; Using temporary; Using filesort
1PRIMARYrecentchangeseq_refPRIMARY,rc_timestamp,rc_actor,rc_new_name_timestampPRIMARY4enwiki.change_tag.ct_rc_id1Using where
1PRIMARYflaggedpageseq_refPRIMARYPRIMARY4enwiki.recentchanges.rc_cur_id1
1PRIMARYwatchlisteq_refwl_user,wl_user_notificationtimestamp,wl_namespace_titlewl_user265const,enwiki.recentchanges.rc_namespace,enwiki.recentchanges.rc_title1
1PRIMARYwatchlist_expiryeq_refPRIMARYPRIMARY4enwiki.watchlist.wl_id1Using where
1PRIMARYores_damaging_clseq_reforesc_rev_model_classoresc_rev_model_class7enwiki.recentchanges.rc_this_oldid,const,const1
1PRIMARYores_goodfaith_clseq_reforesc_rev_model_classoresc_rev_model_class7enwiki.recentchanges.rc_this_oldid,const,const1
1PRIMARYpageeq_refPRIMARYPRIMARY4enwiki.recentchanges.rc_cur_id1
1PRIMARYrecentchanges_actoreq_refPRIMARYPRIMARY8enwiki.recentchanges.rc_actor1
1PRIMARYcomment_rc_commenteq_refPRIMARYPRIMARY8enwiki.recentchanges.rc_comment_id1
2DEPENDENT SUBQUERYchange_tagrefct_rc_tag_id,ct_tag_id_idct_rc_tag_id5enwiki.recentchanges.rc_id1Using index
2DEPENDENT SUBQUERYchange_tag_defeq_refPRIMARYPRIMARY4enwiki.change_tag.ct_tag_id1
Legoktm set Security to Software security bug.Dec 23 2021, 5:11 PM
Legoktm added projects: Security, Security-Team.
Legoktm changed the visibility from "Public (No Login Required)" to "Custom Policy".
Legoktm changed the subtype of this task from "Production Error" to "Security Issue".
Legoktm added a subscriber: Legoktm.

Making private since it's a trivial DoS.

Proposed patch to temporarily disable these queries:
(Special:RecentChanges and anything else trying to find edits with this tag will return 0 edits)

The explain of the query that @Reedy recorded in the description from a prod replica:

The problem here is that change_tag is queried first instead of recentchanges. The plan probably gets flipped because change_tag has so many rows for this tag (~50% of all edits are tagged with 'wikieditor'). Adding a straight join so that recentchanges is queried first looks promising (testing on analytics replicas, the query goes from 14s to 0.3s).

Change 749760 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Disable querying the 'wikieditor' change tag temporarily

https://gerrit.wikimedia.org/r/749760

Change 749766 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@wmf/1.38.0-wmf.13] Disable querying the 'wikieditor' change tag temporarily

https://gerrit.wikimedia.org/r/749766

There are two problems at the same time:

  • The page fatals and does not respond to you: This is intentional. Slow queries since last week are getting killed more aggressively (as the ticket pointed out: T297708) as a reliability and DDoS prevention measure. It will have a full rollout in first week of Jan. If you need this data, instead of loading it in the mediawiki. Run it as a query in analytics cluster or quarry.wmcloud.org
  • The query being slow: This is something we should fix. Change tag needs some clean up and recentchanges. The current system needs a bit of thinking.

The current fix is good for now but I'll look how to improve this.

Change 749766 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.13] Disable querying the 'wikieditor' change tag temporarily

https://gerrit.wikimedia.org/r/749766

Legoktm changed the visibility from "Custom Policy" to "Public (No Login Required)".Dec 23 2021, 6:16 PM

Making this public again since it's disabled in production for now. It's not merged into master so it's a train blocker for wmf.16...hopefully we can have something better by then :)

Jdforrester-WMF triaged this task as Unbreak Now! priority.Dec 23 2021, 6:30 PM
Jdforrester-WMF added a subscriber: Jdforrester-WMF.

Train blocker -> UBN!

This ticket is giving me a lot of deja vu. Read from here: T277416#7333359 (let me know if you want access)

We have this problem that the join order in the filtering has to determined by how often the tag is used. If it's used quite often (let's say >10% of the edits have this tag) then rc table should be queried first, and the other way around if it's not common. Finding if a tag is common or not is not easy either (depending on the hitcount in change_tag_def wouldn't work on new heavily used tags).

There are multiple ways to tackle the problem:

  • Fully redesign change tag table. Possibly split it to several tables. One for logs, one for revision, and one for rc or a denormalization table (or just another column in rc). I'm not sure it would fix this issue we have here at hand (maybe it makes it easier to implement it)
  • A simple trick for rc queries for now, get lowest rc_id possible and put a where condition to have at least ct_rc_id above that. That would reduce the scan to at least 30 days ago.
  • Add timestamp column to change_tag table. It comes with massive baggage of storage space. This table has 1B rows in wikidata, adding a bianry(14) column on this gonna bloat it (maybe it's inevitable)?

Should this task continue blocking the train and if so, how serious of a blocker? Should we continue with group0 rollout and deal with this as Thursday approaches or does this need to block group 0/1?

Change 751544 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@wmf/1.38.0-wmf.16] Disable querying the 'wikieditor' change tag temporarily

https://gerrit.wikimedia.org/r/751544

@mmodell Let's merge the above backport and then we can roll out safely everywhere.

Change 751544 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.16] Disable querying the 'wikieditor' change tag temporarily

https://gerrit.wikimedia.org/r/751544

It's just a query planning error. If you force the join order then it's fast.

A change tag that's present for 50% of rows should not be a performance problem. You can traverse recentchanges first, ordering by rc_timestamp, and filter out non-matching rows. If the proportion of rows that have the tag is p, then the number of scanned rows is L/p where L is the query limit. So with L=500 as in the bug, if the tag is on 50% of rows, you have to scan 1000 rows. This is not a problem.

For a very rare change tag, it is cheaper to scan the change_tag table first and to filesort on rc_timestamp. The cost of doing this is the number of rows with the tag, i.e. Np where N is the total number of rows in the table. On enwiki, N=9661922.

It follows that the threshold for changing the table order should be L/p = Np ⇒ p = sqrt(L/N). For enwiki with a limit of 500 that is 0.7%, where the number of scanned rows in either query plan is 69505.

MariaDB is not doing anything like this. It apparently ignores the limit and just compares the set sizes.

I think estimateRowCount() can be used to choose decide whether to set the STRAIGHT_JOIN option. I tried a few EXPLAIN SELECT queries and they're only out by a factor of 2 or so, which should be good enough.

MariaDB [enwiki]> explain select * from change_tag where ct_tag_id=565 and ct_rc_id>(select min(rc_id) from recentchanges);
+------+-------------+------------+-------+---------------------------+--------------+---------+------+-------+------------------------------------+
| id   | select_type | table      | type  | possible_keys             | key          | key_len | ref  | rows  | Extra                              |
+------+-------------+------------+-------+---------------------------+--------------+---------+------+-------+------------------------------------+
|    1 | PRIMARY     | change_tag | range | ct_rc_tag_id,ct_tag_id_id | ct_tag_id_id | 9       | NULL | 30900 | Using index condition; Using where |
|    2 | SUBQUERY    | NULL       | NULL  | NULL                      | NULL         | NULL    | NULL | NULL  | Select tables optimized away       |
+------+-------------+------------+-------+---------------------------+--------------+---------+------+-------+------------------------------------+

Using estimateRowCount() sounds good to me. I think we should use it more widely.

Change 752027 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Add per-table straight join option

https://gerrit.wikimedia.org/r/752027

Change 752047 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Use STRAIGHT_JOIN when filtering on frequently used tags

https://gerrit.wikimedia.org/r/752047

Change 752027 merged by jenkins-bot:

[mediawiki/core@master] Add per-table straight join option

https://gerrit.wikimedia.org/r/752027

Change 752047 merged by jenkins-bot:

[mediawiki/core@master] Use STRAIGHT_JOIN when filtering on frequently used tags

https://gerrit.wikimedia.org/r/752047

matmarex added a project: Editing QA.

Thank you for swooping in and fixing this for us @tstarling.

I'm not sure who owns this task now (and is supposed to close it), but I'm assuming it's us.

Change 749760 abandoned by Bartosz Dziewoński:

[mediawiki/core@master] Disable querying the 'wikieditor' change tag temporarily

Reason:

No longer needed

https://gerrit.wikimedia.org/r/749760

Reedy moved this task from Incoming to Watching on the Security-Team board.

I'm not sure who owns this task now (and is supposed to close it), but I'm assuming it's us.

Whoever wants to own it ;)

Can potentially be made public (soon), but also need to decide if we're going to backport to 1.35/1.36/1.37

Can potentially be made public (soon)

it is public

@matmarex: For the broken version, is the error indicated in step 2 expected to show up in the browser console?

It's a PHP error, so it would display in the normal window, and basically replace the whole page.

Thanks @matmarex. The expected behaviour has been implemented.

Screenshot 2022-01-13 at 21.15.45.png (1×2 px, 496 KB)