Page MenuHomePhabricator

long queries in Extension ApprovedRevs
Closed, ResolvedPublic

Description

ApprovedRevs generates queries like the following:

SELECT /*  52.1.214.79 */  r.rev_user_text  FROM `revision` `r`,`page` `p`    WHERE p.page_title = 'Former_Assize_Court,_Devizes.jpg'  ORDER BY r.rev_id ASC LIMIT 1;

102GB database on system with 128GB RAM and 48CPUs, this query took 17 seconds
Serving 100 page views per minute, with ~300 concurrent users, CPU went from 100% down to <10% after disabling ApprovedRevs

# Time: 190602 12:31:10
# User@Host: wiki_app_user[wiki_app_user] @ localhost []
# Thread_id: 34081  Schema: wiki_en  QC_hit: No
# Query_time: 17.937014  Lock_time: 0.000045  Rows_sent: 1  Rows_examined: 6754801
# Rows_affected: 0

Event Timeline

Which specific ApprovedRevs version on which specific MediaWiki version is this about? What are specific steps to reproduce?

MediaWiki 1.32.1 (4012d8a)
PHP 7.1.29-1+0~20190503101539.18+stretch~1.gbp946c98 (apache2handler)
MariaDB 10.1.38-MariaDB-0+deb9u1
ICU 64.1
Elasticsearch 5.6.16
Lua 5.1.5

Approved Revs v. 251cf4db3ab99366fa281ea535e725672fa2b6c3 (git rev-parse HEAD)

This problem didn't manifest until we went into production where there is more traffic compared to our development /QA environment. Somehow the long queries start to pile up and the kernel starts switching contexts until it's spending all CPU shuffling work while waiting for the long-running queries. The ApprovedRevs queries can be found in the MySQL slow query log if the threshold is set to 5 seconds or more because these queries were all in the range of 17-30+ seconds. Although the tables in question have indexes, the problematic queries end up scanning the entire tables with millions of rows.

explain SELECT /* 52.1.214.79 */ r.rev_user_text FROM revision r,page p WHERE p.page_title = 'Geographylogo.png' ORDER BY r.rev_id ASC LIMIT 1\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: r
         type: index
possible_keys: NULL
          key: usertext_timestamp
      key_len: 811
          ref: NULL
         rows: 2908256
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
         type: index
possible_keys: NULL
          key: name_title
      key_len: 771
          ref: NULL
         rows: 253420
        Extra: Using where; Using index; Using join buffer (flat, BNL join)

You can see the source with grep -C20 --color rev_user_text $IP/extensions/ApprovedRevs/includes/ApprovedRevs_body.php or see https://github.com/wikimedia/mediawiki-extensions-ApprovedRevs/blob/master/includes/ApprovedRevs_body.php#L356-L372

Mentioned in SAL (#wikimedia-operations) [2020-09-30T18:31:51Z] <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Enable GrowthExperiments for newcomers on ptwiki (T225027) (duration: 00m 58s)

This issue persists in MW 1.34.4 / ApprovedRevs version 'master'

Product Version
MediaWiki 1.34.4 (e34e7f2)
12:19, 24 September 2020
PHP 7.2.33-1+0~20200807.47+debian9~1.gbpcb3068 (apache2handler)
MariaDB 10.1.45-MariaDB-0+deb9u1
ICU 65.1
Elasticsearch 6.8.12

I believe the problem is due to the JOIN conditions being discarded/unrecognized due to table aliases -- although I have not had a chance to test this. You can see in the code that there is a JOIN condition, but it's not reflected in the actual queries seen on the wiki.

Actual queries seen on wiki look like this (taking 24 seconds):

SELECT /*   */  r.rev_user_text  FROM `revision` `r`,`page` `p`    WHERE p.page_title = 'KansasMarionCourthouse.jpg'  ORDER BY r.rev_id ASC LIMIT 1;
# Time: 200928 22:48:57
# User@Host: wiki_app_user[wiki_app_user] @ localhost []
# Thread_id: 315456  Schema: wiki_en  QC_hit: No
# Query_time: 24.755784  Lock_time: 0.000029  Rows_sent: 1  Rows_examined: 7653097
# Rows_affected: 0
SET timestamp=1601333337;

Using Explain to look at the query plan, you can see that one is a full join of both tables
EXPLAIN SELECT r.rev_user_text FROM revision r,page p WHERE p.page_title = 'Spain_Murcia.jpg' ORDER BY r.rev_id ASC LIMIT 1;

# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, SIMPLE, r, index, , usertext_timestamp, 811, , 3869507, Using index; Using temporary; Using filesort
1, SIMPLE, p, index, , name_title, 771, , 273942, Using where; Using index; Using join buffer (flat, BNL join)

Whereas using an actual JOIN generates a much faster query plan that returns almost immediately
EXPLAIN SELECT r.rev_user_text FROM revision r INNER JOIN page p ON r.rev_page = p.page_id WHERE p.page_title = 'Spain_Murcia.jpg' ORDER BY r.rev_id ASC; -- LIMIT 1;

# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1, SIMPLE, p, index, PRIMARY, name_title, 771, , 273942, Using where; Using index; Using temporary; Using filesort
1, SIMPLE, r, ref, PRIMARY,page_timestamp,page_user_timestamp, PRIMARY, 4, wiki_en.p.page_id, 6,
freephile renamed this task from long queries to long queries in Extension ApprovedRevs.Sep 30 2020, 6:59 PM
freephile added subscribers: Hexmode, Yaron_Koren.

Change 631260 had a related patch set uploaded (by Reedy; owner: Reedy):
[mediawiki/extensions/ApprovedRevs@master] Fix table join when table is aliased

https://gerrit.wikimedia.org/r/631260

> echo $dbr->selectSQLText( [ 'r' => 'revision', 'p' => 'page' ], 'r.rev_user_text', [ 'p.page_title' => 'Foo' ], null, [ 'ORDER BY' => 'r.rev_id ASC' ], [ 'revision' => [ 'JOIN', [ 'r.rev_page = p.page_id' ] ] ] );
SELECT  r.rev_user_text  FROM `mw_revision` `r`,`mw_page` `p`    WHERE p.page_title = 'Foo'  ORDER BY r.rev_id ASC

vs

> echo $dbr->selectSQLText( [ 'r' => 'revision', 'p' => 'page' ], 'r.rev_user_text', [ 'p.page_title' => 'Foo' ], null, [ 'ORDER BY' => 'r.rev_id ASC' ], [ 'r' => [ 'JOIN', [ 'r.rev_page = p.page_id' ] ] ] );
SELECT  r.rev_user_text  FROM `mw_page` `p` JOIN `mw_revision` `r` ON ((r.rev_page = p.page_id))   WHERE p.page_title = 'Foo'  ORDER BY r.rev_id ASC

Change 631260 merged by jenkins-bot:
[mediawiki/extensions/ApprovedRevs@master] Fix table join when table is aliased

https://gerrit.wikimedia.org/r/631260

Reedy claimed this task.