Page MenuHomePhabricator

Query for Special:Contributions?contribs=newbie&dir=prev is slow on enwiki
Closed, ResolvedPublic

Description

The query for Special:Contributions?contribs=newbie&dir=prev is slow on enwiki (>25 seconds), occasionally resulting in database query errors (IndexPager::buildQueryInfo (contributions page unfiltered)).

These are slow:

This is fast:

Event Timeline

matmarex created this task.Jul 16 2016, 2:37 PM
Restricted Application added subscribers: Zppix, Aklapper. · View Herald TranscriptJul 16 2016, 2:37 PM

The slow queries are:

SELECT rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_content_format,rev_content_model,user_name,page_namespace,page_title,page_is_new,page_latest,page_is_redirect,page_len,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`   WHERE ct_rev_id=rev_id  ) AS `ts_tags`
FROM `revision`
INNER JOIN `page` ON ((page_id = rev_page))
LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user))
LEFT JOIN `user_groups` ON ((ug_user = rev_user) AND ug_group = 'bot')
WHERE (rev_user > 28488973) AND (ug_group IS NULL) AND ((rev_deleted & 12) != 12)
AND (rev_timestamp<'20160604')
ORDER BY rev_timestamp DESC LIMIT 51;
+------+--------------------+-------------+--------+------------------------------------------------------------------------+---------------+---------+--------------------------------+-------+--------------------------------------+
| id   | select_type        | table       | type   | possible_keys                                                          | key           | key_len | ref                            | rows  | Extra                                |
+------+--------------------+-------------+--------+------------------------------------------------------------------------+---------------+---------+--------------------------------+-------+--------------------------------------+
|    1 | PRIMARY            | revision    | index  | PRIMARY,rev_timestamp,page_timestamp,user_timestamp,rev_timestamp_user | rev_timestamp | 16      | NULL                           | 84570 | Using where                          |
|    1 | PRIMARY            | user        | eq_ref | PRIMARY                                                                | PRIMARY       | 4       | enwiki.revision.rev_user       |     1 | Using where                          |
|    1 | PRIMARY            | user_groups | eq_ref | PRIMARY,ug_group                                                       | PRIMARY       | 261     | enwiki.revision.rev_user,const |     1 | Using where; Using index; Not exists |
|    1 | PRIMARY            | page        | eq_ref | PRIMARY                                                                | PRIMARY       | 4       | enwiki.revision.rev_page       |     1 |                                      |
|    2 | DEPENDENT SUBQUERY | change_tag  | ref    | ct_rev_id,change_tag_rev_tag                                           | ct_rev_id     | 5       | enwiki.revision.rev_id         |     1 | Using where; Using index             |
+------+--------------------+-------------+--------+------------------------------------------------------------------------+---------------+---------+--------------------------------+-------+--------------------------------------+
SELECT rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_content_format,rev_content_model,user_name,page_namespace,page_title,page_is_new,page_latest,page_is_redirect,page_len,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`   WHERE ct_rev_id=rev_id  ) AS `ts_tags`
FROM `revision`
INNER JOIN `page` ON ((page_id = rev_page))
LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user))
LEFT JOIN `user_groups` ON ((ug_user = rev_user) AND ug_group = 'bot')
WHERE (rev_user > 28488973) AND (ug_group IS NULL) AND ((rev_deleted & 12) != 12)
ORDER BY rev_timestamp LIMIT 51;
+------+--------------------+-------------+--------+---------------------------------------+---------------+---------+--------------------------------+-------+--------------------------------------+
| id   | select_type        | table       | type   | possible_keys                         | key           | key_len | ref                            | rows  | Extra                                |
+------+--------------------+-------------+--------+---------------------------------------+---------------+---------+--------------------------------+-------+--------------------------------------+
|    1 | PRIMARY            | revision    | index  | PRIMARY,page_timestamp,user_timestamp | rev_timestamp | 16      | NULL                           | 84570 | Using where                          |
|    1 | PRIMARY            | user        | eq_ref | PRIMARY                               | PRIMARY       | 4       | enwiki.revision.rev_user       |     1 | Using where                          |
|    1 | PRIMARY            | user_groups | eq_ref | PRIMARY,ug_group                      | PRIMARY       | 261     | enwiki.revision.rev_user,const |     1 | Using where; Using index; Not exists |
|    1 | PRIMARY            | page        | eq_ref | PRIMARY                               | PRIMARY       | 4       | enwiki.revision.rev_page       |     1 |                                      |
|    2 | DEPENDENT SUBQUERY | change_tag  | ref    | ct_rev_id,change_tag_rev_tag          | ct_rev_id     | 5       | enwiki.revision.rev_id         |     1 | Using where; Using index             |
+------+--------------------+-------------+--------+---------------------------------------+---------------+---------+--------------------------------+-------+--------------------------------------+

The fast query is:

SELECT rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_content_format,rev_content_model,user_name,page_namespace,page_title,page_is_new,page_latest,page_is_redirect,page_len,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`   WHERE ct_rev_id=rev_id  ) AS `ts_tags`
FROM `revision`
INNER JOIN `page` ON ((page_id = rev_page))
LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user))
LEFT JOIN `user_groups` ON ((ug_user = rev_user) AND ug_group = 'bot')
WHERE (rev_user > 28488973) AND (ug_group IS NULL) AND ((rev_deleted & 12) != 12)
ORDER BY rev_timestamp DESC LIMIT 51;
+------+--------------------+-------------+--------+---------------------------------------+---------------+---------+--------------------------------+-------+--------------------------------------+
| id   | select_type        | table       | type   | possible_keys                         | key           | key_len | ref                            | rows  | Extra                                |
+------+--------------------+-------------+--------+---------------------------------------+---------------+---------+--------------------------------+-------+--------------------------------------+
|    1 | PRIMARY            | revision    | index  | PRIMARY,page_timestamp,user_timestamp | rev_timestamp | 16      | NULL                           | 84570 | Using where                          |
|    1 | PRIMARY            | user        | eq_ref | PRIMARY                               | PRIMARY       | 4       | enwiki.revision.rev_user       |     1 | Using where                          |
|    1 | PRIMARY            | user_groups | eq_ref | PRIMARY,ug_group                      | PRIMARY       | 261     | enwiki.revision.rev_user,const |     1 | Using where; Using index; Not exists |
|    1 | PRIMARY            | page        | eq_ref | PRIMARY                               | PRIMARY       | 4       | enwiki.revision.rev_page       |     1 |                                      |
|    2 | DEPENDENT SUBQUERY | change_tag  | ref    | ct_rev_id,change_tag_rev_tag          | ct_rev_id     | 5       | enwiki.revision.rev_id         |     1 | Using where; Using index             |
+------+--------------------+-------------+--------+---------------------------------------+---------------+---------+--------------------------------+-------+--------------------------------------+

(Note that I ran the explains on stat1003, not on production, but the queries were similarly slow/fast there..)

The query plans look about the same… I think the problem is that the further in the past you look, the fewer edits there are by users who are still "newbies" today, and the database has to scan more rows to find the 50 requested results.

@jcrespo Do you see some way to make this better? If not, I guess we should put some limit on how far in the past one can look for newbie contribs.

jcrespo added a comment.EditedJul 18 2016, 1:54 PM

Sorry, I am not very familiar with this functionality "newbies"- correct me if any assumption I make is wrong. What I can infer from the queries is that it checks for users within a range of rev_user, and that that subset of users (compared to the total) is not very high, unlike the other conditions (group, deleted).

As we are already have another "important" range check (timpestamp), and the ordering, that is precisely the best index possible for that query, which is not that good. Due to BTREE indexing limitations, it is not possible to make 2 range scans on 2 different columns with an index. Your suggestion, @matmarex is right that some kind of limiting (such as timestamp or rev_id) would be the preferred solution- in general, this would be the preferred "paging" option (make it based on an id, not the number of results. This will limit the variability in results return, which is probably worse than it being slow all the time.

If this was a very important query to get right, there are some "tricks" to make "double ranges" fast, but they require investment on extra table columns, that probably are not worth in this case (?).

What I can infer from the queries is that it checks for users within a range of rev_user, and that that subset of users (compared to the total) is not very high, unlike the other conditions (group, deleted).

The last 1% of users. From ContribsPager::getUserCond():

			$max = $this->mDb->selectField( 'user', 'max(user_id)', false, __METHOD__ );
			$condition[] = 'rev_user >' . (int)( $max - $max / 100 );
jcrespo edited projects, added DBA, Wikimedia-Rdbms; removed Operations.Jul 18 2016, 4:35 PM

I am setting DBA tag as "Blocked external" to note that this is on my radar, and that I can provide support, reviews and comments on demand, but that I am not working actively on this (this is more of a Wikimedia-Rdbms task).

matmarex claimed this task.Jul 18 2016, 4:41 PM

Let's try cutting off at 30 days in the past, I guess. @Bsadowski1 Do you think that anyone is likely to need to check newbies' contributions older than that? (It'll still be possible to view all contributions of a specific user.)

Change 299573 had a related patch set uploaded (by Bartosz Dziewoński):
ContribsPager: Disallow looking too far in the past for 'newbies' queries

https://gerrit.wikimedia.org/r/299573

matmarex triaged this task as Low priority.Jul 18 2016, 8:04 PM

Change 299573 merged by jenkins-bot:
ContribsPager: Disallow looking too far in the past for 'newbies' queries

https://gerrit.wikimedia.org/r/299573

matmarex closed this task as Resolved.Jul 19 2016, 12:54 AM
matmarex removed a project: Patch-For-Review.

The change will be deployed to Wikimedia wikis this week per the schedule at https://www.mediawiki.org/wiki/MediaWiki_1.28/Roadmap.