Page MenuHomePhabricator

ContribsPager/DeletedContribsPager should use a unique criterion for paging
Open, LowPublic

Description

ContribsPager relies solely on rev_timestamp for paging. This causes revisions to be skipped, when the boundary of two pages falls between revisions with the same timestamp. The solution is to combine rev_timestamp with rev_id for a stable chronological order.

Note that this had already been fixed, but that fix was then lost in an attempt to improve query performance. Queries for user contributions are notoriously sensitive to quirks in the query planner, and they are affected by the actor and comment migration, see T215466.

Caveat: ContribsPager allows extensions to insert revisions via the ContribsPager__reallyDoQuery hook. Flow uses this hook, and appends its own fake "revision" rows to the ones retrieved from core. Having those Flow revisions (UUID revision ids) combined with revisions stored in core (integer revision ids) creates an unreliable foundation for using revision id as a secondary comparator as the 2 ids are unrelated.

However, for T257839 we introduced the revisionsOnly option into ContribsPage, which disables the ContribsPager__reallyDoQuery hook. In this mode, stable paging based on a combination of timestamp and revision ID would again be possible.

This task was originally filed as:
Investigate wrong entry order in the user contributions list
This task is to investigate the wrong order of entries in the contributions list. See parent ticket. Might be related to T46841 .

Related Objects

Event Timeline

Aklapper changed the edit policy from "Custom Policy" to "All Users".Sep 17 2018, 5:47 PM
Aklapper changed Risk Rating from N/A to default.

The reason for the bad order can be seen here: https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/specials/pagers/ContribsPager.php$358.

Special:Contributions uses the ContribsPager mentioned above. The SQL query ends doing a trivial ORDER BY rev_timestamp (or ORDER BY ipc_rev_timestamp for edits made by IP contributors). Since the timestamps are not guaranteed to be unique, the order of two entries with the exact same timestamp is effectively random. This is what we see in the example https://commons.wikimedia.org/w/index.php?limit=50&title=Special%3AContributions&contribs=user&target=4nn1l2&namespace=6&start=2018-06-26&end=2018-06-26 (the description page of "File:Modarres Expressway, Tehran 20110912.JPG" is edited first, and then imported).

Possible ways forward:

  • ContribsPager is technically badly implemented. The documentation of IndexPager::getIndexField states "it's really not a good idea to use a non-unique index for this! That won't page right." But this is what ContribsPager does. Timestamps aren't unique.
    • IndexPager::getIndexField can not return more than one field. The code mentions an array feature, but this is for different sort orders.
    • The only truly unique field on the revision table is rev_id (and ipc_rev_id on the ip_changes table). However, I don't think it's possible to ORDER BY this ID as it is not really guaranteed to be in increasing order, as far as I know.
  • ContribsPager currently does not implement IndexPager::getExtraSortFields. It might be possible to use this to provide rev_id as a fall-back ORDER BY key.
    • I wondered if the same if-else as in getIndexField is needed, choosing between ipc_rev_id and rev_id. Answer: No. The rev_id field is always available. The ipc_rev_id field is only available when a JOIN with ip_changes is made. This join is performed on 'LEFT JOIN', [ 'ipc_rev_id = rev_id' ], which means the two fields behave entirely identical within this query.

Change 480081 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/core@master] Fix order on Special:Contributions when timestamps are identical

https://gerrit.wikimedia.org/r/480081

thiemowmde triaged this task as Low priority.
thiemowmde moved this task from Sprint Backlog to Review on the WMDE-QWERTY-Sprint-2018-12-04 board.
  • I wondered if the same if-else as in getIndexField is needed, choosing between ipc_rev_id and rev_id. Answer: No. The rev_id field is always available. The ipc_rev_id field is only available when a JOIN with ip_changes is made. This join is performed on 'LEFT JOIN', [ 'ipc_rev_id = rev_id' ], which means the two fields behave entirely identical within this query.

They don't behave identically, although I'm not sure whether the difference actually matters much since it filesorts either way. Ordering by ipc_rev_id rather than rev_id avoids using a temporary and touches fewer rows.

There's no avoiding the filesort, BTW, unless you want to sort the results by IP then timestamp instead of only by timestamp.

ORDER BY rev_id
wikiadmin@10.64.0.92(enwiki)> explain SELECT  rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,ipc_rev_timestamp,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` INNER JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rev_id=rev_id  ) AS `ts_tags`  FROM `revision` JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `ip_changes` ON ((ipc_rev_id = rev_id))   WHERE (ipc_hex BETWEEN '7F000000' AND '7F0000FF') AND ((rev_deleted & 12) != 12)  ORDER BY ipc_rev_timestamp DESC, rev_id DESC LIMIT 51;
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------+-----------------------+---------+-----------------------------------------------+------+-----------------------------------------------------------+
| id   | select_type        | table               | type   | possible_keys                                                         | key                   | key_len | ref                                           | rows | Extra                                                     |
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------+-----------------------+---------+-----------------------------------------------+------+-----------------------------------------------------------+
|    1 | PRIMARY            | ip_changes          | range  | PRIMARY,ipc_hex_time                                                  | ipc_hex_time          | 37      | NULL                                          |  144 | Using where; Using index; Using temporary; Using filesort |
|    1 | PRIMARY            | revision            | eq_ref | PRIMARY,page_timestamp,page_user_timestamp,rev_page_id                | PRIMARY               | 4       | enwiki.ip_changes.ipc_rev_id                  |    1 | Using where                                               |
|    1 | PRIMARY            | page                | eq_ref | PRIMARY                                                               | PRIMARY               | 4       | enwiki.revision.rev_page                      |    1 |                                                           |
|    1 | PRIMARY            | user                | eq_ref | PRIMARY                                                               | PRIMARY               | 4       | enwiki.revision.rev_user                      |    1 | Using where                                               |
|    1 | PRIMARY            | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                                | PRIMARY               | 4       | enwiki.ip_changes.ipc_rev_id                  |    1 | Using index                                               |
|    1 | PRIMARY            | comment_rev_comment | eq_ref | PRIMARY                                                               | PRIMARY               | 8       | enwiki.temp_rev_comment.revcomment_comment_id |    1 |                                                           |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_rev_tag_id,change_tag_tag_id_id,change_tag_rev_tag_nonuniq | change_tag_rev_tag_id | 5       | enwiki.revision.rev_id                        |    1 | Using where; Using index                                  |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                                               | PRIMARY               | 4       | enwiki.change_tag.ct_tag_id                   |    1 |                                                           |
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------+-----------------------+---------+-----------------------------------------------+------+-----------------------------------------------------------+
8 rows in set (0.00 sec)

wikiadmin@10.64.0.92(enwiki)> pager > /dev/null; SELECT  rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,ipc_rev_timestamp,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` INNER JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rev_id=rev_id  ) AS `ts_tags`  FROM `revision` JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `ip_changes` ON ((ipc_rev_id = rev_id))   WHERE (ipc_hex BETWEEN '7F000000' AND '7F0000FF') AND ((rev_deleted & 12) != 12)  ORDER BY ipc_rev_timestamp DESC, rev_id DESC LIMIT 51; nopager; SHOW STATUS LIKE 'Hand%';
PAGER set to '> /dev/null'
51 rows in set (0.01 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 2     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 714   |
| Handler_read_last          | 0     |
| Handler_read_next          | 322   |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 51    |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 144   |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 143   |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.16 sec)
ORDER BY ipc_rev_id
wikiadmin@10.64.0.92(enwiki)> explain SELECT  rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,ipc_rev_timestamp,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` INNER JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rev_id=rev_id  ) AS `ts_tags`  FROM `revision` JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `ip_changes` ON ((ipc_rev_id = rev_id))   WHERE (ipc_hex BETWEEN '7F000000' AND '7F0000FF') AND ((rev_deleted & 12) != 12)  ORDER BY ipc_rev_timestamp DESC, ipc_rev_id DESC LIMIT 51;
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------+-----------------------+---------+-----------------------------------------------+------+------------------------------------------+
| id   | select_type        | table               | type   | possible_keys                                                         | key                   | key_len | ref                                           | rows | Extra                                    |
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------+-----------------------+---------+-----------------------------------------------+------+------------------------------------------+
|    1 | PRIMARY            | ip_changes          | range  | PRIMARY,ipc_hex_time                                                  | ipc_hex_time          | 37      | NULL                                          |  144 | Using where; Using index; Using filesort |
|    1 | PRIMARY            | revision            | eq_ref | PRIMARY,page_timestamp,page_user_timestamp,rev_page_id                | PRIMARY               | 4       | enwiki.ip_changes.ipc_rev_id                  |    1 | Using where                              |
|    1 | PRIMARY            | page                | eq_ref | PRIMARY                                                               | PRIMARY               | 4       | enwiki.revision.rev_page                      |    1 |                                          |
|    1 | PRIMARY            | user                | eq_ref | PRIMARY                                                               | PRIMARY               | 4       | enwiki.revision.rev_user                      |    1 | Using where                              |
|    1 | PRIMARY            | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                                | PRIMARY               | 4       | enwiki.ip_changes.ipc_rev_id                  |    1 | Using index                              |
|    1 | PRIMARY            | comment_rev_comment | eq_ref | PRIMARY                                                               | PRIMARY               | 8       | enwiki.temp_rev_comment.revcomment_comment_id |    1 |                                          |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_rev_tag_id,change_tag_tag_id_id,change_tag_rev_tag_nonuniq | change_tag_rev_tag_id | 5       | enwiki.revision.rev_id                        |    1 | Using where; Using index                 |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                                               | PRIMARY               | 4       | enwiki.change_tag.ct_tag_id                   |    1 |                                          |
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------+-----------------------+---------+-----------------------------------------------+------+------------------------------------------+
8 rows in set (0.00 sec)

wikiadmin@10.64.0.92(enwiki)> pager > /dev/null; SELECT  rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,ipc_rev_timestamp,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` INNER JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rev_id=rev_id  ) AS `ts_tags`  FROM `revision` JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user)) LEFT JOIN `ip_changes` ON ((ipc_rev_id = rev_id))   WHERE (ipc_hex BETWEEN '7F000000' AND '7F0000FF') AND ((rev_deleted & 12) != 12)  ORDER BY ipc_rev_timestamp DESC, ipc_rev_id DESC LIMIT 51; nopager; SHOW STATUS LIKE 'Hand%';
PAGER set to '> /dev/null'
51 rows in set (0.00 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 2     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 278   |
| Handler_read_last          | 0     |
| Handler_read_next          | 222   |
| Handler_read_prev          | 0     |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.15 sec)

I'm not very good in reading these EXPLAIN outputs. Can you help us understand what the consequence of what we see in there might be?

What I meant with "behaving identical" is that there is no way the results can differ, because the two fields are by definition identical.

My assumption was that the bulk of what the ORDER BY needs to do will still be done based on the rev_timestamp (or ipc_rev_timestamp) column. Both have an index! Only rows with identical timestamps need to be ordered by the ID column. I expected the query optimizer to figure this out. If not, is there a way to give it a hint to do exactly that?

I noticed both of the queries listed above contain a JOIN with ip_changes. I wonder how the query behaves without this JOIN? (It must use rev_id then.) Are you able to provide that third EXPLAIN output?

I'm not very good in reading these EXPLAIN outputs. Can you help us understand what the consequence of what we see in there might be?

In this case, the only difference in the EXPLAIN is the "using temporary" annotation. This means that MySQL MariaDB has to collect the result rows (i.e. all the selected columns) into a temporary table in order to sort them. When ordering by ipc_rev_id instead, it can manage to do the sorting without having to collect the whole row first.

You can get more details about the EXPLAIN output at https://dev.mysql.com/doc/refman/8.0/en/explain-output.html.

I also posted the output of SHOW STATUS LIKE 'Hand%' after executing both versions of the query, which gives us some insight as to how the specific query was actually processed. The numbers are complicated a bit by the joins of all the other tables, but we can see that the rev_id version did write 143 rows to the temporary table (Handler_tmp_write) and we also see higher numbers for some of the other properties as well. Since there are only 144 rows total that match this query, that's a little concerning and we might want to try it again with ranges that have more edits to see how it behaves.

For 85.0.0.0/16, which has 5520 edits on enwiki, the relevant bits are:

Variable_namerev_idipc_rev_id
Handler_read_key24363223
Handler_read_next112645570
Handler_read_rnd510
Handler_read_rnd_next55210
Handler_tmp_write55200

It looks like ordering by ipc_rev_id is able to go through just the ip_changes.ipc_hex_time index to identify the rows it will need to return, and then pretty efficiently pull out just the rows it needs from the other tables. While ordering by rev_id it's pulling the data from all the tables for all 5520 rows to put into that temporary table. With warm caches, the ipc_rev_id ordering returned in 0.02 seconds while the rev_id ordering took 0.30 seconds.

BTW, you don't necessarily have to change the ORDER BY clause to fix this. ORDER BY uses result field aliases from the SELECT in preference to fields from the underlying tables, so if you change the query to SELECT ipc_rev_id AS rev_id, ... ORDER BY ipc_rev_timestamp DESC, rev_id DESC LIMIT 51 it'll actually do the more efficient plan ordering by ipc_rev_id.

What I meant with "behaving identical" is that there is no way the results can differ, because the two fields are by definition identical.

Yes, they give identical results. The performance behavior differs though.

My assumption was that the bulk of what the ORDER BY needs to do will still be done based on the rev_timestamp (or ipc_rev_timestamp) column. Both have an index! Only rows with identical timestamps need to be ordered by the ID column. I expected the query optimizer to figure this out. If not, is there a way to give it a hint to do exactly that?

The index on (rev_timestamp) doesn't generally matter here, as to use that it would have to start going through all the edits to find the few made by the user of interest. It's much faster to use the index on (rev_user,rev_timestamp) or (rev_user_text,rev_timestamp) (or, soon, (rev_actor,rev_timestamp)) to look through only the edits by the user of interest, in order by timestamp.

In the ip_changes case the index on (ipc_rev_timestamp) similarly doesn't matter. It's still faster to use the (ipc_hex,ipc_rev_timestamp) index, even though it has to pull all the rows for all the IPs in the range and re-sort them by timestamp to find the ones.

BTW, with InnoDB tables the table's primary key is implicitly appended to every other index, so that index on ip_changes is really (ipc_hex,ipc_rev_timestamp,ipc_rev_id) and it can take advantage of that if your ORDER BY matches. But for good and bad reasons it doesn't match when you're ordering by rev_id rather than ipc_rev_id, resulting in the different query behavior.

I noticed both of the queries listed above contain a JOIN with ip_changes. I wonder how the query behaves without this JOIN? (It must use rev_id then.) Are you able to provide that third EXPLAIN output?

If you don't search on an IP range, then it doesn't use ip_changes at all and things look like this:

wikiadmin@10.64.0.92(enwiki)> explain SELECT  rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` INNER JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rev_id=rev_id  ) AS `ts_tags`  FROM `revision` FORCE INDEX (usertext_timestamp) JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user))   WHERE ((rev_user_text = '127.0.0.1')) AND ((rev_deleted & 12) != 12)  ORDER BY rev_timestamp DESC, rev_id DESC LIMIT 51;
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------+-----------------------+---------+-----------------------------------------------+------+--------------------------+
| id   | select_type        | table               | type   | possible_keys                                                         | key                   | key_len | ref                                           | rows | Extra                    |
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------+-----------------------+---------+-----------------------------------------------+------+--------------------------+
|    1 | PRIMARY            | revision            | ref    | usertext_timestamp                                                    | usertext_timestamp    | 257     | const                                         |  144 | Using where              |
|    1 | PRIMARY            | user                | eq_ref | PRIMARY                                                               | PRIMARY               | 4       | enwiki.revision.rev_user                      |    1 | Using where              |
|    1 | PRIMARY            | page                | eq_ref | PRIMARY                                                               | PRIMARY               | 4       | enwiki.revision.rev_page                      |    1 |                          |
|    1 | PRIMARY            | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                                | PRIMARY               | 4       | enwiki.revision.rev_id                        |    1 | Using index              |
|    1 | PRIMARY            | comment_rev_comment | eq_ref | PRIMARY                                                               | PRIMARY               | 8       | enwiki.temp_rev_comment.revcomment_comment_id |    1 |                          |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_rev_tag_id,change_tag_tag_id_id,change_tag_rev_tag_nonuniq | change_tag_rev_tag_id | 5       | enwiki.revision.rev_id                        |    1 | Using where; Using index |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                                               | PRIMARY               | 4       | enwiki.change_tag.ct_tag_id                   |    1 |                          |
+------+--------------------+---------------------+--------+-----------------------------------------------------------------------+-----------------------+---------+-----------------------------------------------+------+--------------------------+
7 rows in set (0.00 sec)

wikiadmin@10.64.0.92(enwiki)> pager > /dev/null; SELECT  rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,rev_user,rev_user_text,NULL AS `rev_actor`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name,page_is_new,(SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')  FROM `change_tag` INNER JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))   WHERE ct_rev_id=rev_id  ) AS `ts_tags`  FROM `revision` FORCE INDEX (usertext_timestamp) JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id)) INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user))   WHERE ((rev_user_text = '127.0.0.1')) AND ((rev_deleted & 12) != 12)  ORDER BY rev_timestamp DESC, rev_id DESC LIMIT 51; nopager; SHOW STATUS LIKE 'Hand%';
PAGER set to '> /dev/null'
51 rows in set (0.00 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 2     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 227   |
| Handler_read_last          | 0     |
| Handler_read_next          | 78    |
| Handler_read_prev          | 50    |
| Handler_read_retry         | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
26 rows in set (0.15 sec)

There's nothing much interesting to comment on there, beyond that it doesn't even have to filesort since the WHERE constants and ORDER BY exactly match the index on revision that it uses. Also direct comparison between this and the earlier queries is easy to make because all 144 edits in 127.0.0.0/24 do in fact belong to 127.0.0.1.

Change 480762 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/FileImporter@master] Fix unpredictable history order by enforcing >=1 second difference

https://gerrit.wikimedia.org/r/480762

Change 480762 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Fix unpredictable history order by enforcing >=1 second difference

https://gerrit.wikimedia.org/r/480762

Thanks a lot for the incredibly helpful insight!

[…] if you change the query to SELECT ipc_rev_id AS rev_id, ... ORDER BY ipc_rev_timestamp DESC, rev_id DESC LIMIT 51 it'll actually do the more efficient plan ordering by ipc_rev_id.

I changed https://gerrit.wikimedia.org/r/480081 to do exactly that. What do you think?

You're welcome! I think the patch looks good now, we can finish it up in code review.

Change 480081 merged by jenkins-bot:
[mediawiki/core@master] Fix order on Special:Contributions when timestamps are identical

https://gerrit.wikimedia.org/r/480081

@Lea_WMDE, this is hard to test because the bad order was essentially a random result.

daniel subscribed.

This has been broken again by the fix to T221380: Wikidata: Special:Contributions times out for many high-activity users. Before fixing it again, we have to make sure that the solution doesn't degrade database performance. In particular, we need to determine whether this is blocked on T215466: Remove revision_comment_temp and revision_actor_temp.

daniel renamed this task from Investigate wrong entry order in the user contributions list to ContribsPager should use a unique criterion for paging.May 29 2020, 9:05 AM
daniel updated the task description. (Show Details)

With the current schema and code, the query used is:

SELECT  rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,
  comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,
  actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,temp_rev_user.revactor_actor AS `rev_actor`,
  page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,
  user_name,page_is_new,revactor_timestamp,
  ( SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')
    FROM `change_tag`
    JOIN `change_tag_def` ON ((ct_tag_id=ctd_id))
    WHERE ct_rev_id=rev_id  ) AS `ts_tags`
FROM `revision`
JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id))
JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id))
JOIN `revision_actor_temp` `temp_rev_user` FORCE INDEX (actor_timestamp) ON ((temp_rev_user.revactor_rev = rev_id))
JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = temp_rev_user.revactor_actor))
JOIN `page` ON ((page_id = rev_page))
LEFT JOIN `user` ON ((actor_rev_user.actor_user != 0) AND (user_id = actor_rev_user.actor_user))   
WHERE ((temp_rev_user.revactor_actor = 2))
  AND ((rev_deleted & 4) = 0)
  AND (((revactor_timestamp<'20200529092122')))
ORDER BY revactor_timestamp DESC,revactor_rev DESC
LIMIT 3

The query plan for that, on my local, nearly empty wiki, looks like this:

+------+--------------------+---------------------+--------+-------------------------------------------------------------+-----------------------+---------+------------------------------------------------+------+--------------------------+
| id   | select_type        | table               | type   | possible_keys                                               | key                   | key_len | ref                                            | rows | Extra                    |
+------+--------------------+---------------------+--------+-------------------------------------------------------------+-----------------------+---------+------------------------------------------------+------+--------------------------+
|    1 | PRIMARY            | actor_rev_user      | const  | PRIMARY                                                     | PRIMARY               | 8       | const                                          | 1    |                          |
|    1 | PRIMARY            | user                | const  | PRIMARY                                                     | PRIMARY               | 4       | const                                          | 1    |                          |
|    1 | PRIMARY            | temp_rev_user       | range  | actor_timestamp                                             | actor_timestamp       | 22      | NULL                                           | 1    | Using where; Using index |
|    1 | PRIMARY            | revision            | eq_ref | PRIMARY,rev_page_id,page_timestamp,rev_page_actor_timestamp | PRIMARY               | 4       | default.temp_rev_user.revactor_rev             | 1    | Using where              |
|    1 | PRIMARY            | page                | eq_ref | PRIMARY                                                     | PRIMARY               | 4       | default.revision.rev_page                      | 1    |                          |
|    1 | PRIMARY            | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                      | PRIMARY               | 4       | default.temp_rev_user.revactor_rev             | 1    | Using index              |
|    1 | PRIMARY            | comment_rev_comment | eq_ref | PRIMARY                                                     | PRIMARY               | 8       | default.temp_rev_comment.revcomment_comment_id | 1    |                          |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_rev_tag_id,change_tag_tag_id_id                  | change_tag_rev_tag_id | 5       | default.revision.rev_id                        | 1    | Using index              |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                                     | PRIMARY               | 4       | default.change_tag.ct_tag_id                   | 1    |                          |
+------+--------------------+---------------------+--------+-------------------------------------------------------------+-----------------------+---------+------------------------------------------------+------+--------------------------+

With the code changed to use rev_timestamp + rev_id for paging, this becomes:

SELECT  rev_id,rev_page,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,
  comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,
  actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,temp_rev_user.revactor_actor AS `rev_actor`,
  page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,
  user_name,page_is_new,revactor_timestamp,revactor_rev,
  ( SELECT  GROUP_CONCAT(ctd_name SEPARATOR ',')
    FROM `change_tag`
    JOIN `change_tag_def`
    ON ((ct_tag_id=ctd_id))
    WHERE ct_rev_id=rev_id  ) AS `ts_tags`
FROM `revision`
JOIN `revision_comment_temp` `temp_rev_comment` ON ((temp_rev_comment.revcomment_rev = rev_id))
JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = temp_rev_comment.revcomment_comment_id))
JOIN `revision_actor_temp` `temp_rev_user` FORCE INDEX (actor_timestamp) ON ((temp_rev_user.revactor_rev = rev_id))
JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = temp_rev_user.revactor_actor))
JOIN `page` ON ((page_id = rev_page))
LEFT JOIN `user` ON ((actor_rev_user.actor_user != 0) AND (user_id = actor_rev_user.actor_user))   
WHERE ((temp_rev_user.revactor_actor = 2))
  AND ((rev_deleted & 4) = 0)
  AND (((revactor_timestamp<'20200529091519'))
    OR ((revactor_timestamp='20200529091519') AND (revactor_rev<'4')))
ORDER BY revactor_timestamp DESC,revactor_rev DESC,revactor_rev
DESC LIMIT 3

with this query plan:

+------+--------------------+---------------------+--------+-------------------------------------------------------------+-----------------------+---------+------------------------------------------------+------+--------------------------+
| id   | select_type        | table               | type   | possible_keys                                               | key                   | key_len | ref                                            | rows | Extra                    |
+------+--------------------+---------------------+--------+-------------------------------------------------------------+-----------------------+---------+------------------------------------------------+------+--------------------------+
|    1 | PRIMARY            | actor_rev_user      | const  | PRIMARY                                                     | PRIMARY               | 8       | const                                          | 1    |                          |
|    1 | PRIMARY            | user                | const  | PRIMARY                                                     | PRIMARY               | 4       | const                                          | 1    |                          |
|    1 | PRIMARY            | temp_rev_user       | range  | actor_timestamp                                             | actor_timestamp       | 26      | NULL                                           | 2    | Using where; Using index |
|    1 | PRIMARY            | revision            | eq_ref | PRIMARY,rev_page_id,page_timestamp,rev_page_actor_timestamp | PRIMARY               | 4       | default.temp_rev_user.revactor_rev             | 1    | Using where              |
|    1 | PRIMARY            | page                | eq_ref | PRIMARY                                                     | PRIMARY               | 4       | default.revision.rev_page                      | 1    |                          |
|    1 | PRIMARY            | temp_rev_comment    | ref    | PRIMARY,revcomment_rev                                      | PRIMARY               | 4       | default.temp_rev_user.revactor_rev             | 1    | Using index              |
|    1 | PRIMARY            | comment_rev_comment | eq_ref | PRIMARY                                                     | PRIMARY               | 8       | default.temp_rev_comment.revcomment_comment_id | 1    |                          |
|    2 | DEPENDENT SUBQUERY | change_tag          | ref    | change_tag_rev_tag_id,change_tag_tag_id_id                  | change_tag_rev_tag_id | 5       | default.revision.rev_id                        | 1    | Using index              |
|    2 | DEPENDENT SUBQUERY | change_tag_def      | eq_ref | PRIMARY                                                     | PRIMARY               | 4       | default.change_tag.ct_tag_id                   | 1    |                          |
+------+--------------------+---------------------+--------+-------------------------------------------------------------+-----------------------+---------+------------------------------------------------+------+--------------------------+

Change 601713 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] make rev_actor_timestamp index unique.

https://gerrit.wikimedia.org/r/601713

Change 598419 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] ContribsPages: use stable pagination

https://gerrit.wikimedia.org/r/598419

Change 601713 merged by jenkins-bot:
[mediawiki/core@master] make rev_actor_timestamp index cover the rev_id field.

https://gerrit.wikimedia.org/r/601713

thiemowmde removed the point value for this task.

Use the example from T114756 to test continuation, which is still broken.

The problem is that getExtraSortFields is only usable for sorting, not for continuation. The rev_id is not part of the offset link and not used when requesting the next page. It must use getIndexField with two fields for this purpose, see https://gerrit.wikimedia.org/r/c/mediawiki/core/+/787832 for some examples.
Unfortunally getExtraSortFields is now also used on the deleted contributions and that would fail the same way, but would work after the same fix (with different columns). The linked https://gerrit.wikimedia.org/r/c/mediawiki/core/+/598419 is doing something in that way, maybe it can fix the deleted contribs as well, because both pager are working very similiar due to the hook (and that why using getExtraSortFields as a workaround for the moment)

Umherirrender renamed this task from ContribsPager should use a unique criterion for paging to ContribsPager/DeletedContribsPager should use a unique criterion for paging.Oct 18 2022, 8:26 PM