Page MenuHomePhabricator

list=allrevisions does not have an arvtag parameter
Open, LowPublicFeature

Description

The list=alldeletedrevisions has a adrtag parameter to show only revisions marked with given tag. The list=allrevisions should provide an arvtag parameter for the same purpose.

Event Timeline

Unicornisaurous subscribed.

If I'm not mistaken, this would be a reasonable GCI task. I would be willing to mentor.

Change 394757 had a related patch set uploaded (by Pppery; owner: Pppery):
[mediawiki/core@master] Add arvtag parameter to list=allrevisions API queries.

https://gerrit.wikimedia.org/r/394757

@TTO When asked on ticket about peformance profiling, I will normally anwer on phabricator as the editor there is quite bad. These are my findings for it:

SELECT /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_text_id,rev_timestamp,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS rev_comment_text,NULL AS rev_comment_data,NULL AS rev_comment_cid,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len FROM revision INNER JOIN page ON ((page_id = rev_page)) INNER JOIN change_tag ON ((rev_id=ct_rev_id)) WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11

The explain may look fine:

enwiki> EXPLAIN SELECT /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_text_id,rev_timestamp,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS rev_comment_text,NULL AS rev_comment_data,NULL AS rev_comment_cid,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len FROM revision INNER JOIN page ON ((page_id = rev_page)) INNER JOIN change_tag ON ((rev_id=ct_rev_id)) WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11\G           
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: index
possible_keys: PRIMARY,page_timestamp,rev_page_id
          key: rev_timestamp
      key_len: 16
          ref: NULL
         rows: 11
        Extra: 
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: enwiki.revision.rev_page
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: change_tag
         type: ref
possible_keys: ct_rev_id,change_tag_rev_tag,ct_tag,change_tag_tag_id
          key: ct_rev_id
      key_len: 262
          ref: enwiki.revision.rev_id,const
         rows: 1
        Extra: Using where; Using index
3 rows in set (0.15 sec)

There is no type:ALL (full table scan), but there is an index scan on revision, which is unlikely to finish within the 60 seconds of the query (and it should finish in 1 second or less):

When run:

root@dbstore1002[enwiki]> FLUSH STATUS; SELECT /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_text_id,rev_timestamp,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS rev_comment_text,NULL AS rev_comment_data,NULL AS rev_comment_cid,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len FROM revision INNER JOIN page ON ((page_id = rev_page)) INNER JOIN change_tag ON ((rev_id=ct_rev_id)) WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11; SHOW STATUS like 'Hand%';
Query OK, 0 rows affected (0.05 sec)

^CCtrl-C -- query killed. Continuing normally.
ERROR 1317 (70100): Query execution was interrupted
+----------------------------+--------+
| Variable_name              | Value  |
+----------------------------+--------+
| Handler_commit             | 0      |
| Handler_delete             | 0      |
| Handler_discover           | 0      |
| Handler_external_lock      | 0      |
| Handler_icp_attempts       | 0      |
| Handler_icp_match          | 0      |
| Handler_mrr_init           | 0      |
| Handler_mrr_key_refills    | 0      |
| Handler_mrr_rowid_refills  | 0      |
| Handler_prepare            | 0      |
| Handler_read_first         | 0      |
| Handler_read_key           | 119842 |
| Handler_read_last          | 1      |
| Handler_read_next          | 0      |
| Handler_read_prev          | 59942  |
| Handler_read_rnd           | 0      |
| Handler_read_rnd_deleted   | 0      |
| Handler_read_rnd_next      | 0      |
| Handler_rollback           | 2      |
| Handler_savepoint          | 0      |
| Handler_savepoint_rollback | 0      |
| Handler_tmp_update         | 0      |
| Handler_tmp_write          | 0      |
| Handler_update             | 0      |
| Handler_write              | 0      |
+----------------------------+--------+
25 rows in set (0.01 sec)

I had to kill the query because it would not finish- this is not something we can run on production- if we give some kind of option to the user of which query to select, and without guarantees that the findings will be uniform. The problem is the WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11 combined with the STRAIGHT_JOIN- this makes a full index scan of the revision table backwards- removing the STRAIGHT_JOIN should be beneficial (although I have not analyzed all cases).

Without straight join and the same parameter (needs more testing):

enwiki> EXPLAIN SELECT rev_id,rev_page,rev_text_id,rev_timestamp,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS rev_comment_text,NULL AS rev_comment_data,NULL AS rev_comment_cid,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len FROM revision INNER JOIN page ON ((page_id = rev_page)) INNER JOIN change_tag ON ((rev_id=ct_rev_id)) WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11\G     *************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: change_tag
         type: ref
possible_keys: ct_rev_id,change_tag_rev_tag,ct_tag,change_tag_tag_id
          key: ct_tag
      key_len: 257
          ref: const
         rows: 1
        Extra: Using where; Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: eq_ref
possible_keys: PRIMARY,page_timestamp,rev_page_id
          key: PRIMARY
      key_len: 4
          ref: enwiki.change_tag.ct_rev_id
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: enwiki.revision.rev_page
         rows: 1
        Extra: 
3 rows in set (0.00 sec)

enwiki> FLUSH STATUS; SELECT rev_id,rev_page,rev_text_id,rev_timestamp,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_comment AS rev_comment_text,NULL AS rev_comment_data,NULL AS rev_comment_cid,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len FROM revision INNER JOIN page ON ((page_id = rev_page)) INNER JOIN change_tag ON ((rev_id=ct_rev_id)) WHERE ct_tag = 'Foo' ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 11; SHOW STATUS like 'Hand%';                     
Query OK, 0 rows affected (0.00 sec)

Empty set (0.00 sec)

+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 2     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 0     |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 1     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.02 sec)
Pppery removed Pppery as the assignee of this task.Mar 5 2018, 5:59 PM
Pppery subscribed.
Pppery unsubscribed.

Change 394757 abandoned by Pppery:
Add arvtag parameter to list=allrevisions API queries.

https://gerrit.wikimedia.org/r/394757

Aklapper changed the subtype of this task from "Task" to "Feature Request".