Page MenuHomePhabricator

Filter page histories by user, or contributions by title
Open, LowestPublicFeature

Description

Special:Contributions should allow selecting of concrete page so one can see all of some user's edits on such page. Good for pages with non-trivial history.


Version: 1.21.x
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=2667

Details

Related Objects

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 9:50 PM
bzimport set Reference to bz10788.
bzimport added a subscriber: Unknown Object (MLST).

ayg wrote:

I'm committing a schema change for this now. It will require a new index on the revision table, so it probably won't be live for quite some time even once the code to do it is written.

*** Bug 16745 has been marked as a duplicate of this bug. ***

(In reply to comment #1)

I'm committing a schema change for this now. It will require a new index on
the revision table, so it probably won't be live for quite some time even once
the code to do it is written.

Removing schema-change keyword on the assumption that it's already happened: the comment is from 2007 and the API manages to do this just fine (prop=revisions&rvuser=Catrope&titles=Main_Page).

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Main_Page&rvlimit=500&rvuser=MZMcBride

You are looking at the HTML representation of the XML format.
HTML is good for debugging, but probably is not suitable for your application.
See complete documentation, or API help for more information.
<?xml version="1.0"?>
<api>

<query>
  <normalized>
    <n from="Main_Page" to="Main Page" />
  </normalized>
  <pages>
    <page pageid="15580374" ns="0" title="Main Page">
      <revisions>
        <rev revid="250138233" user="MZMcBride"

timestamp="2008-11-07T00:25:19Z" comment="cleaned up code, per talk" />

<rev revid="210742813" user="MZMcBride"

timestamp="2008-05-07T05:22:48Z" comment="rv" />

<rev revid="210742552" user="MZMcBride"

timestamp="2008-05-07T05:20:55Z" comment="restored margin-top with lower value"
/>

<rev revid="210742376" user="MZMcBride"

timestamp="2008-05-07T05:19:46Z" comment="rm margin-top, see talk" />

<rev revid="199549748" user="MZMcBride"

timestamp="2008-03-20T08:11:25Z" comment="rv" />

<rev revid="197643346" user="MZMcBride"

timestamp="2008-03-12T03:52:20Z" comment="rm unneeded div" />

<rev revid="191953558" user="MZMcBride"

timestamp="2008-02-16T23:45:51Z" comment="bypassed redirect" />

      </revisions>
    </page>
  </pages>
</query>

</api>


Above is an example use.

I'd like a different page that lets me display user contribution (individual
revisions) like page history. Basically I want
*the diff link to be clickable
*timestamp displayed in standard dating format (depending on users pref like
how history date & time is displayed)
*possibly adding a text box to history pages so that I can "search" a users
contribution.
**For example if I went to page history of "Main Page" and search for
"MZMcBride" I'd get the above feed

  • Bug 24345 has been marked as a duplicate of this bug. ***

So all that's missing for this now is hacking up SpecialContributions.php? Is that right? If so, this can probably be marked "easy", right? Doing so.

(In reply to comment #1)

I'm committing a schema change for this now. It will require a new index on
the revision table, so it probably won't be live for quite some time even once
the code to do it is written.

This was committed in r25267.

(In reply to comment #3)

(In reply to comment #1)

I'm committing a schema change for this now. It will require a new index on
the revision table, so it probably won't be live for quite some time even once
the code to do it is written.

Removing schema-change keyword on the assumption that it's already happened:
the comment is from 2007 and the API manages to do this just fine
(prop=revisions&rvuser=Catrope&titles=Main_Page).

Re-adding "schema-change" keyword and adding "shell" keyword (someone has to apply the change, after all). Also targeting this for 1.20 and bumping importance a bit, per my reasoning at bug 2667 comment 4.

The assumption here is faulty, I think. Or at least this is easily answerable. I asked in #wikimedia-tech if the index "INDEX page_user_timestamp (rev_page,rev_user,rev_timestamp)" was on the revision table. DaB. said that the Toolserver didn't have this index, so it was likely that the masters don't either. Based on this, I'd like hard evidence to be able to say that a schema change isn't necessary here still.

(In reply to comment #7)

(In reply to comment #1)

I'm committing a schema change for this now. It will require a new index on
the revision table, so it probably won't be live for quite some time even once
the code to do it is written.

This was committed in r25267.

And reverted in r25290, apparently.

I guess I'll remove a few keywords now. From a coding point-of-view, this is (still) a pretty easy bug to resolve, it appears. So we'll keep the "easy" keyword.

r113109, r113110, r113111 brings this back into life

It'll get added to WMF later in the 1.20 cycle, maybe before deployment

Are there other

(In reply to comment #10)

r113109, r113110, r113111 brings this back into life

It'll get added to WMF later in the 1.20 cycle, maybe before deployment

Are there other

Comment 0 was about getting Special:Contributions to do this. I don't see any commits to that part of the code. (You just need a page title input on Special:Contributions/user. It should be trivial.) Should this bug be re-opened or should a separate ticket be filed?

Reopening per comment 11.

Database index supporting this filter has been added. But interface (HistoryAction, SpecialContributions and possibly some API modules as well) still needs to be done.

Indeed, per my IRC comment straight afterwards wondering why I actually closed it, and my cut off question asking what else needs doing

TODO:
(In reply to comment #12)

Reopening per comment 11.

Database index supporting this filter has been added. But interface
(HistoryAction, SpecialContributions and possibly some API modules as well)
still needs to be done.

afeldman wrote:

I don't think we currently need the new revision index in order to support user edits per page queries.

From testing queries like:

select * from revision where rev_page = 1952670 and rev_user_text = "HBC AIV helperbot7" order by rev_timestamp desc limit 50;

where that page id = http://en.wikipedia.org/wiki/Wikipedia:Administrator_intervention_against_vandalism with > 800k edits and that user has > 122k edits and it ran in ~30ms on a prod enwiki db. I got similar times against frequent editors of the India article, and when querying both with non-existent rev_user_text values.

  • Bug 42217 has been marked as a duplicate of this bug. ***

Okay, so it looks like tables.sql in master has the relevant index:

CREATE INDEX /*i*/page_user_timestamp ON /*_*/revision (rev_page,rev_user,rev_timestamp);

The question becomes whether this index has been applied to Wikimedia wikis or whether it needs to be (cf. comment 15), then?

The question becomes whether this index has been applied to Wikimedia wikis or
whether it needs to be (cf. comment 15), then?

mysql:wikiadmin@db63 [enwiki]> show indexes from revision\G

  • 1. row ******* Table: revision Non_unique: 0 Key_name: PRIMARY

Seq_in_index: 1
Column_name: rev_page

Collation: A

Cardinality: 100127178

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 2. row ******* Table: revision Non_unique: 0 Key_name: PRIMARY

Seq_in_index: 2
Column_name: rev_id

Collation: A

Cardinality: 500635893

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 3. row ******* Table: revision Non_unique: 0 Key_name: rev_id

Seq_in_index: 1
Column_name: rev_id

Collation: A

Cardinality: 500635893

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 4. row ******* Table: revision Non_unique: 1 Key_name: rev_timestamp

Seq_in_index: 1
Column_name: rev_timestamp

Collation: A

Cardinality: 500635893

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 5. row ******* Table: revision Non_unique: 1 Key_name: page_timestamp

Seq_in_index: 1
Column_name: rev_page

Collation: A

Cardinality: 62579486

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 6. row ******* Table: revision Non_unique: 1 Key_name: page_timestamp

Seq_in_index: 2
Column_name: rev_timestamp

Collation: A

Cardinality: 500635893

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 7. row ******* Table: revision Non_unique: 1 Key_name: user_timestamp

Seq_in_index: 1
Column_name: rev_user

Collation: A

Cardinality: 27813105

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 8. row ******* Table: revision Non_unique: 1 Key_name: user_timestamp

Seq_in_index: 2
Column_name: rev_timestamp

Collation: A

Cardinality: 500635893

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 9. row ******* Table: revision Non_unique: 1 Key_name: usertext_timestamp

Seq_in_index: 1
Column_name: rev_user_text

Collation: A

Cardinality: 1053970

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 10. row ******* Table: revision Non_unique: 1 Key_name: usertext_timestamp

Seq_in_index: 2
Column_name: rev_timestamp

Collation: A

Cardinality: 500635893

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 11. row ******* Table: revision Non_unique: 1 Key_name: usertext_timestamp

Seq_in_index: 3
Column_name: rev_user

Collation: A

Cardinality: 500635893

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 12. row ******* Table: revision Non_unique: 1 Key_name: usertext_timestamp

Seq_in_index: 4
Column_name: rev_deleted

Collation: A

Cardinality: 500635893

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 13. row ******* Table: revision Non_unique: 1 Key_name: usertext_timestamp

Seq_in_index: 5
Column_name: rev_minor_edit

Collation: A

Cardinality: 500635893

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 14. row ******* Table: revision Non_unique: 1 Key_name: usertext_timestamp

Seq_in_index: 6
Column_name: rev_text_id

Collation: A

Cardinality: 500635893

  Sub_part: NULL
    Packed: NULL
      Null:
Index_type: BTREE
   Comment:
  • 15. row ******* Table: revision Non_unique: 1 Key_name: usertext_timestamp

Seq_in_index: 7
Column_name: rev_comment

Collation: A

Cardinality: 500635893

  Sub_part: NULL
    Packed: NULL
      Null: YES
Index_type: BTREE
   Comment:

15 rows in set (0.27 sec)

tl;dr: the page_user_timestamp index is _not_ on the Wikimedia databases.

Asher: are you comfortable with user interface exposure of this functionality without this index?

(In reply to comment #20)

tl;dr: the page_user_timestamp index is _not_ on the Wikimedia databases.

Asher: are you comfortable with user interface exposure of this functionality
without this index?

Asher: ping?

afeldman wrote:

Yes, I am so long as the interface doesn't provide pagination based on limit+offset queries.

mysql:root@db1051 [enwiki]> select count(*) from revision where rev_page = 1952670;
+----------+

count(*)

+----------+

941113

+----------+
1 row in set (0.68 sec)

mysql:root@db1051 [enwiki]> select count(1) as count, rev_user from revision where rev_page = 1952670 group by rev_user order by count desc limit 1;
+--------+----------+

countrev_user

+--------+----------+

1391666327251

+--------+----------+

mysql:root@db1051 [enwiki]> explain select * from revision where rev_page = 1952670 and rev_user = 6327251 order by rev_timestamp desc limit 25;
+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+---------+-------------+

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra

+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+---------+-------------+

1SIMPLErevisionrefPRIMARY,page_timestamp,user_timestamppage_timestamp4const2707632Using where

+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+---------+-------------+
1 row in set (0.03 sec)

2707632 looks bad, but:

mysql:root@db1051 [enwiki]> flush status;
mysql:root@db1051 [enwiki]> select * from revision where rev_page = 1952670 and rev_user = 6327251 order by rev_timestamp desc limit 25;
....

mysql:root@db1051 [enwiki]> show status like 'Handler_read%';
+--------------------------+-------+

Variable_nameValue

+--------------------------+-------+

Handler_read_first0
Handler_read_key1
Handler_read_last0
Handler_read_next0
Handler_read_prev151
Handler_read_rnd0
Handler_read_rnd_deleted0
Handler_read_rnd_next0

+--------------------------+-------+
8 rows in set (0.02 sec)

That example is for the most edited page and the user with the most edits. The worst case for the current schema would be a user with only one edit of the most edited page.

mysql:root@db1051 [enwiki]> explain select * from revision where rev_page = 1952670 and rev_user = 4305640 order by rev_timestamp desc limit 25;
+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+------+-------------+

idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra

+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+------+-------------+

1SIMPLErevisionrefPRIMARY,page_timestamp,user_timestampuser_timestamp4const479Using where

+------+-------------+----------+------+---------------------------------------+----------------+---------+-------+------+-------------+
1 row in set (0.03 sec)

mysql:root@db1051 [enwiki]> flush status;
Query OK, 0 rows affected (0.03 sec)

mysql:root@db1051 [enwiki]> select * from revision where rev_page = 1952670 and rev_user = 4305640 order by rev_timestamp desc limit 25;
+-----------+----------+-------------+---------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+

rev_idrev_pagerev_text_idrev_commentrev_userrev_user_textrev_timestamprev_minor_editrev_deletedrev_lenrev_parent_idrev_sha1

+-----------+----------+-------------+---------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+

1348648811952670134137447/* User-reported */4305640AM01NU0620070531180220001905134864494l2ac4niowr2u88gl84ii3vz9cbrkc6t

+-----------+----------+-------------+---------------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+
1 row in set (0.13 sec)

mysql:root@db1051 [enwiki]> show status like 'Handler_read%';
+--------------------------+-------+

Variable_nameValue

+--------------------------+-------+

Handler_read_first0
Handler_read_key1
Handler_read_last0
Handler_read_next0
Handler_read_prev480
Handler_read_rnd0
Handler_read_rnd_deleted0
Handler_read_rnd_next0

+--------------------------+-------+
8 rows in set (0.03 sec)

130ms is a little sucky but for a worst case, this should still be ok.

The nice thing is that hopefully we'll be able to use extended_keys in MariaDB 5.5.31. Everything is currently migrating to 5.5.30 which has an optimizer bug that I reported regarding extended_keys so I have it disabled in production for now. It allows the primary key to be fully utilized as the right side member of every secondary key. The revision primary key is:

PRIMARY KEY (`rev_page`,`rev_id`),

and we have in production:

KEY `user_timestamp` (`rev_user`,`rev_timestamp`),

So with extended_keys, rev_page will be useable without having to increase the index side by duplicating as with the proposed page_user_timestamp.

Note that the revision PRIMARY KEY is just rev_id for new tables for MediaWiki. No one ever got around to changing existing ones already.

Unassigning from Reedy since he's not working on it.

Removing target milestone that was in the past.

If you want this in a specific release, have a good reason AND you are willing to find resources to fix this bug, feel free to change it to something appropriate.

Change 329340 had a related patch set uploaded (by Umherirrender):
Add username filter to action=history

https://gerrit.wikimedia.org/r/329340

This is being handled as part of T132416, however, the addition of this index has and is creating constant outages. I will create a separate task for that soon.

This task is about adding functionality to MediaWiki. The schema change being done under T132416 is part of that, but not all of it.

Ok. I supposed it was the only thing pending. Maybe it would be clearer if specific tickets were opened just for the pending changes? Or in this case, a dependency relationship?

I see you just did that, thanks!

Change 329340 abandoned by Umherirrender:
Add username filter to action=history

Reason:
No longer support for this patch set - feel free to upload under own name and work on, if needed

https://gerrit.wikimedia.org/r/329340

Aklapper changed the subtype of this task from "Task" to "Feature Request".Feb 4 2022, 11:02 AM