Page MenuHomePhabricator

Implement GET History Filter
Closed, ResolvedPublic3 Estimated Story Points

Description

This task depends on successful completion of T231599, T231600 and requires T231558

Acceptance Criteria:

  • Extend /page/{title}/history handler to support a filter parameter
  • Extend /page/{title}/history handler to support the following values for filter:
    • reverted
    • anonymous
    • bot
  • Request:
    • Must support HTTP GET only
    • Request body must be empty
    • Request headers:
      • Must support If-Modified-Since: only if the result set has changed since this date
      • Must support If-None-Match: only if the result set's etag is different from this
  • Response
    • Response must return JSON
    • Response results must be filtered based on the filter param provided

Event Timeline

WDoranWMF set the point value for this task to 3.Sep 4 2019, 12:56 PM

Namespacing, and the details of how it will impact this endpoint, are being discussed in task T232485.

In T231599: Compose New History Queries @Anomie (thanks for the queries!) said this:

Note that in practice the fields and most of the joins should be coming from RevisionStore::getQueryInfo(); the PHP code would just add the WHERE conditions, LIMIT, ORDER BY, and any extra joins like user_groups.

That was also my expectation, and I'm trying to code it now. The "bot" filter wasn't too bad, as it is a relatively small modification to the RevisionStore::getQueryInfo() result.

But for the "anonymous" and "reverted" queries, the primary "FROM" table needs to be "revision_actor_temp", and "revision" needs to be JOINed. I don't see a way to tell RevisionStore::getQueryInfo() to generate an array closer to what we want. I'm finding myself writing brittle code with lots of unsets and array manipulations to restructure the RevisionStore::getQueryInfo() result into what we need.

To illustrate, here's one small snip, but it gets worse:

	unset( $revQuery['tables'][0] );
	unset( $revQuery['tables']['temp_rev_user'] );
	$new = [ 0 => 'revision_actor_temp', 'revision' => 'revision' ];
	$revQuery['tables'] = $new + $revQuery['tables'];

Is there a better way to do this? Is there somewhere else in the codebase that does something similar that I could reference? Or in these cases, should I abandon RevisionStore::getQueryInfo() entirely and just create my own query array from scratch?

To illustrate, here's one small snip, but it gets worse:

	unset( $revQuery['tables'][0] );
	unset( $revQuery['tables']['temp_rev_user'] );
	$new = [ 0 => 'revision_actor_temp', 'revision' => 'revision' ];
	$revQuery['tables'] = $new + $revQuery['tables'];

I'm not sure that would actually work right since you're not changing $revQuery['joins'], and anyway it's a bit over-complicated. Try something like this (from rMW7aadd1f2a155: API: Add STRAIGHT_JOIN to ApiQueryUserContribs to avoid planner oddness):

// XXX: Put an appropriate comment here explaining why this is being done.
$revQuery['joins']['revision'] = $revQuery['joins']['temp_rev_user'];
unset( $revQuery['joins']['temp_rev_user'] );
// It isn't actually necesssary to reorder $revQuery['tables'] as Database does the right thing
// when join conditions are given for all joins, but Gergő is wary of relying on that so pull
// `revision_actor_temp` to the start.
$revQuery['tables'] =
        [ 'temp_rev_user' => $revQuery['tables']['temp_rev_user'] ] + $revQuery['tables'];

Note all this is only needed when you're doing a STRAIGHT_JOIN. If you're not using that flag, the planner won't care what order the tables are specified in as it'll reorder things as it thinks best anyway.

Or in these cases, should I abandon RevisionStore::getQueryInfo() entirely and just create my own query array from scratch?

Probably a bad idea, as you'd have to duplicate much of the logic from that method, and then we'd have to keep it in sync when anything changes in there.

Thank you @Anomie .

You are correct that the the bit of code I posted wouldn't work on its own - I just posted a snippet as I didn't want to turn this into code review. My main concern was to make sure I'm generally approaching this the right way. It seems I am, even if my current syntax needs improvement. Good to know there's some flexibility on the table ordering, that helps.

Change 540002 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Add filter options to Core REST API page history handler

https://gerrit.wikimedia.org/r/540002

tl;dr: @eprodromou , what change tags are considered "reverted"? Just mw-undo and mw-rollback, or others too?

Details:

In T231599: Compose New History Queries I see this in the task description:

Where a revert is defined as a revision labelled with either Undo or Rollback

and this in one of Brad's comments:

if the mw-undo and mw-rollback tags aren't among the most-used tags it might wind up choosing stranger plans that may or may not work as well

If I execute this against enwiki:

select distinct ctd_name from change_tag_def;

I get a big list whose entries include:

  • Rapid reverts
  • Undo
  • mw-rollback
  • mw-undo
  • revert
  • reverting anti-vandal bot
  • rollback
  • undo

(Plus a bunch of other tag names that seemed irrelevant to this task)

https://www.mediawiki.org/wiki/Manual:Tags says that people with certain rights can create tags.

I see that mw-rollback and mw-undo are both part of the mediawiki core code, appearing in a few places include ChangeTags.php. Are these the only two I care about? Or do I potentially have to care about different sets of tags on different wikis (and if so, how would I know which)?

For now, I'm proceeding as if I only have to care about mw-rollback and mw-undo, and I've coded things such that it wouldn't be difficult to add other tags. So this question is not blocking this task.

Hey @BPirkle. Doesn't T231599 define this already? Looks like undo and rollback, am I correct?

Hey @BPirkle. Doesn't T231599 define this already? Looks like undo and rollback, am I correct?

Yes, but I don't think that is correct. Or at least, the task description is using "Undo" and "Rollback" to mean the terms that appear in the user interface, not the actual tag names in the database (which is what matters for the query).

The task description for T231599 says "Undo" (first letter uppercase) and "Rollback" (first letter uppercase). We do have an "Undo" tag in the enwiki database (ctd_id 543), but we do not have a "Rollback" one. So if I look for (only) "Undo" and "Rollback", it will never, ever find anything for "Rollback" because it does not exist.

We also have both "undo" (all lowercase, ctd_id 539) and "rollback" (all lowercase, ctd_id 533) tags on enwiki. Notice that "Undo" and "undo" have different change tag ids. Looking for one won't find the other. So I need to know if I should look for one, both, or neither.

Additionally, Brad mentioned "mw-undo" (ctd_id 1) and "mw-rollback" (ctd_id 8), which appear in core code, and which are probably what we actually want. Looking at the number of occurrences of each (column ctd_count), I see:

ctd_namectd_idnumber of tagged revisions
Undo5431
undo5391
rollback5332
mw-undo12,884,806
mw-rollback81,862,577

It seems unlikely that we want to implement a whole filtering option just for the 1 revision tagged "Undo". I'm very confident that we really want "mw-undo" and "mw-rollback". However, I'm not very familiar with our tagging system, and how it is used from wiki to wiki. So my questions really are:

  1. do we care about any tags in addition to mw-undo and mw-rollback?
  2. if the answer to #1 is yes, will the set of additional tags we care about change from wiki to wiki?
  3. if the answer to #2 is yes, how do we know which tags we care about?

Note that "Undo", "undo", and "rollback" there are user-created tags, that someone created at some point. Unfortunately none of the tag creation log entries (Undo, undo, rollback) include a reason.

Also of note is that there are only two revisions tagged with either tag, one on User:SD0001/sandbox and one deleted revision on another user's sandbox.

My guess would be that people created those tags by mistaking the creation form for a search form, then people developing user scripts accidentally used them in a few tests.

@BPirkle lgtm. Use mw-undo and mw-rollback. Ignore any other tags for now; we'll worry about having a more precise algorithm later.

Change 538133 had a related patch set uploaded (by BPirkle; owner: BPirkle):
[mediawiki/core@master] Core REST API handler for GET page history

https://gerrit.wikimedia.org/r/538133

Change 540002 abandoned by BPirkle:
Add filter options to Core REST API page history handler

Reason:
Abandoning in favor of change 538133

https://gerrit.wikimedia.org/r/540002

Change 538133 merged by jenkins-bot:
[mediawiki/core@master] Core REST API handler for GET page history

https://gerrit.wikimedia.org/r/538133

Two of the filters are done, but reverted shows reverting revisions, not reverted revisions.

I'm going to close this, and we'll worry about "reverted" in other tasks.