Page MenuHomePhabricator

Support flagged revisions in RESTBase
Open, HighPublic

Description

Several projects (including dewiki) use the FlaggedRevs extension to show a reviewed version of an article to anonymous users. REST API users like the mobile apps expect similar functionality, so we should support this.

API behavior

Use cases:

  • Mobile apps, reading focused web apps: Load stable revision of an article.
  • Editing: Load the *latest* revision of an article.
  • Recentchanges followers: Retrieve specific revisions, independent of whether they are flagged or not.

Strawman proposal:

  • /page/html/{title}: Return flagged (stable) revision.
  • /page/html/{title}/{revision}: Return specific revision, independent of whether it is stable or not.

For editing, we also need a way to explicitly retrieve the latest revision, stable or not. Options:

  1. Let the editor ask for the latest revision explicitly, using /page/html/{title}/{revision}. This is current VE behavior on mis-match between metadata & REST API response.
    • Advantage: Simplicity. No code changes needed. Will need similar editor smarts when reusing view HTML as well.
    • Disadvantage: Doubled request latency on mis-match. This is likely to affect a relatively small percentage of edits.
  2. Extra parameter for "latest" entry point: /page/html/{title}?latest=true.
    • Disadvantages: Somewhat ugly; problematic for caching. Would make CDN caching ineffective for edit requests.
    • Advantages: Reduces probability of version mis-match on edit.
  3. Special revision value: /page/html/latest
    • Disadvantages: Similar to the ?latest=true parameter above. Complicates parameter validation and documentation.
    • Advantages: Reduces probability of version mis-match on edit.

I am currently leaning towards 1), mostly because it does not complicate the API, and because it should still result in decent performance for editing. It also seems to be the option we'd want once we use Parsoid HTML for views, and don't need to load any HTML for editing in the common case of flagged revision being the same as the latest revision.

Implementation options in RESTBase

Efficient access to flagged revisions will require some changes in RESTBase's database structure. Here are a few options.

(Temporary): Retrieve latest flagged revision from the MW API

Idea: For requests to /page/html/{title} (latest revision), request the latest flagged revision number from the MediaWiki API in parallel with retrieving the latest stored revision from Cassandra. Then compare the two. If the latest stored revision does not match, repeat the Cassandra storage request with the latest flagged revision.

Performance impact, on projects with flagged revisions enabled:

  • Requests for specific revisions: No impact on requests for specific revisions.
  • Requests for latest revision: Slow down to the max of (MW API request, storage request) in most cases. In practice, this would likely be a median slow-down from about 6ms (storage) to about 50ms (MW API). In the relatively rare case of recently edited pages with outstanding unpatrolled edits, we would repeat the storage request, adding another ~6ms median.
Open questions
  • CDN purging on flagged revision review. Need to purge "latest" cached content in that case. @Pchelolo mentioned that we do not have a flaggedrev review event in eventbus yet, but we could consider to use the flaggedrevs_CacheUpdate job mentioned in https://phabricator.wikimedia.org/T175210 as a pseudo-event.

Extra "flagged" table

Idea: Add an extra table keyed on title that contains those revisions that are flagged, but not the latest for the title. Read from both the flagged table & latest table, and return the flagged revision if found.

Updates
  • On edit:
    • Make sure the flagged revision is stored in the flagged table.
    • Store the new revision.
  • On flag status update:
    • If flagged revision is the latest, delete the flagged entry from the "flagged" table.
    • Otherwise, store the new flagged revision to the "flagged" table.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript
mobrovac added a subscriber: mobrovac.

I favour the first option as well. Not only is it the most straightforward, it is also the expected behaviour form clients interacting with our infrastructure.

On the implementation side, an extra table sounds fine, but I wonder if we could make it work with just an extra static column indicating the flagged revision ID to return? (Just thinking out loud here).

Also note that the FlaggedRevs extension puts forward this extra requirement:

These revisions will remain the same even if included templates are changed or images are overwritten

My understanding here is that when the flagged revision happens to be the latest, we should skip any async updates to it.

@mobrovac, I was just about to add the static column option, but then realized that it is unclear (at least to me) how the data for the flagged revision would actually be stored. If you have an idea for how that could be solved, then please go ahead & add that option to the description.

GWicke triaged this task as Normal priority.Aug 8 2017, 8:56 PM

/page/html/{title}/{revision}: Return specific revision, independent of whether it is stable or not.

This should be auth-based though since only a subset of users with the right role can view this page.

This should be auth-based though since only a subset of users with the right role can view this page.

My understanding is that all revisions are accessible to users with "read" permissions. The flagged revisions extension basically only modifies which version is shown by default. Quoting from the extension description:

The Flagged Revisions extension allows for Editor and Reviewer users to rate revisions of articles and set those revisions as the default revision to show upon normal page view.

So, unless there is something I missed, there shouldn't be a need for special read protections for individual revisions in a flaggedrev-enabled project.

GWicke raised the priority of this task from Normal to High.Aug 29 2017, 6:38 PM

Bumped priority, as support for flagged revisions is important for serious reading use cases. There is also an opportunity to piggy-back on current storage schema migration efforts.

bearND added a subscriber: bearND.Sep 1 2017, 2:09 AM

+1 to option 1) as well. It seem it would require no changes in MCS/PCS as well.

GWicke updated the task description. (Show Details)Sep 8 2017, 6:12 PM
GWicke added a subscriber: Pchelolo.
GWicke updated the task description. (Show Details)Sep 8 2017, 9:23 PM

An intermediary solution which consults the MW API is available as PR 863, but as of the time of writing needs some improvements, so comments/suggestions are welcomed on the PR.

An intermediary solution which consults the MW API is available as PR 863, but as of the time of writing needs some improvements, so comments/suggestions are welcomed on the PR.

The PR has now reached its final version. It is applied to /html/{title} and /mobile-sections*/{title} routes when clients ask for the latest revision of a page. If no objections are raised, it will be merged and deployed on Monday, 2017-10-02.

Change 380540 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/change-propagation/deploy@master] Config: Purge the latest HTML route for FlaggedRevs cache updates

https://gerrit.wikimedia.org/r/380540

Change 380805 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/extensions/FlaggedRevs@master] Add the FlaggedRevsStableRevisionUpdate hook

https://gerrit.wikimedia.org/r/380805

Change 380817 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/event-schemas@master] Add the mediawiki/page/stable-revision-change event schema

https://gerrit.wikimedia.org/r/380817

Change 381010 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/extensions/EventBus@master] Emit the page-stable-revision-change event

https://gerrit.wikimedia.org/r/381010

Tgr added a subscriber: Tgr.Sep 28 2017, 4:36 PM

On desktop, logged-in users will always be shown the latest revision of the page. (Well, it's subject to the page configuration I mentioned in the PR... but almost always.) If the apps (or, eventually, the new mobile web interface) want to replicate that behavior, then there needs to be a way to fetch the latest revision without knowing the id in advance - a new endpoint, or maybe some kind of content negotiation.

bearND added a comment.EditedSep 28 2017, 5:47 PM

I would hold off merging and deploying this a bit longer until the following question is solved and agreed upon with clients and design (and possibly an Android app update can be released which handles that):

How do we display in the clients that a user is not looking at the latest revision?

I think to support this use case RESTBase should provide additional HTTP headers to indicate the latest revision if a stable revision (that is not the latest revision) ends up being served.

(FYI: to get some example pages go to en:Special:PendingChanges or dewiki or deferred-changes.wmflabs.org .)

Not logged in desktop users get a new tab between Read and Edit, called "Pending changes". Clicking on the link reveals the box with the special information about the pending changes.

Example text: "The latest accepted version was accepted on 28 September 2017. There is 1 pending revision awaiting review."

Logged in desktop users can specify in their preferences (Pending changes section) which revision should be displayed:

  • Use the default settings for each page: "The latest accepted version was accepted on 28 September 2017. There is 1 pending revision awaiting review."
  • Always show the stable version (if there is one): "This is the latest accepted revision, reviewed on 28 September 2017. There is 1 pending revision awaiting review."
  • Always show the latest version: "The latest accepted version was accepted on 28 September 2017. There is 1 pending revision awaiting review."
Tgr added a comment.Sep 28 2017, 7:12 PM

Not logged in desktop users get a new tab between Read and Edit, called "Pending changes". Clicking on the link reveals the box with the special information about the pending changes.

Clicking on the link will take you to the latest revision of the page. The flagrev message is prominent on the latest revision and less prominent in the stable revision (it's behind the little magnifier icon dropdown) but exists on both, with slightly different wording. (Might be prominent in both cases on some wikis; it's configurable.)

FWIW the information used to construct the message is revision number of stable revision + timestamp of stable revision + number of revisions after the stable one.

Tgr added a comment.Sep 28 2017, 9:04 PM

Also worth noting that the stable version and the base revision of the stable version are not the same: template and file transclusions are handled differently. Compare for example (in a logged-out browser session) the following links:

They point to the same revision but the article text is different. RESTBase does not handle that now but eventually it will have to so it's worth considering how clients will be able to differentiate between those.

(The current URL format does not seem problematic, just noting this for the sake of completeness.)

stjn added a subscriber: stjn.Apr 30 2018, 5:17 PM

How do we display in the clients that a user is not looking at the latest revision?

This is really insignificant question in regards to damage that not having FlaggedRevs supported by this API has for recently enabled globally Page Previews. FlaggedRevs is enabled on our projects, for one of the reasons, to show the stabilised version on high-traffic pages and currently the vandalism goes through Page Previews which makes the FlaggedRevs purpose somewhat void.

Cirdan added a subscriber: Cirdan.Jul 12 2018, 7:00 PM