Page MenuHomePhabricator

Revision support broken in MCS
Closed, ResolvedPublic

Description

Looking at the code it appears I should be able to access a given revision via:
https://en.wikipedia.org/api/rest_v1/page/mobile-sections-remaining/San_Francisco/740722837

This doesn't seem to be honoured hower. The above gives me a 404
Is this an issue in MCS or Restbase?

Event Timeline

In RESTBase we only store the latest revision for mobile endpoints and that's by design, because apps can only access the latest version.

@Jdlrobson Do you have any use-case in mind where this might be useful?

Yes ... lazy loading references on demand in MobileFrontend. See T146398

My confusion stems from this

/**
 * GET {domain}/v1/page/mobile-sections-remaining/{title}
 * Gets the remaining sections for the mobile app version of a given wiki page.
 */
router.get('/mobile-sections-remaining/:title/:revision?', function (req, res) {

@Jdlrobson Looks like MCS is capable of emitting this data, we just don't store it in RESTBase to save on storage space since older revisions were never used, so we don't expose the 'revision' parameter to the API. As you need it now we can add this capability, although it's not very simple.

Here's what needs to be done on our side:

  1. Migrate mobileapps storage from key_value bucket to key_rev_value bucket this requires a schema change and possibly data reload. We've never done that before, so I'm not quite sure on the strategy yet.
  2. Modify the API endpoints to take the optional revision parameter and pass it through to MCS
  3. Modify ChangeProp to pass revision parameter to RB for content rerendering

#2 and #3 are trivial, #1 requires some thinking though. Please assign this to me and give me some clarity on how soon do you want this API to be done.

Another possible solution, that's way simpler, it to still store only the latest revision (it has high-volume access) and just proxy non-latest revision request to MCS, but that means having significantly higher latency for older content.

If we just kept latest revision and passed a revision id which also happened to be the latest would that have the same latency issue?

I ask as for the most part I expect the revision passed to be up to date. Traffic has been low on the MobileFrontend api equivalent so as long as it's fast for the most recent revision we are in a good place!

Before we add versioning for JSON sections, it might make sense to consider where we are headed in terms of JSON sections vs. streaming HTML. Streaming HTML can potentially lower first paint times by letting the browser display partially downloaded content, avoids the need for exact section request matching (render uuids), and has the potential to keep the API simpler going forward.

It would be also useful to be able to run tests against revisions to avoid issues such as T146930

There are no revisions for MCS because it uses some MobileView API that supports only retrieving data for the latest revision of a page. I am not sure what's the status of that. The task T106143: Support for revision IDs and TIDs in the MobileApps service might have more info on that.

GWicke edited projects, added Services (next); removed Services.

Change 326868 had a related patch set uploaded (by Jdlrobson):
Add revision support to mobile content service

https://gerrit.wikimedia.org/r/326868

This would help editing inside the app - ensuring that after an edit the page is refreshed. Read discussion below:

<jdlrobson> Jon Robson niedzielski: hey you there?
4:10 PM N<niedzielski> jdlrobson: o/
4:11 PM J<jdlrobson> Jon Robson so... im a little baffled about something and im hoping you can enlighten me... when a user does an edit on android, and you refresh the page after a save... how do you make sure you show them the new content?
4:11 PM N<niedzielski> jdlrobson: well...
4:11 PM jdlrobson: we don't have a good solution for that yet AFAIK when using the content service (MW API works fine)
4:12 PM J<jdlrobson> Jon Robson ahh.. so you use MW API?
4:12 PM but how do you deal with the change in presentation?
4:13 PM niedzielski: talking to gwicke and Pchelolo i think https://phabricator.wikimedia.org/T146836 is the solution.
4:13 PM N<niedzielski> jdlrobson: right now, when using the content service we wait a couple seconds and hope for the best :] it's not bulletproof. michael's exploring changing the content locally
4:13 PM J<jdlrobson> Jon Robson niedzielski: that strategy doesnt work on smaller projects e.g. wikivoyage
4:13 PM so if https://phabricator.wikimedia.org/T146836 is fixed - what you can do is take the revision number from the edit response and request the new revision
4:14 PM N<niedzielski> jdlrobson: we've talked about checking for the revision but how long do you wait? do you ultimately still show a "cahnges processing" kind of message if it takes too long?
4:16 PM J<jdlrobson> Jon Robson you wouldnt check.. you'd request... so instead of requesting https://en.wikipedia.org/api/rest_v1/page/mobile-sections/San_Francisco you'd request https://en.wikipedia.org/api/rest_v1/page/mobile-sections/San_Francisco/753467486
4:16 PM N<niedzielski> jdlrobson: i think thought the issue was that the cache isn't updated for 2+ seconds
4:17 PM <jdlrobson> Jon Robson but if you're hitting the above url there would be no cache
4:17 PM so it would generate
4:17 PM that's apparently how VisualEditor work around this issue
4:17 PM N<niedzielski> jdlrobson: oh wow
4:17 PM jdlrobson: well, we're not doing that :] but it sounds like the way to go
4:18 PM J<jdlrobson> Jon Robson niedzielski: okay.. im gonna write a patch for this, cos i want this fixed too :)
4:18 PM is there a bug for the editing problem?
4:19 PM N<niedzielski> jdlrobson: sweet, thanks for the heads up! this is the current open issue (it's come up a few times): T152207
4:19 PM S<stashbot> T152207: Description changes are not reflected without refreshing the page after a description is added or edited - https://phabricator.wikimedia.org/T152207
4:19 PM N<niedzielski> jdlrobson: i think this particular tasks also refers to the issue where you make an edit on one device and then quickly check on another
4:20 PM jdlrobson: but that would be a corner case if we used your approach
4:21 PM J<jdlrobson> Jon Robson descriptions maybe trickier
4:21 PM so the problem with descriptions is they are not revisioned
4:22 PM N<niedzielski> jdlrobson: oh geez
4:23 PM J<jdlrobson> Jon Robson neither are page images
4:23 PM so a request to https://en.wikipedia.org/api/rest_v1/page/mobile-sections/San_Francisco/753467486 would always return the latest description and page image
4:23 PM N<niedzielski> jdlrobson: do you know, if transcluded content changes, does that give a page a new revision?
4:23 PM J<jdlrobson> Jon Robson but that's probably fine in this context
4:24 PM niedzielski: nope.. another edge case..
4:24 PM the main thing this would help is if you edit a page and it refreshes you see the new version
4:24 PM N<niedzielski> jdlrobson: so we couldn't even fake a description revision with a transclude
4:25 PM jdlrobson: well, that would still be a huge win over the current implementation

So, when this is implemented on MCS side please reassign this to me, we'd need to make some changes in RESTBase to support this too. Mosly changing the storage backend to allow storing individual revisions. Another interesting question here is storage needs - the MCS output is almost as big (if not bigger) as the parsoid HTML, so the storage need would be quite significant if we want to store all the revisions.

@Pchelolo I think if we just limit to recent revision that might be enough.

@Pchelolo, the latest retention policy with count=1 and a TTL of 24 hours would probably be ideal. By keeping old revisions for 24 hours, this would provide plenty of time for consistent retrieval of lead + remainder.

@GWicke Old and reminder are updated simultaneously by design, so it's impossible for them to get mismatched. If all the rest of reqs from MCS still take the latest content we can still store just one.

The problem with switching to key_rev_value is that transition would be quite hard, we'd need to create new buckets, reload data, drop old buckets - lot's of work and if we only need the latest content - that work is not needed.

Although, on the other hand we've discussed at some point that key+value bucket is prone to race conditions, so it may be worths doing all the transition work.

@GWicke Old and reminder are updated simultaneously by design, so it's impossible for them to get mismatched. If all the rest of reqs from MCS still take the latest content we can still store just one.

I'm less concerned about mismatches in RESTBase, as those updates happen very close in time (but not atomically). However, clients can retrieve the lead section first, and then wait for a while before requesting the corresponding remainder. If the page has been re-rendered or edited since, they might get a newer remainder that does not match the original lead. This could be problematic in some situations, for example if a new heading was added, or a section was deleted. I think the app currently relies on the section list in the lead response to match the remainder response.

For the API, we could consider using ETags & accept instead of path parameters. This would make sure that renders also match, and does not suggest that random revisions / renders are supported. However, in the edit-followed-by-view use case, the client would need to construct an ETag, or we would need to provide it somehow. In any case, we'd need to accept ETags without a specific UUID, at least in the lead section end point.

For the API, we could consider using ETags & accept instead of path parameters. This would make sure that renders also match, and does not suggest that random revisions / renders are supported. However, in the edit-followed-by-view use case, the client would need to construct an ETag, or we would need to provide it somehow. In any case, we'd need to accept ETags without a specific UUID, at least in the lead section end point.

For the lead section the UUID part of the ETag doesn't really matter - all we need is a revision. The UUID could be useful to exactly match the remainder with the lead, but we can provide the exact UUID together with the lead. So the API could look like this:
/mobile-sections-lead/{title}{/revision} and /mobile-sections-remaining/{title}{/revision}{/tid}. (I'm not proposing that path params are better here, just they're more useful to show the idea)

/mobile-sections-lead/{title}{/revision} will be useful for the app only right after an edit of the same page happens in the app. In all other cases the app would just want the latest revision.

For the remaining request being able to specify the revision is desirable. I don't think that tid is needed. The revision is included in the lead response, so that can be easily retrieved by the app. Having said that, I have not seen any bug reports about this being an issue. It's certainly theoretically possible.

clients can retrieve the lead section first, and then wait for a while before requesting the corresponding remainder

The time window is quite small since the Android app requests the remaining section immediately after it has retrieved and rendered the lead section.

Having said that, I have not seen any bug reports about this being an issue. It's certainly theoretically possible.

Just unlucky timing of change-prop update arrival after some template's been changed and you would get a mismatch between lead and remaining. Adding support for a tid is simple and it helps avoid potential bugs that would be really hard to track down, so why not add it? The TID would be supplied to the client in an ETag

So I'm a little lost. Can we merge as is? If not, what do we need to do? I can feature flag if necessary.

Change 326868 merged by jenkins-bot:
Add revision support to mobile content service

https://gerrit.wikimedia.org/r/326868

https://en.wikipedia.org/api/rest_v1/page/mobile-sections-remaining/San_Francisco/740722837 doesn't work but this is resolved. Are there further steps we need to do @bearND ? Can we lean on @Fjalapeno to help us deploy the new mobile content endpoint and revision support?

@Jdlrobson The revision support in RESTBase is still not deployed and it's tracked by T153133 The patch is already in master, but deploy is pending because the MW train was reverted and we have some stuff that depends on MediaWiki in the new version of RESTBase. Once MW is deployed RB deploy will follow and revision support will appear.

Cool that certainly explains the revision stuff.
Yet we still need to work out a deployment plan for the formatted endpoints.
e.g. /formatted/:title/:revision

It seems that would be a worthy task of @Fjalapeno as it blocks web using the RESTBase endpoints in any way.