
Media endpoint does not refresh structured captions
Closed, Resolved, Public

Description

The /page/media endpoint now returns structured image captions for each media item, which is excellent. However, the content does not seem to be updated when those structured captions are edited. Is it possible to refresh the /page/media response whenever the structured caption of any of its constituent images is edited?

Event Timeline

Since /page/media isn't yet stored in RESTBase, I assume this is just the effect of Varnish caching at this point. But now that I think about it, invalidating /page/media content correctly and efficiently could actually be rather tricky. /page/media includes metadata (structured and unstructured) for many of the media files used on the page it refers to. As it stands, whenever an image's structured data is edited, we would have to invalidate the /page/media content for every page that uses the image. That information is provided only by a table in GlobalUsage, which I think is updated periodically rather than in near-real time, so we could be updating based on slightly outdated information.
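A minimal Python sketch of the naive "purge every using page" strategy described above. The usage map stands in for the GlobalUsage table (its shape here is hypothetical), and the https://{domain}/api/rest_v1/page/media/{title} URL pattern is an assumption about how the public route is addressed; the point is only to show how a single file edit fans out to one purge per using page.

```python
from urllib.parse import quote

# Hypothetical snapshot of a GlobalUsage-style table:
# file title -> list of (wiki domain, page title) pairs that use the file.
# The real table may lag behind real time, as noted in the comment above.
GlobalUsage = dict[str, list[tuple[str, str]]]


def page_media_urls_to_purge(file_title: str, usage: GlobalUsage) -> list[str]:
    """Return the /page/media URLs that would need invalidating when
    file_title's structured data changes, under the naive strategy of
    purging every page that uses the file."""
    urls = set()
    for domain, page_title in usage.get(file_title, []):
        # REST API titles use underscores for spaces and are percent-encoded
        # (safe="" also encodes "/" as %2F). The URL pattern is an assumption.
        encoded = quote(page_title.replace(" ", "_"), safe="")
        urls.add(f"https://{domain}/api/rest_v1/page/media/{encoded}")
    # Deduplicate (a page can appear more than once) and sort for stable output.
    return sorted(urls)
```

Even this toy version shows the scaling problem: the number of purges grows with the number of pages using the file, and any staleness in the usage table translates directly into missed or unnecessary purges.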

Also, regenerating potentially dozens of stored /page/media responses on every image metadata edit could add a lot of load if the edit volume on SDC ends up similar to Wikidata's (which I think is a safe assumption). Much of that work could also be wasted, since /page/media includes only a subset of the images used on each page (for example, it attempts to exclude images used as wiki UI elements).

Perhaps we'd be better off rethinking the architecture of /page/media a bit. Suppose we had endpoints along the lines proposed in T224920, perhaps separate endpoints for /media/image/extmetadata/{title} (unstructured image metadata) and /media/image/structured_data/{title} (SDC metadata), and /page/media simply included their responses via merge nodes. If the /media/image endpoints, rather than /page/media, were the ones stored in RESTBase and invalidated on edit, we'd have a much more manageable correspondence between edits and the content to be invalidated and regenerated.
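The following Python sketch illustrates the proposed split under stated assumptions: MetadataStore is a hypothetical stand-in for RESTBase storage of a per-file endpoint such as /media/image/structured_data/{title}, and compose_page_media plays the role of the merge node. Nothing here reflects the actual RESTBase or mobileapps code.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

# Hypothetical upstream fetch for one file's metadata (e.g. a MW API call).
Fetch = Callable[[str], dict]


@dataclass
class MetadataStore:
    """Hypothetical stand-in for stored per-file metadata responses."""
    entries: dict[str, dict] = field(default_factory=dict)
    regenerations: int = 0

    def get(self, file_title: str, fetch: Fetch) -> dict:
        # Serve from storage; call the (expensive) upstream fetch only on a miss.
        if file_title not in self.entries:
            self.entries[file_title] = fetch(file_title)
            self.regenerations += 1
        return self.entries[file_title]

    def invalidate(self, file_title: str) -> None:
        # An edit to one file's structured data touches exactly one stored entry.
        self.entries.pop(file_title, None)


def compose_page_media(file_titles: list[str], store: MetadataStore, fetch: Fetch) -> dict:
    """Assemble a /page/media-style response at request time (the merge-node
    idea), so the composed page-level response is never stored or purged."""
    return {"items": [{"title": t, **store.get(t, fetch)} for t in file_titles]}
```

With this layout, a caption edit on one file calls invalidate() once, and every page-level response that includes the file picks up the new data the next time it is composed, without any per-page regeneration.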

Yes. The reason is that ChangeProp doesn't know to update /page/media when these kinds of changes happen. I agree that we should consider moving this information into separate endpoints if it is needed only for editing features.

We do track image usage in change-prop and rerender pages when images are updated, but only for local usage. The same mechanism could be applied to global usage, even though we would miss some usages, but I share @Mholloway's concern about its scalability.

This is yet another example of how much we need a proper dependency-tracking mechanism.
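One shape such dependency tracking could take, sketched in Python as an illustration rather than a description of any existing ChangeProp feature: record which files each stored response actually embedded at render time, and invalidate only those responses when a file changes. DependencyIndex and the response keys are hypothetical.

```python
from collections import defaultdict


class DependencyIndex:
    """Hypothetical reverse index: file title -> stored responses embedding it."""

    def __init__(self) -> None:
        self._dependents: defaultdict[str, set[str]] = defaultdict(set)
        self._files_by_response: dict[str, set[str]] = {}

    def record_render(self, response_key: str, embedded_files: set[str]) -> None:
        # Remove edges left over from the previous render of this response,
        # so files dropped from the page stop triggering invalidations.
        for old in self._files_by_response.get(response_key, set()):
            self._dependents[old].discard(response_key)
        self._files_by_response[response_key] = set(embedded_files)
        for f in embedded_files:
            self._dependents[f].add(response_key)

    def dependents_of(self, file_title: str) -> set[str]:
        # Responses to invalidate when file_title's metadata is edited.
        # .get() avoids inserting empty entries for unknown files.
        return set(self._dependents.get(file_title, set()))
```

Because the edges come from what was actually rendered, images filtered out as UI elements never create dependencies, and there is no reliance on a periodically updated usage table.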

Since there's zero chance of resolving the ChangeProp dependency tracking / invalidation piece of this before the target launch date for in-app caption suggestions, I think we should proceed as discussed in the Audiences-Platform sync last week: create a new /page/media-list endpoint[1] that simply returns the File page titles (and perhaps other on-page info) for the non-UI files on the page, without any other info gathered from the MW API. The app should then fetch any additional info it needs directly from imageinfo and wbgetentities. (There's ongoing discussion on T224920 about the proposed REST API wrapper for imageinfo and possibly wbgetentities.)
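A rough Python sketch of the client-side follow-up, assuming all listed files are hosted on Commons and that the app already has the File page titles from /page/media-list (its response format is not shown here, so the input is just a list of titles). It builds batched MediaWiki Action API requests: prop=imageinfo with iiprop=extmetadata for unstructured metadata, and wbgetentities for the MediaInfo entities whose labels carry the structured captions. The batch size of 50 is an assumption based on the usual per-request limit for non-bot clients, and looking up MediaInfo entities with sites=commonswiki plus titles is likewise an assumption worth checking against the Wikibase API documentation (querying by MediaInfo IDs such as M12345 is the alternative).

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
BATCH_SIZE = 50  # assumed per-request title limit for non-bot clients


def chunked(items: list[str], size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_metadata_requests(file_titles: list[str]) -> list[str]:
    """Build one imageinfo URL and one wbgetentities URL per batch of titles."""
    urls = []
    for batch in chunked(file_titles, BATCH_SIZE):
        joined = "|".join(batch)  # MediaWiki multi-value separator
        # Unstructured metadata via the imageinfo query module.
        urls.append(COMMONS_API + "?" + urlencode({
            "action": "query", "prop": "imageinfo", "iiprop": "extmetadata",
            "titles": joined, "format": "json", "formatversion": "2",
        }))
        # Structured data: MediaInfo entities (captions are their labels).
        # Looking them up by sites + titles is an assumption; see lead-in.
        urls.append(COMMONS_API + "?" + urlencode({
            "action": "wbgetentities", "sites": "commonswiki",
            "titles": joined, "format": "json",
        }))
    return urls
```

Batching keeps the number of round trips proportional to the number of files divided by the batch size, rather than one pair of requests per file.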

Note: /page/media is technically marked experimental, but I don't think we can simply change it at this point, since IIUC at least one production app release already consumes it. So we need to deprecate it and leave it as-is for now.

@Dbrant @JoeWalsh Does this work for you?

[1] https://gerrit.wikimedia.org/r/#/c/mediawiki/services/mobileapps/+/517135/

Change 517135 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps@master] Add page media list endpoint

https://gerrit.wikimedia.org/r/517135

Change 517135 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Add page media list endpoint

https://gerrit.wikimedia.org/r/517135

Mentioned in SAL (#wikimedia-operations) [2019-06-20T17:37:31Z] <mholloway-shell@deploy1001> Started deploy [mobileapps/deploy@fd98900]: Deploy media-list endpoint (T225443) and service template upgrade to v0.7.0

Mentioned in SAL (#wikimedia-operations) [2019-06-20T17:43:12Z] <mholloway-shell@deploy1001> Finished deploy [mobileapps/deploy@fd98900]: Deploy media-list endpoint (T225443) and service template upgrade to v0.7.0 (duration: 05m 38s)

@Dbrant can you confirm whether you actually plan to use the new media-list endpoint before I submit a RESTBase PR to publicly launch it?

@Mholloway The app currently uses the media endpoint to get the items for our image gallery. If the media endpoint is being deprecated in favor of the media-list endpoint, then we'll certainly need to switch to it.

Mholloway lowered the priority of this task from High to Medium. Jun 22 2019, 11:48 PM

Not high priority since the Android app can just ignore the info that isn't properly updated and re-query the MW API in the meantime.