
Add disambiguation page handling in Page Summary API
Closed, Resolved (Public)

Description

Per T113094: [EPIC] The Page Summary API needs to provide useful content for the majority of articles. You'll find the spec for the Page Summary API here: https://www.mediawiki.org/wiki/User:Phuedx_(WMF)/Reading/Web/Page_Preview_API

AC

Plan (YMMV)

  1. If the page is a disambiguation page, then request the first N links from the page using API:Links.
  2. Make N configurable.
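
The plan above could be sketched roughly as follows. This is a minimal illustration rather than the MCS implementation: `DEFAULT_LINK_LIMIT`, `build_links_query`, and `extract_link_titles` are hypothetical names, the endpoint is hard-coded to English Wikipedia for the example, and the response shape assumes the default JSON format of API:Links (`action=query&prop=links`).

```python
from urllib.parse import urlencode

# Hypothetical default for N; the task only says N should be configurable.
DEFAULT_LINK_LIMIT = 10
# Assumed endpoint, used here purely for illustration.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_links_query(title, limit=DEFAULT_LINK_LIMIT):
    """Build an API:Links request URL for the first `limit` links on a page."""
    params = {
        "action": "query",
        "prop": "links",
        "titles": title,
        "pllimit": limit,  # N from the plan
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def extract_link_titles(response):
    """Collect link titles from a decoded API:Links response."""
    titles = []
    for page in response.get("query", {}).get("pages", {}).values():
        for link in page.get("links", []):
            titles.append(link["title"])
    return titles
```

Calling `build_links_query("John Smith", 5)` yields a URL with `pllimit=5`, so N can come from service configuration rather than being fixed in code.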

Event Timeline

So... I already looked into this, and it is actually messy.
The mobileview API is used for metadata.

So either:

  1. We add an additional API query under the hood to all Mobile Content Service (MCS) requests
  2. We expose the disambiguator pageprop (forward it) inside mobileview

I'd prefer #2 - but that's a MobileFrontend bug, so we should repurpose this. Once that's done the rest is easy.

> I'd prefer #2 - but that's a MobileFrontend bug, so we should repurpose this. Once that's done the rest is easy.

Could you explain this preference? What does #2 get us that #1 doesn't and vice versa.

> Could you explain this preference? What does #2 get us that #1 doesn't and vice versa.

They both get us the same thing. However, because of the way MCS is architected, if we were to do #1 we'd be adding an additional MediaWiki API query to all apps endpoints, which would delay the first response time of ALL the uncached MCS endpoints that our apps friends use, as well as putting additional strain on the MediaWiki layer.

In an ideal world, MCS wouldn't use MobileViewAPI at all, but that's a lot more work...

> They both get us the same thing. However, because of the way MCS is architected, if we were to do #1 we'd be adding an additional MediaWiki API query to all apps endpoints, which would delay the first response time of ALL the uncached MCS endpoints that our apps friends use, as well as putting additional strain on the MediaWiki layer.

The corollary is that opting for #2 (as you've written it) increases the complexity of the mobileview API and increases MCS's binding to it. These costs are harder to reason about than increased load on the MediaWiki API due to an additional request being required.

Also, the mobileview and disambiguator API requests could be done in parallel. I think the strongest claim that we can make is that the first response time of the Page Summary API might be increased.
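
The point about parallel requests can be illustrated with a small sketch. Both fetchers below are hypothetical stand-ins that simulate network latency with `asyncio.sleep`; they do not call any real API, and the `"standard"` fallback value is an assumption for the example. The idea is only that issuing both requests concurrently makes the added latency closer to the slower of the two than to their sum.

```python
import asyncio


async def fetch_mobileview(title):
    """Stand-in for the mobileview request (simulated latency, no network)."""
    await asyncio.sleep(0.01)
    return {"title": title}


async def fetch_is_disambiguation(title):
    """Stand-in for the disambiguation lookup (simulated latency, no network)."""
    await asyncio.sleep(0.01)
    # Placeholder data for the example only.
    return title in {"John Smith"}


async def build_summary(title):
    """Run both lookups concurrently and merge the results."""
    page, is_disambiguation = await asyncio.gather(
        fetch_mobileview(title),
        fetch_is_disambiguation(title),
    )
    # "standard" is an assumed value for non-disambiguation pages.
    page["type"] = "disambiguation" if is_disambiguation else "standard"
    return page
```

A caller would run it with `asyncio.run(build_summary("John Smith"))`.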

It turns out that the mobileview API already has a mechanism for requesting page properties and mixing them into the response. See T171065#3455913 for detail.

You're right. Turns out MCS was sending ppprop instead of pageprops. T151241 has yet to be deployed, but it will make this possible.
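
As a rough sketch of the fix described here, the mobileview request would carry `pageprops` rather than `ppprop`. The other parameter names (`action=mobileview`, `page`, `prop=pageprops`) are written from memory of the mobileview API and should be treated as assumptions, as should the property name `disambiguation`; both helper functions are hypothetical.

```python
def build_mobileview_params(title):
    """Build mobileview query parameters that request the disambiguation page prop."""
    return {
        "action": "mobileview",
        "page": title,
        "prop": "pageprops",
        # mobileview expects `pageprops`, not `ppprop` (the latter belongs to
        # action=query&prop=pageprops).
        "pageprops": "disambiguation",
        "format": "json",
    }


def has_disambiguation_prop(pageprops):
    """Return True if the page props mapping marks the page as a disambiguation page.

    The property value is ignored; only the presence of the key is checked.
    """
    return "disambiguation" in (pageprops or {})
```

Checking for the key rather than its value is deliberate, since the sketch makes no assumption about what value the property carries.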

Change 370754 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/services/mobileapps@master] POC: Send all the links for disambiguation pages in the summary

https://gerrit.wikimedia.org/r/370754

Change 370754 abandoned by Jdlrobson:
POC: Send all the links for disambiguation pages in the summary

https://gerrit.wikimedia.org/r/370754

Change 391746 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Summary 2.0: Mark disambiguation pages

https://gerrit.wikimedia.org/r/391746

@bearND https://en.wikipedia.org/api/rest_v1/page/summary/John_Smith is not showing the disambiguation information. Is this a cached summary?

Yes, that's a cached RESTBase summary.

wmf1256:ReadingLists mholloway$ curl -sI https://en.wikipedia.org/api/rest_v1/page/summary/John_Smith | grep ^content-type
content-type: application/json; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/Summary/1.2.0"

Yes, this one is still at 1.2.0.

curl -sI "https://en.wikipedia.org/api/rest_v1/page/summary/John_Smith" | grep Spec
content-type: application/json; charset=utf-8; profile="https://www.mediawiki.org/wiki/Specs/Summary/1.2.0"
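
The same check can be done programmatically by reading the spec version out of the `profile` parameter of the content-type header. This is a minimal sketch; `summary_spec_version` is a hypothetical helper, and the regular expression assumes the profile URL ends in `/Specs/Summary/<major>.<minor>.<patch>` as in the output above.

```python
import re

# Matches the version at the end of the Summary spec profile URL.
_PROFILE_VERSION = re.compile(r'profile="[^"]*/Specs/Summary/(\d+\.\d+\.\d+)"')


def summary_spec_version(content_type):
    """Return the Summary spec version from a content-type header, or None."""
    match = _PROFILE_VERSION.search(content_type)
    return match.group(1) if match else None


header = (
    'application/json; charset=utf-8; '
    'profile="https://www.mediawiki.org/wiki/Specs/Summary/1.2.0"'
)
print(summary_spec_version(header))  # prints 1.2.0
```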

@Pchelolo how far is the enwiki dump for summaries?

@Jdlrobson if you don't want to wait you can purge the page with ?action=purge.

Jdlrobson claimed this task.

Purged https://en.wikipedia.org/api/rest_v1/page/summary/John_Smith and can now see the "type": "disambiguation" field.
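
A client consuming the summary could branch on that field along these lines. This is only a sketch: `is_disambiguation` is a hypothetical helper, and the sample payload is a trimmed, made-up stand-in that contains nothing beyond the `type` field mentioned above and a title.

```python
import json


def is_disambiguation(summary):
    """Return True if a decoded summary response is marked as a disambiguation page."""
    return summary.get("type") == "disambiguation"


# Trimmed, illustrative payload; a real summary response has more fields.
sample = json.loads('{"title": "John Smith", "type": "disambiguation"}')
print(is_disambiguation(sample))  # prints True
```

Using `.get` means cached summaries produced under an older spec version, which lack the field, are simply treated as non-disambiguation pages.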

(fyi doesn't look like summary endpoint allows revisions as parameters? Is that by design?)

The MCS version handles the revision for summaries, but RESTBase does not expose it. Not sure why. If I had to guess, it may have to do with storage overhead. From what I remember, there wasn't a use case for it.

> (fyi doesn't look like summary endpoint allows revisions as parameters? Is that by design?)

MCS can handle revision parameters, but it looks like RB is only exposing the title param: https://github.com/wikimedia/restbase/blob/master/v1/summary_new.yaml#L15

Not sure why that is exactly.

> From what I remember, there wasn't a use case for it.

Exactly. We couldn't come up with the use-case for that.