
[EPIC] Add notion of emptiness to page summary API
Closed, Resolved · Public

Description

There are a couple of instances where we'd like to signal that a page doesn't have a summary or that the summary is empty:

  • When the page is a disambiguation page (T168391)

We can and should use the Disambiguator extension's API to detect whether or not the page is a disambiguation page.

  • When the last sentence from the page's introductory paragraph ends with a ":" (T168328)

Currently, we test whether the extract is empty after some processing is done on the client. This check can and should be done in the page summary API instead (see the sketch below).
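A minimal sketch of how the two conditions could be checked server-side (the type and helper names below are hypothetical, not the actual API code):

```typescript
// Sketch only: hypothetical names, not the actual page summary API code.
// A summary is treated as empty when the page is a disambiguation page
// (per the Disambiguator extension) or when the intro's last sentence
// ends with a ":".

interface PageInfo {
  extract: string;            // plain-text introductory extract
  isDisambiguation: boolean;  // e.g. derived from the "disambiguation" page prop
}

function summaryIsEmpty(page: PageInfo): boolean {
  if (page.isDisambiguation) {
    return true;
  }
  const trimmed = page.extract.trim();
  return trimmed === '' || trimmed.endsWith(':');
}
```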

Caveats

We've deferred the decision to pick either the MediaWiki API or RESTBase as the place that we build the page summary API. We need to decide where we're going to move this logic and own that architectural decision.

Event Timeline

mobrovac subscribed.

We've deferred the decision to pick either the MediaWiki API or RESTBase as the place that we build the page summary API. We need to decide where we're going to move this logic and own that architectural decision.

This is a point that needs to be determined a priori, IMHO. I think the question is whether TextExtracts should have the notion of what a page summary is or not. If yes, then this logic belongs there. If not, it should be transferred to RESTBase. Alas, I am not convinced that RESTBase is a good place for it, because RESTBase has no notion of disambiguation or list pages (just as it doesn't have the notion of user pages, talk pages, etc.), and given its purpose it probably shouldn't. Furthermore, placing that logic in RESTBase would mean that the Disambiguator extension becomes a required component on which RESTBase depends. Conversely, if done in TextExtracts, it could check whether the Disambiguator extension is installed before consulting its API.

If we want to return an empty extract for disambiguation pages, TextExtracts is definitely the place for that.

If we want to add an is_disambiguation bool to the summary JSON, then it's RESTBase.

Personally I think the former is better.
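For illustration, the two response shapes under discussion would look roughly like this (apart from is_disambiguation, the field names are only examples):

```typescript
// Illustration only: field names other than is_disambiguation are hypothetical.

// Option 1: TextExtracts returns an empty extract for disambiguation pages.
const optionOne = {
  title: 'Mercury',
  extract: '',
};

// Option 2: RESTBase keeps the extract and adds a flag to the summary JSON.
const optionTwo = {
  title: 'Mercury',
  extract: 'Mercury may refer to:',
  is_disambiguation: true,
};
```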

Jdlrobson triaged this task as Medium priority. Jun 21 2017, 9:12 PM
Jdlrobson moved this task from Incoming to Epics/Goals on the Web-Team-Backlog board.

I think the question is whether TextExtracts should have the notion of what a page summary is or not. If yes, then this logic belongs there. If not, it should be transferred to RESTBase.

The definition of what a page summary is or isn't shouldn't belong in TextExtracts. TextExtracts is a general-purpose API for getting the first N characters/sentences of a page, with an unknown number of third-party clients. We should consider TextExtracts and page summaries distinct.

Alas, I am not convinced that RESTBase is a good place for it, because RESTBase has no notion of disambiguation or list pages (just as it doesn't have the notion of user pages, talk pages, etc.), and given its purpose it probably shouldn't. Furthermore, placing that logic in RESTBase would mean that the Disambiguator extension becomes a required component on which RESTBase depends. Conversely, if done in TextExtracts, it could check whether the Disambiguator extension is installed before consulting its API.

The Disambiguator extension provides a MediaWiki API query module that we can use as part of the internal queries that RESTBase makes when generating page summaries. If the extension isn't installed, you'll see a warning in the warnings property of the MediaWiki API response. Since this is a soft failure, RESTBase would only be loosely coupled to the Disambiguator extension.
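For example, the internal check could look something like this (a sketch only, not RESTBase's actual code; the disambiguation page prop queried here is the one set by the Disambiguator extension):

```typescript
// Sketch only: shows the general shape of the internal MediaWiki API call,
// not RESTBase's actual implementation.
async function isDisambiguationPage(apiUrl: string, title: string): Promise<boolean> {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'pageprops',
    ppprop: 'disambiguation',
    titles: title,
    format: 'json',
    formatversion: '2',
  });
  const res = await fetch(`${apiUrl}?${params}`);
  const body = await res.json();

  // Soft failure: if the Disambiguator extension isn't installed the prop is
  // simply absent (and the response may carry warnings), so fall back to
  // treating the page as a regular, non-disambiguation page.
  if (body.warnings) {
    return false;
  }
  const page = body.query?.pages?.[0];
  return Boolean(page?.pageprops && 'disambiguation' in page.pageprops);
}
```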

As for list pages… neither the MediaWiki API nor the MediaWiki internals have any notion of list pages, AFAIK. This is a special case that we're going to have to resolve via a hack, and that hack could live anywhere.

Now, how should the Reading Web team be using RESTBase? Can we move business logic into services, like Mobile-Content-Service, for example, or should we only be thinking of RESTBase as a distributed cache that acts RESTfully? If it's the latter, then the Reading Web team will undertake creating a new MediaWiki API query module, which will both generate page summaries and be our definition of what a page summary is. We'll then use RESTBase to help us achieve high-volume access.

We are doing pretty well on the performance side of things: more than 95% of external requests (most of which are PP nowadays) never even reach RESTBase; rather, they are served directly from Varnish. This means that we have the flexibility to decide where and how to do it. I see several options:

  1. Implement the logic in TextExtracts by supplying it an extra parameter that activates the business logic
  2. Implement a new MW API module (possibly in a new extension)
  3. Do the business logic in RESTBase
  4. Implement it in MCS

Out of these, the last one is the most flexible, as it allows you to freely tweak the results, fix any issues and get them out quickly. Moreover, MCS is fully WM-centred, which conceptually makes it the canonical place for implementing WP-specific logic. Option 1 feels like a hack because, as you point out, TextExtracts should be generic and work for third parties as well. While the second option is viable too, I personally am not in favour of it, as I don't think that the proliferation of extensions is a good thing. Also of note is that choosing either of the first two options means that you are dependent on the train and SWAT slots to get your changes out. Finally, implementing it in RESTBase might be the quickest fix now, but this comes with caveats for the future. Most notably, it means depending on the services team for changes/reviews/merges/deploys, which is not sustainable in the long run if you are to own the endpoint's output. We want to give you the flexibility to iterate and make changes as you see fit. We also don't want to replace one monolith (MW) with another (RB) :P

I'd opt for 4-ish: build a new service, so that we get the benefits of 4 while not depending on Services (seriously though, y'all are awesome).

Indeed, I knew I had 5 options in mind when I started typing my reply, but forgot one along the way :) That is also an attractive option, but it would be good to check with the rest of the Reading Services team, as they are planning on renaming/repurposing MCS, cf. T157059: Rename Mobile Content Service?

phuedx claimed this task.

These requirements were folded into the Page Preview API spec some time ago, the review of which is tracked in T169761: Review Summary 2.0 Spec.

For the record, we've also opted to implement this and a lightweight version of TextExtracts as routes in Mobile-Content-Service.