Page MenuHomePhabricator

Page Previews should accept the service's notion of emptiness
Closed, DeclinedPublic

Description

Background

We would like to show a different type of preview for pages where it is not possible to get extracts. A generic special preview "We cannot show the preview" should be shown when we hit these pages.

At the moment, Page Previews processes an extract that it gets from the service (MediaWiki API and RESTBase endpoint). It then tests whether the extract looks empty-ish in client side JavaScript. When the service can indicate that a preview is empty, then these tests should be removed from the codebase/PP should only rely on the service.

Related: T168392

Implementation

Possible options:

  • We should 404 if the extract is empty
  • We should omit the extracts field
  • We should provide an empty string

Definition of empty

  • If an extract field is an empty string and extract_html is also empty
  • extract_html is empty when rendered has no textContent

Closed Questions

  • Which implementation?

@phuedx: The summary-html route in MCS (T168848).

  • Which service?

@phuedx: The Page Summary API (see above and the spec.

  • Should we move extract processing to the server before tackling this? Or is it a part of this?

@Jdlrobson: I think this is out of scope. This is about what to do when the extract is empty.

Related Objects

Event Timeline

phuedx created this task.Jun 20 2017, 4:16 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 20 2017, 4:16 PM
Jhernandez updated the task description. (Show Details)Jun 21 2017, 4:22 PM
Jdlrobson triaged this task as High priority.Jun 21 2017, 6:43 PM

@Pchelolo @mobrovac - some background - we need to do better at signaling when an extract is not available. This might be because the page is in a namespace where extracts are not possible or there is no lead section to generate one from. e.g. https://en.wikipedia.beta.wmflabs.org/api/rest_v1/page/summary/Jon%20example

It feels like 404ing is the right thing to do here - but what do you think?

It feels like 404ing is the right thing to do here - but what do you think?

Hm.. We can 404 if the extract and Wikidata description are empty, that' easy. Are we sure there's no corner case where the extract is not available while some other info might be available like a page image or coordinate? Hard to imaging such a case though..

Right now it's a summary endpoint so I think it will only ever contain summaries. The idea would be to to error if nothing is available.

Jdlrobson updated the task description. (Show Details)

extract_html is empty when rendered has no textContent

Can you give me an example when the extract_text is not empty while the textContent is? How can that happen?

Created a PR for it https://github.com/wikimedia/restbase/pull/831 but I'm not too sure if empty text extract is a good enough approximation for empty textContent

So we have already had T167045: exsentences incorrectly returns empty text extract where not having an extract crashes the Android app if the field is not present. That means that omitting the field is not an option. Throwing a 404 if the field is not present is a bit contentious I think: it is true that the extract is the centrepiece of the summary end point response, but there are other bits of information that conceptually make up what we think of a page's summary is. IMHO, returning an empty extract is the correct way to go here (the client can always display nothing). Returning a 404 should be done when either (a) the page doesn't exist; or (b) no information is truly available for that page.

phuedx added a comment.EditedJun 23 2017, 1:56 PM

Can you give me an example when the extract_text is not empty while the textContent is? How can that happen?

In the case of an HTML extract, this could be something like '<p><b>(Blank.)</b></p>' and that would be transformed to '<p><b></b></p>' after the parenthetical statement is stripped. I'm not sure this could happen with a plain text extract.

mobrovac added a comment.EditedJun 23 2017, 5:07 PM

I'm not sure this could happen with a plain text extract.

Because we have recently switched to asking for the HTML and then stripping the tags (cf T165017 for context).

ovasileva moved this task from Backlog to Next Up on the Page-Previews board.Jul 7 2017, 1:59 PM
phuedx closed this task as Declined.Jul 14 2017, 9:36 AM

I've folded the definition of emptiness into the Page Preview API spec. Making the Page Previews client request previews from the new API/handling the response will be tracked in a different card.

phuedx updated the task description. (Show Details)Jul 14 2017, 9:39 AM