Make the Page Summary API return an "intro" for a page
Closed, Duplicate (Public)

Description

Background

See T168941: Remove spacing between list items for examples of how lists should be rendered in the Page Previews client. Currently by limiting ourselves to sentences/characters we risk losing list HTML which we would like to retain.

AC

  • The Page Summary API still returns content from the first p element of the lead section of the page.
  • The API is extended to return:
    • Any ol, ul or dl that are immediate siblings of the first p element.
  • This change is done in the Page Summary API (T168848).
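The acceptance criteria above can be sketched as follows. This is only an illustration, not the Page Summary API's implementation: the function name `extract_intro` is hypothetical, "immediate siblings" is read as the run of lists directly following the first paragraph, and the lead section is assumed to be well-formed (XML-parsable) markup.

```python
import xml.etree.ElementTree as ET

# List elements named in the AC.
LIST_TAGS = {"ol", "ul", "dl"}


def extract_intro(lead_section_html):
    """Return the first top-level <p> plus the lists that directly follow it.

    Hypothetical helper: assumes the lead section is well-formed markup.
    """
    # Wrap the fragment so the parser sees a single root element.
    root = ET.fromstring("<div>" + lead_section_html + "</div>")
    children = list(root)
    intro = []
    for index, child in enumerate(children):
        if child.tag != "p":
            continue
        intro.append(child)
        # Keep consecutive list siblings; stop at the first non-list element.
        for sibling in children[index + 1:]:
            if sibling.tag not in LIST_TAGS:
                break
            intro.append(sibling)
        break
    parts = []
    for element in intro:
        element.tail = None  # drop text that trails the element
        parts.append(ET.tostring(element, encoding="unicode"))
    return "".join(parts)
```

Under this reading, a list that follows the second paragraph, or one separated from the first paragraph by another element, is not included.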

Signoff AC

Event Timeline

ovasileva added a project: Page-Previews.
ovasileva updated the task description.
ovasileva added a subscriber: phuedx.

@phuedx: we'll be including lists - see "note"

This will need input from Design:

  • What's the spacing between the paragraph and the first list item?
  • What's the spacing between items?
  • Do we truncate long lists? If so, then how? What's "a long list"? (Currently, about two items with a small amount of text)

@phuedx - they would be truncated based on the length of the extract, right?
@Nirzar - here's what they look like now...

Screen Shot 2017-06-22 at 4.41.32 PM.png (278×361 px, 60 KB)

Jdlrobson changed the task status from Open to Stalled. Jun 22 2017, 3:50 PM
Jdlrobson subscribed.

Blocked on https://phabricator.wikimedia.org/T144622 IMO. This will allow us a generic way to extract the first paragraph and will be useful for other devs.

Jdlrobson lowered the priority of this task from High to Medium. Jun 22 2017, 5:47 PM

As discussed during the standup/Watercooler meetings, I think that the exparagraphs parameter would probably be a good addition to TextExtracts but not in order to solve this problem. Per the description, this not only requires the first p but every ul, ol, and dl up to the next p. That ain't the first paragraph. Boo.

ovasileva renamed this task from Restrict page previews to display first paragraph of article only to Restrict page previews to display lead section of article only. Jun 23 2017, 12:11 PM
ovasileva updated the task description.

@phuedx, @bmansurov - to remove the current cutoff for lists, can we implement something similar to T168332#3369896? As in, display the list, but remove the spacing between the beginning of the list and the first list item? For example, in

Screen Shot 2017-06-22 at 4.41.32 PM.png (278×361 px, 60 KB)
can we remove the spacing between "...stellar remnant that" and "is massive enough"?

Sure. Is there a task for that? Or do we want to do it as part of this task?

That task is pretty specific to multiple paragraphs. To reduce confusion, it may be best to create a new task and point to the task you pasted. Or we may have to update the task's description to include lists with screenshots. Your call.

@bmansurov - I suppose it would make more sense to re-open T168332: HTML previews' layout breaks text multi-line text truncation and do it there?

There's the "Missing Examples" section in the description of this task, which is awaiting examples and specifications for how we'll deal with lists in the lead section.

phuedx updated the task description.
phuedx renamed this task from Restrict page previews to display lead section of article only to Make page summary API return the lead section of a page. Jun 27 2017, 1:46 PM
phuedx renamed this task from Make page summary API return the lead section of a page to Make page summary API return content from the lead section of a page.
phuedx updated the task description.

Make page summary API return content from the lead section of a page

@phuedx The title is a little bit confusing. The exintro parameter is documented as "Return only content before the first section."

https://en.m.wikipedia.beta.wmflabs.org/wiki/Special:ApiSandbox#action=query&format=json&prop=extracts&titles=Albert+Einstein&exintro=1&explaintext=1
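For reference, the sandbox link above corresponds to a TextExtracts query with `exintro` set. A minimal sketch of building that request URL follows; the `/w/api.php` endpoint path and the helper name `build_extract_url` are assumptions, while the query parameters are copied from the link.

```python
from urllib.parse import urlencode

# Assumed endpoint: the sandbox link only shows Special:ApiSandbox on this host.
API_ENDPOINT = "https://en.m.wikipedia.beta.wmflabs.org/w/api.php"


def build_extract_url(title, intro_only=True):
    """Build a TextExtracts query URL (hypothetical helper, no request is sent)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
        "explaintext": 1,
    }
    if intro_only:
        # Documented as "Return only content before the first section."
        params["exintro"] = 1
    return API_ENDPOINT + "?" + urlencode(params)


print(build_extract_url("Albert Einstein"))
```

With `exintro` the extract covers everything before the first section heading, which is broader than a single lead paragraph.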

I think actually what you are asking for is for the lead paragraph (not section)?

NOTE: Unrelated: The MobileFormatter doesn't move dl elements. Should it?

Let's apply YAGNI and ignore this problem until we find a real world example/bug that we need to fix.

Jdlrobson renamed this task from Make page summary API return content from the lead section of a page to Make page summary API return content from the lead paragraph of a page. Jun 27 2017, 8:25 PM
Jdlrobson added a subscriber: Sophivorus.
Jhernandez raised the priority of this task from Medium to High. Jun 28 2017, 9:14 AM
Jhernandez subscribed.

T144622: Add exparagraphs parameter to API has been declined in favor of this task. That one was High prio, so I'm setting this one as High. Feel free to back it down if it isn't.

phuedx renamed this task from Make page summary API return content from the lead paragraph of a page to Make page summary API return whitelisted content from the lead section of a page. Jun 28 2017, 9:39 AM
phuedx updated the task description.

Make page summary API return content from the lead section of a page

@phuedx The title is a little bit confusing. The exintro parameter is documented as "Return only content before the first section."

I think actually what you are asking for is for the lead paragraph (not section)?

Not quite. This task is about whitelisting list elements in the lead section returned by the Page Summary API – as opposed to the blacklisting approach that's taken in TextExtracts.
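The difference between the two approaches can be shown with a toy example. The element names and both sets below are illustrative assumptions, not the actual TextExtracts or Page Summary API configuration.

```python
# Illustrative lead section, reduced to its top-level element names.
LEAD_ELEMENTS = ["table", "p", "ul", "div", "dl", "figure"]

# Hypothetical sets, chosen only to show the contrast.
ALLOWED = {"p", "ul", "ol", "dl"}   # whitelist: keep only these
BLOCKED = {"table", "figure"}       # blacklist: drop only these


def whitelist(elements):
    """Keep elements that are explicitly allowed."""
    return [name for name in elements if name in ALLOWED]


def blacklist(elements):
    """Keep everything except elements that are explicitly blocked."""
    return [name for name in elements if name not in BLOCKED]


print(whitelist(LEAD_ELEMENTS))  # ['p', 'ul', 'dl']
print(blacklist(LEAD_ELEMENTS))  # ['p', 'ul', 'div', 'dl']
```

An element that nobody anticipated (here the `div`) passes through a blacklist but is dropped by a whitelist.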

I'll admit that until Thursday, 29th we remain unresolved as to where to put this logic. However, I'm convinced that we should think of the Page Summary API as distinct from TextExtracts as it's very specific to Page Previews. I'd like to think that it'll eventually supersede TextExtracts but that's not a primary goal right now.

phuedx renamed this task from Make page summary API return whitelisted content from the lead section of a page to Make the Page Summary API return lists in the lead section of a page. Jun 28 2017, 9:57 AM
phuedx updated the task description.

Ping @ovasileva, you have a question in the description.

@Jhernandez - yup, we should include description lists as well. It seems we overlooked them; this is now T169062: Reconsider definition of "lead paragraph" on mobile view

On that note, do we know if there are any other text elements (outside of dl, ul, ol lists) that may appear between a first paragraph and the second?

@ovasileva Any HTML can be there theoretically. Practically, probably not much more.

Not quite. This task is about whitelisting list elements in the lead section returned by the Page Summary API – as opposed to the blacklisting approach that's taken in TextExtracts.

This is a little dangerous if we do it in TextExtracts as a lot of gadget developers use the text extracts API and might be expecting the whole lead section not something that resembles what we define in MobileFrontend and the mobile content service as the lead paragraph.

FWIW the current definition as defined in the description matches the MCS lead paragraph definition.
Personally I would advise adding an exintroparagraph parameter and doing this inside TextExtracts.

Let's think about this carefully.

Happy to talk about this after a standup.

This is a little dangerous if we do it in TextExtracts as a lot of gadget developers use the text extracts API and might be expecting the whole lead section not something that resembles what we define in MobileFrontend and the mobile content service as the lead paragraph.

I agree. This task isn't about adding this feature to TextExtracts, e.g. see the third AC:

This change is done in the Page Summary API.


FWIW the current definition as defined in the description matches the MCS lead paragraph definition.

Great. We can discuss whether MCS also fulfils other requirements of the Page Summary API.

Personally I would advise adding an exintroparagraph parameter and doing this inside TextExtracts.

Let's think about this carefully.

I'm not sure that I follow this, given the first line of your response ("This is a little dangerous if we do it in TextExtracts"). Nevertheless, I'm happy to talk about this in standup and try to summarise what we said too.

This is a little dangerous if we do it in TextExtracts as a lot of gadget developers use the text extracts API and might be expecting the whole lead section not something that resembles what we define in MobileFrontend and the mobile content service as the lead paragraph.

I agree. This task isn't about adding this feature to TextExtracts, e.g. see the third AC:

This is what I'm challenging. This is a requested feature for TextExtracts too and would be useful to editors (see T144622)...

This change is done in the Page Summary API.

... So I'm advising we don't just do this here...

Let's chat :)

We chatted. We are clear on the definition now - we will refer to it as "intro".
Whether this is done in TextExtracts or in the service is not completely decided, but we are leaning towards the latter. Hopefully that will support 3rd party use cases as well.

Jdlrobson renamed this task from Make the Page Summary API return lists in the lead section of a page to Make the Page Summary API return an "intro" for a page. Jun 29 2017, 8:05 PM
Jdlrobson updated the task description.