Make the Page Summary API return an "intro" for a page
Closed, Duplicate · Public

Description

Background

See T168941: Remove spacing between list items for examples of how lists should be rendered in the Page Previews client. Currently, by truncating extracts to a number of sentences or characters, we risk losing list HTML that we would like to retain.

AC

  • The Page Summary API still returns content from the first p element of the lead section of the page.
  • The API is extended to return:
    • Any ol, ul or dl that are immediate siblings of the first p element.
  • This change is done in the Page Summary API (T168848).
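A minimal sketch of the AC above, assuming the lead section is available as a well-formed XHTML fragment and reading "immediate siblings" as the run of ol, ul and dl elements that directly follows the first p. The function name extract_intro and the use of Python's xml.etree are assumptions made for illustration; this is not how the Page Summary API is implemented.

```python
import xml.etree.ElementTree as ET

# List elements named in the AC.
LIST_TAGS = {"ol", "ul", "dl"}


def extract_intro(lead_section_html: str) -> str:
    """Return the first top-level <p> plus the lists that directly follow it."""
    # Wrap the fragment in a root element so it parses as one XML document.
    root = ET.fromstring(f"<root>{lead_section_html}</root>")
    children = list(root)

    # Locate the first top-level <p>; anything before it is skipped.
    start = next((i for i, el in enumerate(children) if el.tag == "p"), None)
    if start is None:
        return ""

    kept = [children[start]]
    # Keep consecutive list siblings; stop at the first non-list element.
    for el in children[start + 1:]:
        if el.tag not in LIST_TAGS:
            break
        kept.append(el)

    parts = []
    for el in kept:
        el.tail = None  # drop text/whitespace that sat between siblings
        parts.append(ET.tostring(el, encoding="unicode"))
    return "".join(parts)
```

Under this reading, a second p (or any other element) ends the intro, so lists that appear later in the lead section are not returned.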

Signoff AC

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jun 22 2017, 1:04 PM
ovasileva triaged this task as High priority.
ovasileva updated the task description.
ovasileva added a subscriber: phuedx.
ovasileva updated the task description. · Jun 22 2017, 1:07 PM

What about lists?

@phuedx: we'll be including lists - see "note"

This will need input from Design:

  • What's the spacing between the paragraph and the first list item?
  • What's the spacing between items?
  • Do we truncate long lists? If so, then how? What's "a long list"? (Currently, about two items with a small amount of text)

@phuedx - they would be truncated based on the length of the extract, right?
@Nirzar - here's what they look like now...

phuedx updated the task description. · Jun 22 2017, 3:30 PM
phuedx updated the task description. · Jun 22 2017, 3:35 PM
Jdlrobson changed the task status from Open to Stalled.
Jdlrobson added a subscriber: Jdlrobson.

Blocked on https://phabricator.wikimedia.org/T144622 IMO. That task will give us a generic way to extract the first paragraph and will be useful for other devs.

Jdlrobson lowered the priority of this task from High to Normal. · Jun 22 2017, 5:47 PM

As discussed during the standup/Watercooler meetings, I think that the exparagraphs parameter would probably be a good addition to TextExtracts but not in order to solve this problem. Per the description, this not only requires the first p but every ul, ol, and dl up to the next p. That ain't the first paragraph. Boo.

ovasileva renamed this task from Restrict page previews to display first paragraph of article only to Restrict page previews to display lead section of article only. · Jun 23 2017, 12:11 PM
ovasileva updated the task description.

@phuedx, @bmansurov - to remove the current cutoff for lists, can we implement something similar to T168332#3369896? As in, display the list, but remove the spacing between the beginning of the list and the first list item? For example, in

can we remove the spacing between "...stellar remnant that" and "is massive enough"?

Sure. Is there a task for that? Or do we want to do it as part of this task?

@bmansurov - I suppose it would make more sense to re-open T168332: HTML previews' layout breaks text multi-line text truncation and do it there?

That task is pretty specific to multiple paragraphs. To reduce confusion, it may be best to create a new task and point to the task you pasted. Or we may have to update that task's description to include lists with screenshots. Your call.

@bmansurov - I suppose it would make more sense to re-open T168332: HTML previews' layout breaks text multi-line text truncation and do it there?

There's the "Missing Examples" section in the description of this task, which is awaiting examples and specifications for how we'll deal with lists in the lead section.

phuedx updated the task description. · Jun 27 2017, 8:14 AM
phuedx updated the task description.
ovasileva updated the task description. · Jun 27 2017, 12:01 PM
phuedx renamed this task from Restrict page previews to display lead section of article only to Make page summary API return the lead section of a page. · Jun 27 2017, 1:46 PM
phuedx updated the task description.
phuedx renamed this task from Make page summary API return the lead section of a page to Make page summary API return content from the lead section of a page.

Make page summary API return content from the lead section of a page

@phuedx The title is a little bit confusing. The exintro parameter is documented as "Return only content before the first section."

https://en.m.wikipedia.beta.wmflabs.org/wiki/Special:ApiSandbox#action=query&format=json&prop=extracts&titles=Albert+Einstein&exintro=1&explaintext=1

I think actually what you are asking for is for the lead paragraph (not section)?
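For reference, the ApiSandbox link above encodes a TextExtracts query with exintro and explaintext set. A small sketch of building the same query string follows; the helper name build_extract_query and the api.php endpoint used in the usage note are assumptions for illustration, while the parameter names and values are taken from the link.

```python
from urllib.parse import urlencode


def build_extract_query(title: str, api_url: str) -> str:
    """Build a TextExtracts query URL for the content before the first section."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
        "exintro": 1,      # only content before the first section heading
        "explaintext": 1,  # plain text instead of limited HTML
    }
    # urlencode escapes spaces as "+", matching the sandbox link.
    return f"{api_url}?{urlencode(params)}"
```

For example, build_extract_query("Albert Einstein", "https://en.wikipedia.org/w/api.php") yields a URL whose query string matches the one in the sandbox link. Note that exintro returns the whole lead section, not only the first paragraph.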

NOTE: Unrelated: The MobileFormatter doesn't move dl elements. Should it?

Let's apply YAGNI and ignore this problem until we find a real world example/bug that we need to fix.

Jdlrobson renamed this task from Make page summary API return content from the lead section of a page to Make page summary API return content from the lead paragraph of a page. · Jun 27 2017, 8:25 PM
Jdlrobson added a subscriber: Sophivorus.
Jhernandez raised the priority of this task from Normal to High. · Jun 28 2017, 9:14 AM
Jhernandez added a subscriber: Jhernandez.

T144622: Add exparagraphs parameter to API has been declined in favor of this task. That one was High prio, so I'm setting this one as High. Feel free to back it down if it isn't.

phuedx renamed this task from Make page summary API return content from the lead paragraph of a page to Make page summary API return whitelisted content from the lead section of a page. · Jun 28 2017, 9:39 AM
phuedx updated the task description.

Make page summary API return content from the lead section of a page

@phuedx The title is a little bit confusing. The exintro parameter is documented as "Return only content before the first section."

I think actually what you are asking for is for the lead paragraph (not section)?

Not quite. This task is about whitelisting list elements in the lead section returned by the Page Summary API – as opposed to the blacklisting approach that's taken in TextExtracts.

I'll admit that, until Thursday the 29th, the question of where to put this logic remains unresolved. However, I'm convinced that we should think of the Page Summary API as distinct from TextExtracts, as it's very specific to Page Previews. I'd like to think that it'll eventually supersede TextExtracts, but that's not a primary goal right now.
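To illustrate the distinction between the two approaches, here is a rough sketch operating on a list of top-level tag names. The tag sets and function names are hypothetical and chosen only for the example; in particular, BLOCKED_TAGS does not reflect what TextExtracts actually removes.

```python
from typing import Iterable, List, Set

# Hypothetical tag sets, for illustration only.
ALLOWED_TAGS: Set[str] = {"p", "ol", "ul", "dl"}
BLOCKED_TAGS: Set[str] = {"table", "div"}


def keep_whitelisted(tags: Iterable[str], allowed: Set[str] = ALLOWED_TAGS) -> List[str]:
    """Whitelisting: keep only elements that are explicitly allowed."""
    return [tag for tag in tags if tag in allowed]


def drop_blacklisted(tags: Iterable[str], blocked: Set[str] = BLOCKED_TAGS) -> List[str]:
    """Blacklisting: keep everything except elements that are explicitly blocked."""
    return [tag for tag in tags if tag not in blocked]
```

Given the tags p, ul, figure and dl, the whitelist keeps p, ul and dl, while the blacklist also lets figure through because nobody anticipated it. That difference is the reason a whitelist suits a narrowly scoped summary.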

phuedx renamed this task from Make page summary API return whitelisted content from the lead section of a page to Make the Page Summary API return lists in the lead section of a page.
phuedx updated the task description.
Jhernandez updated the task description. · Jun 28 2017, 10:17 AM

Ping @ovasileva, you have a question in the description.

phuedx updated the task description. · Jun 28 2017, 10:22 AM
ovasileva added a comment. · Edited · Jun 28 2017, 11:57 AM

@Jhernandez - yup, we should include description lists as well. It seems we overlooked them; see now T169062: Reconsider definition of "lead paragraph" on mobile view

On that note, do we know if there are any other text elements (outside of dl, ul, ol lists) that may appear between a first paragraph and the second?

@ovasileva Theoretically, any HTML can be there. In practice, probably not much more.

Jhernandez updated the task description. · Jun 28 2017, 12:46 PM
Jhernandez updated the task description.

Not quite. This task is about whitelisting list elements in the lead section returned by the Page Summary API – as opposed to the blacklisting approach that's taken in TextExtracts.

This is a little dangerous if we do it in TextExtracts, as a lot of gadget developers use the TextExtracts API and might be expecting the whole lead section, not something that resembles what we define in MobileFrontend and the Mobile Content Service (MCS) as the lead paragraph.

FWIW the current definition as defined in the description matches the MCS lead paragraph definition.
Personally, I would advise adding an exintroparagraph parameter and doing this inside TextExtracts.

Let's think about this carefully.

Happy to talk about this after a standup.

This is a little dangerous if we do it in TextExtracts, as a lot of gadget developers use the TextExtracts API and might be expecting the whole lead section, not something that resembles what we define in MobileFrontend and the Mobile Content Service (MCS) as the lead paragraph.

I agree. This task isn't about adding this feature to TextExtracts, e.g. see the third AC:

This change is done in the Page Summary API.


FWIW the current definition as defined in the description matches the MCS lead paragraph definition.

Great. We can discuss whether MCS also fulfils other requirements of the Page Summary API.

Personally, I would advise adding an exintroparagraph parameter and doing this inside TextExtracts.

Let's think about this carefully.

I'm not sure that I follow this, given the first line of your response ("This is a little dangerous if we do it in TextExtracts"). Nevertheless, I'm happy to talk about this in standup and try to summarise what we said too.

This is a little dangerous if we do it in TextExtracts, as a lot of gadget developers use the TextExtracts API and might be expecting the whole lead section, not something that resembles what we define in MobileFrontend and the Mobile Content Service (MCS) as the lead paragraph.

I agree. This task isn't about adding this feature to TextExtracts, e.g. see the third AC:

This is what I'm challenging. This is a requested feature for TextExtracts too and would be useful to editors (see T144622)...

This change is done in the Page Summary API.

... So I'm advising we don't just do this here...

Let's chat :)

We chatted. We are clear on the definition now, and we will refer to it as "intro".
Whether this is done in TextExtracts or in the service is not completely decided, but we are leaning towards the latter. Hopefully that will support third-party use cases as well.

phuedx updated the task description. · Jun 29 2017, 9:01 AM
Jdlrobson renamed this task from Make the Page Summary API return lists in the lead section of a page to Make the Page Summary API return an "intro" for a page. · Jun 29 2017, 8:05 PM
Jdlrobson updated the task description.
ovasileva moved this task from Backlog to Next Up on the Page-Previews board. · Jul 5 2017, 12:37 PM