Page MenuHomePhabricator

Previews must display useful text for the majority of articles
Closed, DuplicatePublic

Description

Background

In T113094: [EPIC] The Page Summary API needs to provide useful content for the majority of articles, many issues were identified with previews not rendering appropriate text for articles. The majority of these issues are identified below. Any issues not identified here will be pushed to a future iteration.

Proposal

The proposal has the following goals:

  • Keep the extracts API query module as generic as possible.
  • Contain Page Previews knowledge within its extension (see above).
  • Support third-party MediaWiki installations.
    • Unless the PO/TL decide we're not going to do this and let folk

We create a query module within the Page Previews extension, pp_extract, which can produce an extract that satisfies the requirements above. It:

  • Will defer to ExtractFormatter, provided by the TextExtracts extension, to do extract selection and content filtering.
    • i.e. it'll filter out everything that doesn't match AC #1, #2, #4, and #5.
  • Will not accept any parameters.
  • Will only operate on one page – the Page Previews API request is only ever for one page.

Tying this into RESTBase should be as trivial as changing 'extracts' to 'pp_extract' and removing the ex-prefix query parameters.

Plan (YMMV)

  • Extract ApiQueryExtracts#getFirstSection( $text, $isPlainText ) to TextExtracts\Extractor#getFirstSection( $html ).
    • It might be worth extracting ExtractFormatter::getFirstChars and ::getFirstSentences too…
  • Create PagePreviews\HtmlElementFilter, which uses ExtractFormatter with a the HTML element whitelist that satisfies the AC above.
  • Create PagePreviews\ParentheticalFilter, which satisfies AC3.
  • Create PagePreviews\ApiQueryPPExtract, which ties the above into the API

Related Objects

Event Timeline

ovasileva created this task.Dec 5 2016, 3:39 PM
ovasileva reopened this task as Open.Dec 5 2016, 4:16 PM

@phuedx - this one can stand for creating new endpoint, could you add some criteria?

phuedx updated the task description. (Show Details)Dec 15 2016, 3:19 PM
phuedx updated the task description. (Show Details)
phuedx renamed this task from Hovercards must display useful text for the majority of articles to Previews must display useful text for the majority of articles.Dec 15 2016, 3:21 PM
MaxSem removed a subscriber: MaxSem.Dec 19 2016, 11:35 PM

I don't understand why so many tasks were merged into this task.

@MZMcBride - the acceptance criteria of all the merged tasks was folded into the criteria for this one (see description above)

ovasileva updated the task description. (Show Details)Jan 30 2017, 3:32 PM
Jdlrobson moved this task from Upcoming to Epics/Goals on the Readers-Web-Backlog board.
Jdlrobson changed the task status from Open to Stalled.May 1 2017, 2:50 PM

I'd like to fold this into T113094 and subtasks.
IMO we should not be using TextExtracts at all.

Can you update the description of T113094 to match this, and merge this task into it, so it inherits all the subtasks?

@Jdlrobson - I think this task was created as a spike with the goal of scoping out the work for T113094: [EPIC] The Page Summary API needs to provide useful content for the majority of articles @phuedx - correct me if I'm wrong.

phuedx updated the task description. (Show Details)May 4 2017, 3:09 PM

It's confusing me having two and having the conversation fragmented in two places. Can we merge them? It's on my plate to facilitate a conversation here and work out how we make a start with this.

phuedx added a comment.EditedMay 4 2017, 3:15 PM

This task is actionable whereas the epic isn't so I think that the relationship between this and T113094: [EPIC] The Page Summary API needs to provide useful content for the majority of articles is correct.

It's confusing me having two and having the conversation fragmented in two places. Can we merge them? It's on my plate to facilitate a conversation here and work out how we make a start with this.

With the above in mind, what's stopping the Reading Web team from working on this task?

Jdlrobson added a comment.EditedMay 4 2017, 3:29 PM

This is tagged epic. It looks very big right now and in need of some serious breaking down.

I also don't see much sense in only focusing on a subset of the problems. How did we come to this subset of problems? What if a solution is incompatible with solving one of the other problems?

Is it a spike or an epic? Please clarify.

Olga with regards to blockers we need to talk to various stakeholders especially apps and I would like us to be more clear on the ultimate end goal specifically whether we lean more on rest infrastructure or improve existing codebase.

phuedx removed a project: Epic.May 4 2017, 3:31 PM

@Jdlrobson, @phuedx - at the page previews sync meeting today, we decided that we will use next week's meeting to discuss this with other stakeholders and come to a decision on which approach to use. The problems identified above are a part of the product requirements for previews - I think this list can expand to include other bugs we may have encountered, but I think any solution we come up with should account for the requirements identified at the minimum. If necessary, we can identify which of these would not fit the proposed solution and create separate tasks to account for these specifically.

ovasileva changed the task status from Stalled to Open.May 8 2017, 1:00 PM
ovasileva moved this task from Needs Prioritization to Upcoming on the Readers-Web-Backlog board.

@phuedx to set up a meeting with devs to discuss how to approach this work (breakup, not breakup, etc)

JKatzWMF removed a subscriber: JKatzWMF.May 9 2017, 7:28 PM

I've requested that @Jdlrobson and I discuss this at Thursday's (11th May) "Page Previews sync" meeting.