[EPIC] The Page Summary API needs to provide useful content for the majority of articles
Closed, Resolved · Public

Description

Up until now, we've mostly gotten away with using the prop=extracts MediaWiki API behind RESTBase, which let us scale out Page Previews to a couple of large Wikipedias without issue. However, the definition of a page summary is becoming more complicated in the wake of the simple implementation of HTML previews in T165018: Page previews can consume new summary-HTML endpoint. Given that, and the complexity of generating extracts in the TextExtracts extension, it becomes clear(er) that the extension shouldn't be the place where we house the notion of what a page summary is. Forcing this separation has the added benefit of not allowing us to conflate TextExtracts and Page Previews. We (Reading Web) readily admit that we don't know who's using the API or how they are using it.
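
To make the two request styles concrete, here is a minimal sketch that only builds URLs and performs no network calls. The `exintro` and `explaintext` flags are standard TextExtracts parameters. The REST path `/api/rest_v1/page/summary/{title}` is an assumption about where the summary endpoint is exposed, since this task does not name it. The helper names `extracts_url` and `summary_url` are hypothetical.

```python
from urllib.parse import quote, urlencode


def extracts_url(title: str, host: str = "en.wikipedia.org") -> str:
    """Build a legacy Action API request that relies on prop=extracts."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,      # restrict the extract to the lead section
        "explaintext": 1,  # plain text rather than limited HTML
        "titles": title,
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


def summary_url(title: str, host: str = "en.wikipedia.org") -> str:
    """Build a REST summary request (path is an assumption, see lead-in)."""
    # Titles use underscores for spaces; everything else is percent-encoded.
    slug = quote(title.replace(" ", "_"), safe="")
    return f"https://{host}/api/rest_v1/page/summary/{slug}"
```

With the REST style, the client no longer has to know which extract options to pass, so the definition of a summary lives on the server side.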

We now have a spec for the Page Summary API. The review of the spec is tracked at T169761: Review Summary 2.0 Spec.

Plan (YMMV)

  • Create the new Page Summary API (T168848).
  • Move parenthetical stripping from the client-side to the Page Summary API.
    • Related discussion about whether to remove parentheticals or conditionally remove some: T91344.
    • Fix remaining issues with parentheticals, e.g. T162219.
  • Check that T181314 and T181316 are resolved.
  • Add support for disambiguation pages via the Disambiguator extension (T168392).
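
As a rough illustration of the parenthetical-stripping step in the plan, the sketch below removes balanced parentheticals from plain text. It is a naive stand-in, not the logic used by the client or by the Page Summary API. It strips every parenthetical unconditionally, so it does not address the "sometimes useful" cases discussed in T91344. The function name `strip_parentheticals` is hypothetical.

```python
import re


def strip_parentheticals(text: str) -> str:
    """Drop balanced (...) groups, including nested ones, from plain text."""
    kept = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            # Outside any parenthetical; a stray ")" also lands here and is kept.
            kept.append(ch)
    cleaned = "".join(kept)
    # Tidy the whitespace left behind by the removed groups.
    cleaned = re.sub(r"\s+([,.;:])", r"\1", cleaned)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)
    return cleaned.strip()
```

Note that an unclosed "(" causes everything after it to be dropped, which is one reason a server-side implementation working on parsed HTML is likely to be more robust than string scanning.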

There are many bugs open against TextExtracts that cause unexpected issues with the page summary we display to users. We either need to write a bunch of tests and fix up TextExtracts or build a new API specifically for the purpose of Page Previews.

There are a number of issues to consider:

  • We may want to render inline images (see T99793)
  • Some HTML tags make sense e.g. sub and sup (T112137)
  • Parentheses are sometimes useful and sometimes not; we need some semantic way to distinguish the two cases (T164100, T162219). We discussed this to a conclusion in T91344 (although we kept it open but stalled for further discussion).
  • Links should get annotated with the title of the page to avoid issues with non-links showing hover cards (T75936)
  • Should not show <noinclude> content in the extract (T109869)
  • The HTML extract is not always well formed since the extract does not use a DOM parsing library (T166272)

  • See subtasks.
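
Two of the issues above (keeping tags such as sub and sup, and producing well-formed HTML) can be sketched together. The example below uses Python's standard `html.parser`, which is an event-based parser rather than a DOM library, so treat it as an illustration of the idea only. The allowlist, the class name `ExtractSanitizer`, and the function name `sanitize_extract` are all hypothetical and are not taken from TextExtracts or the Page Summary API.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"sub", "sup"}  # hypothetical allowlist for this sketch


class ExtractSanitizer(HTMLParser):
    """Keep allowed tags, drop all others, and always emit balanced markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")  # attributes are discarded
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close anything opened after this tag so nesting stays valid.
            while self.open_tags:
                top = self.open_tags.pop()
                self.parts.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))

    def result(self) -> str:
        # Close tags that the input left open.
        while self.open_tags:
            self.parts.append(f"</{self.open_tags.pop()}>")
        return "".join(self.parts)


def sanitize_extract(html: str) -> str:
    parser = ExtractSanitizer()
    parser.feed(html)
    parser.close()
    return parser.result()
```

The text content of dropped tags is retained here; a real summary generator would also need rules for removing whole elements.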

Event Timeline

Why has this been closed? Spam?

Jdlrobson reopened this task as Open. Jul 31 2017, 11:08 PM
Quiddity removed Zhuanru001 as the assignee of this task. Aug 1 2017, 12:08 AM
Quiddity added a subscriber: Zhuanru001.
phuedx added a comment. Aug 1 2017, 5:14 AM

Why has this been closed? Spam?

Spam.

Dvorapa removed a subscriber: Dvorapa. Aug 25 2017, 8:42 AM
Restricted Application added a subscriber: jeblad. Aug 25 2017, 8:42 AM
Quiddity removed a subscriber: Quiddity. Sep 7 2017, 9:49 PM
Jdlrobson moved this task from Inbox to Tracking on the User-Jdlrobson board. Sep 27 2017, 8:58 PM

tagging kanban board for goals tracking

Jdlrobson updated the task description. Feb 22 2018, 10:11 PM
Jdlrobson removed a project: User-Jdlrobson.

I believe this can be resolved now, @ovasileva.
Remaining issues can be addressed via bug fixing.
Note there is a single open subtask, T170617, which would be nice to get done sooner rather than later, but we don't need to track this work under the epic.

ovasileva closed this task as Resolved. Mar 16 2018, 1:51 PM

All looks good, changes have been documented and communicated, and subtasks are resolved. Closing this. Good job everyone.

kaldari added a comment. Edited Mar 16 2018, 8:59 PM

@ovasileva: I tried to find documentation about this new API, but wasn't successful. Searching for "Page Summary API" on MediaWiki.org led me to:

I also looked at Page Previews, Extension:TextExtracts, and the main MediaWiki API documentation, but didn't see anything useful. Am I just overlooking it?

Thanks, I added some helpful links: 1, 2, 3