
[EPIC] The Page Summary API needs to provide useful content for the majority of articles
Closed, Resolved
Public

Description

Up until now, we've mostly gotten away with using the prop=extracts MediaWiki API behind RESTBase, which let us scale Page Previews out to a couple of large Wikipedias without issue. However, the definition of a page summary is becoming more complicated (in the wake of the simple implementation of HTML previews in T165018: Page previews can consume new summary-HTML endpoint), and generating extracts in the TextExtracts extension is already complex. Together these make it clear(er) that the extension shouldn't be where we house the notion of what a page summary is. Forcing this separation has the added benefit of keeping us from conflating TextExtracts and Page Previews. We (Reading Web) readily admit that we don't know who's using the API and how they are using it.

We now have a spec for the Page Summary API. The review of the spec is tracked at T169761: Review Summary 2.0 Spec.
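For orientation, a client of the Page Summary API might read a summary roughly as follows. This is a minimal sketch, not the spec: it assumes the public RESTBase route `/api/rest_v1/page/summary/{title}` and the response fields `type` and `extract`, and the helper names `summary_url` and `preview_text` are hypothetical. The JSON is parsed from a string so the example runs without network access; check the spec under review in T169761 for the authoritative response shape.

```python
import json
from typing import Optional
from urllib.parse import quote

# Assumed RESTBase route; the spec (T169761) is the real contract.
SUMMARY_PATH = "/api/rest_v1/page/summary/"


def summary_url(host: str, title: str) -> str:
    """Build a summary URL: spaces become underscores, the rest is percent-encoded."""
    return "https://" + host + SUMMARY_PATH + quote(title.replace(" ", "_"), safe="")


def preview_text(raw_json: str) -> Optional[str]:
    """Return the plain-text extract a preview would show, or None."""
    data = json.loads(raw_json)
    # Assumed field names: disambiguation pages and pages with no usable
    # extract are flagged via "type" and need their own treatment.
    if data.get("type") in ("disambiguation", "no-extract"):
        return None
    # An empty extract is treated the same as a missing one.
    return data.get("extract") or None
```

A client rendering HTML previews (T165018) would presumably read the HTML variant of the extract instead; the plain-text field is used here only to keep the sketch short.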

Plan (YMMV)

  • Create the new Page Summary API (T168848).
  • Move parenthetical stripping from the client-side to the Page Summary API.
    • Related discussion about whether to remove all parentheticals or conditionally remove only some: T91344.
    • Fix remaining issues with parentheticals, e.g. T162219.
  • Check that T181314 and T181316 are resolved.
  • Add support for disambiguation pages via the Disambiguator extension (T168392)
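Moving parenthetical stripping into the API means it has to cope with nested groups, which a naive regex handles poorly. The following is a hypothetical sketch, not the algorithm the service actually uses: `strip_parentheticals` drops every balanced parenthesised group, including nested ones, then tidies the whitespace left behind. It removes all parentheticals unconditionally, so it does not attempt the selective removal discussed in T91344.

```python
import re


def strip_parentheticals(text: str) -> str:
    """Remove balanced parenthesised groups, including nested ones.

    Hypothetical helper for illustration. If the parentheses are unbalanced,
    the input is returned unchanged, since the group boundaries are unknown.
    """
    kept = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            # Only characters outside every group survive.
            kept.append(ch)
    if depth != 0:
        return text
    # Collapse the runs of whitespace left where groups were removed...
    result = re.sub(r"\s{2,}", " ", "".join(kept))
    # ...and drop any space now stranded before punctuation.
    return re.sub(r"\s+([,.;:])", r"\1", result).strip()
```

For example, `strip_parentheticals("Paris (French: Paris), is a city.")` gives `"Paris, is a city."`. Counting depth rather than matching a pattern is what lets the nested case work.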

There are many bugs open against TextExtracts that cause unexpected issues with the page summary we display to users. We either need to write a bunch of tests and fix up TextExtracts or build a new API specifically for the purpose of Page Previews.

There are a number of issues with the current extracts:

  • We may want to render inline images (see T99793)
  • Some HTML tags make sense and should be kept, e.g. sub and sup (T112137)
  • Parentheses are sometimes useful and sometimes not; we need some semantic way to distinguish them (T164100, T162219). This was discussed to a conclusion in T91344, although the task was kept open but stalled for further discussion.
  • Links should be annotated with the title of the page they point to, to avoid non-links showing hover cards (T75936)
  • Should not show <noinclude> content in the extract (T109869)
  • The HTML extract is not always well formed since the extract does not use a DOM parsing library (T166272)

  • See subtasks.
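Two of the issues above, keeping tags such as sub and sup (T112137) and producing well-formed HTML (T166272), come down to walking parsed markup instead of manipulating strings. A minimal sketch using Python's standard `html.parser`, assuming a hypothetical allow-list of `b`, `i`, `sub` and `sup`: disallowed tags are dropped but their text is kept, text is re-escaped, and any tags still open at the end are closed so the output is always balanced. Note that `html.parser` is a tokenizer rather than a full DOM library; it is used here only to show the principle.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED = {"b", "i", "sub", "sup"}  # hypothetical allow-list for previews


class _Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []   # pieces of the sanitized output
        self.open = []  # stack of allowed tags currently open

    def handle_starttag(self, tag, attrs):
        # Allowed tags are re-emitted without attributes; others are dropped.
        if tag in ALLOWED:
            self.out.append(f"<{tag}>")
            self.open.append(tag)

    def handle_endtag(self, tag):
        # Only close a tag we actually opened, closing any inner tags first.
        if tag in self.open:
            while self.open:
                top = self.open.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape text.
        self.out.append(escape(data, quote=False))


def sanitize(fragment: str) -> str:
    parser = _Sanitizer()
    parser.feed(fragment)
    parser.close()
    # Close anything left open so the result is well formed.
    while parser.open:
        parser.out.append(f"</{parser.open.pop()}>")
    return "".join(parser.out)
```

For example, `sanitize("<b><i>x</b>y</i>")` yields `"<b><i>x</i></b>y"`: the mis-nested input comes out balanced, which string slicing of an extract cannot guarantee.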

Related Objects

Status      Assigned
Resolved    None
Resolved    None
Resolved    Jhernandez
Resolved    Mholloway
Resolved    Dereckson
Resolved    Jdlrobson
Resolved    ovasileva
Resolved    ovasileva
Resolved    Jdlrobson
Duplicate   None
Duplicate   None
Resolved    ovasileva
Declined    Jdlrobson
Resolved    Jdlrobson
Resolved    ovasileva
Resolved    phuedx
Resolved    phuedx
Duplicate   None
Resolved    Jdlrobson
Resolved    Jdlrobson
Duplicate   None
Duplicate   ovasileva
Resolved    ovasileva
Duplicate   None
Declined    None
Duplicate   Jdlrobson
Resolved    Mhurd
Declined    JMinor
Resolved    phuedx
Resolved    Pchelolo
Resolved    Jdlrobson
Declined    Pchelolo
Resolved    phuedx
Declined    Jdlrobson
Duplicate   None
Resolved    Fjalapeno
Resolved    phuedx
Declined    pmiazga
Declined    None
Resolved    phuedx
Declined    None
Resolved    Pchelolo
Resolved    bearND
Resolved    Mholloway
Resolved    MSantos
Resolved    Mholloway
Invalid     None
Resolved    Jdlrobson
Invalid     None
Duplicate   None
Resolved    Jdlrobson
Resolved    Jdlrobson
Resolved    Jdlrobson
Resolved    Jdlrobson
Resolved    phuedx
Resolved    bearND
Resolved    Mholloway
Duplicate   None
Resolved    Jdlrobson
Resolved    Jdlrobson
Resolved    phuedx
Resolved    Jdlrobson
Resolved    Jdlrobson
Resolved    bearND
Resolved    Jdlrobson
Resolved    Mholloway
Resolved    Mholloway
Resolved    Jdlrobson
Resolved    Jdlrobson
Resolved    bearND

Event Timeline

Quiddity added a subscriber: Zhuanru001.

Why has this been closed? Spam?

Spam.

Jdlrobson removed a project: User-Jdlrobson.

I believe this can be resolved now, @ovasileva.
Remaining issues can be addressed via bug fixing.
Note that there is a single open subtask, T170617, which would be nice to get done sooner rather than later, but we don't need to track this work under the epic.

All looks good, changes have been documented and communicated, and subtasks are resolved. Closing this. Good job everyone.

@ovasileva: I tried to find documentation about this new API, but wasn't successful. Searching for "Page Summary API" on MediaWiki.org led me to:

I also looked at Page Previews, Extension:TextExtracts, and the main MediaWiki API documentation, but didn't see anything useful. Am I just overlooking it?

Thanks, I added some helpful links: 1, 2, 3