Page MenuHomePhabricator

Summaries should include display title
Closed, ResolvedPublic

Description

So titles with italics and altered capitalization can be properly rendered.

Examples:
https://en.wikipedia.org/api/rest_v1/page/summary/IOS
would have "displaytitle":"iOS"

and
https://en.wikipedia.org/api/rest_v1/page/summary/Donnie_Darko
would have "displaytitle":"<i>Donnie Darko</i>" (or some other representation of italics)

This is currently handled by using the MediaWiki API

Event Timeline

I think the main options for getting this information are:

  • Query the PHP API, possibly by extending the textextract query.
  • Match displaytitle markup in Parsoid output. Afaik we currently don't need to touch Parsoid HTML for summary responses, so this would mainly be attractive once extracts are derived from Parsoid HTML.

So, we're already making the call to the query info endpoint for constructing the summaries, so adding the displaytitle property is easy.

However, I have one concern here: we have quite a few properties related to the title and it's getting confusing. Not sure we can resolve it thought, but lemme document that:

  1. The summary endpoint result has title property that's the page title with spaces, not underscores.
  2. The feed response got title that uses underscores, and normalisedtitle that's got spaces.
  3. Trending endpoint got title that uses underscores, and normalisedtitle that's got spaces.
  4. Same for the on_this_day endpoint.

So it seems like only the summary endpoint itself uses the title as the display title not the normalised article key.

When we add the displaytitle property it will be hydrated into all of these endpoints, so my proposal is to switch all the clients to it, deprecate the normalisedtitle and try to change the summary title to use underscores and be like the article db key. Thoughts?

@bearND thoughts here on how to migrate?

@Niedzielski maybe some input here as well?

Seems like we have a bit of a naming convention issue we should straighten out.

Created a PR to add the data anyway: https://github.com/wikimedia/restbase/pull/810 but it would be nice to get naming resolving plan before adding one more property for title.

@Pchelolo I'll just chime in:

#1 seems like the big problem here as it uses the title property differently than the others. @Pchelolo I thought all RESTBase summaries used the same keys/structure? Are we overriding something here?

In an ideal world, seems like we should have something like:

  • title: non escaped (wth underscores / without spaces)
  • normalizedtitle: escaped (without underscores / with spaces)
  • displaytitle: HTML

That way clients have all the possibilities they could need, but don't need to worry about transforming between the different possible formats (using error prone escape code - this is a HUGE source of bugs in clients).

My other concern is about removing/changing keys and the effects on the clients.

Summary:

  1. We should return all 3 title variations to support the different client use cases and avoid clients needing to make any transforms.
  2. We should version any APIs we change so older clients still work.

@Pchelolo I thought all RESTBase summaries used the same keys/structure? Are we overriding something here?

Yes. It's became like this because when the feed endpoint was first created, it didn't use summaries/hydration. And there already were Android clients that relied on this naming scheme, so we've had to add this code.

We should version any APIs we change so older clients still work.

So right now this is the only need for a backwards-incompatible change in the summary API. I propose to create a separate ticket for consolidating title naming scheme and wait for more need for backwards-incompatible changes to be made. And then do all at once while also supporting the Accept header negotiation.

I hope this isn't off topic but would it be possible to omit the namespace in normalizedtitle and displaytitle? For example:

https://commons.wikimedia.org/api/rest_v1/page/summary/File%3ACollage_of_Nine_Dogs.jpg

Current:
{
  ...,
  "title": "File:Collage of Nine Dogs.jpg",
  ...
}

Proposed:
{
  ...,
  "title": "File:Collage_of_Nine_Dogs.jpg", // Use this when constructing a URL
  "normalized_title": "Collage of Nine Dogs.jpg", // Use this for plain text
  "display_title": "<strong>Collage of Nine Dogs.jpg</strong>", // Use this for WebViews
  "namespace_title": "File", // Use this when you want to include the namespace in the rich or plain text title
  ...
}

Mentioned in SAL (#wikimedia-operations) [2017-05-02T17:44:40Z] <mobrovac@naos> Started deploy [restbase/deploy@6adb0f2]: Include displaytitle and page_id in the summary output and bump the content type version - T163729 T164079

Mentioned in SAL (#wikimedia-operations) [2017-05-02T17:50:44Z] <mobrovac@naos> Finished deploy [restbase/deploy@6adb0f2]: Include displaytitle and page_id in the summary output and bump the content type version - T163729 T164079 (duration: 06m 04s)

Pchelolo claimed this task.
Pchelolo edited projects, added Services (done); removed Services (next).

@Niedzielski I've created T164291 to track further refinements of the summary format, let's move the discussion there.

This task is resolved since the display title has been deployed