Page MenuHomePhabricator

[Regression] exclude pronunciation guides from article extracts
Closed, ResolvedPublic

Description

Simulator Screen Shot Apr 20, 2017, 4.16.08 PM.png (1×750 px, 524 KB)

Previously we stripped out the phonetic pronunciation guide when presenting extracts in the app (recommendations, featured article, etc). This information, while helpful, often takes up a lot of horizontal space and generally doensn't fit the purspose of extracts (which is to help the user determine if an article is interest/relevant by providing a short informative introduction.

I am open to leaving as is, but we should make a deliberate choice, rather than change this due to regression.

Related Objects

Event Timeline

'enwiki > Lutefisk' is a good example.

I've underlined its parenthetical and pronunciation guides which bleed through in its extract:

Screen Shot 2017-04-12 at 5.36.49 PM.png (1×862 px, 582 KB)

...would be best if this could be fixed upstream.

@Fjalapeno do you recall if we had a separate ticket for this for fixing it upstream?

@Mhurd @JMinor We are addressing some pronunciation issues in the new MCS API, but the iOS content is from the MediaWiki API.

Couple questions:

  1. Is this a regression?
  2. What is the desired behavior? It is stripped at the top? Is it visible anywhere else? Is it collapsable?
  3. Does either the Android app or the mobile web have the behavior you are looking for?

I think these methods will strip:

wmf_stringByRecursivelyRemovingParenthesizedContent
wmf_stringByRemovingBracketedContent

Fjalapeno added a subscriber: bearND.

@bearND did any changes occur in MCS to cause this regression?

Is this about page previews or article content? The screenshot from mhurd looks like an article view to me but the title of this task seems to imply page previews. Is iOS getting preview data from mediawiki or from Restbase summary endpoint?

@bearND this is about page previews - the extract value returned from the summary endpoint as well as rest_v1/feed/featured.

Examples:

https://en.wikipedia.org/api/rest_v1/feed/featured/2017/04/19

and

https://en.wikipedia.org/api/rest_v1/page/summary/Donnchadh,_Earl_of_Carrick

both have

"extract":"Donnchadh (Latin: Duncanus; English: Duncan) (pronounced /ˈt̪ˠon̪ˠəxə/)..."

Removing MCS since MCS is not involved in this at all. The summary endpoint is implemented in RESTBase directly. If there is desire to change this server side then I would tag it with TextExtracts, while keeping in mind that this would also affect the mobile web team since they are heavy users of both summary endpoint and TextExtracts.

Fjalapeno added subscribers: Jdlrobson, Dbrant.

@Dbrant @Jdlrobson would you prefer to remove the pronunciation from the summaries (text extracts) in RESTBase?

I do know that iOS would like to strip them server side. And it looks like Android is stripping them locally already.

We've never stripped inside TextExtracts as we have to support all our languages. We need to explore all these related issues and whether it is worth investing time in TextExtract enhancements.

For the time being id recommend the ios app uses REST endpoint

Just to be clear, I don't think this was an upstream regression, or should 'necessarily' happen at the endpoint. We thought if all clients are stripping, it would be worth consolidating.

Feel free to make this an ioS specific fix by untagging TextExtracts and Reading-Web-Backlog. I should note you'll probably want to read T91344 first where we track many of the problems with stripping parenthesis..

For the time being id recommend the ios app uses REST endpoint

BTW, the RESTBase page/summary endpoint implementation uses the output from TextExtracts directly.

Pchelolo subscribed.

Would it make sense to strip it in RESTBase for the summary endpoint since it seems like none of the clients actually need it.

Also I think there was an idea to create an enhanced summary service using the sentence separation code that ContentTranslation has. If that's still a plan maybe we could address that together with that project?

Hey all I've created a separate task to discuss and work on upstream/API solutions. This task was originally for the iOS app's bug, and I'd like to return it to that purpose.

Testing criteria:

  • Load the Explore feed
  • Scroll around and ensure you don't see any pronunciation guides (as specified in the ticket description - you can see one such pronunciation guide in the ticket screenshot)

Tested this with Lutefisk in the 'Because you read' card. No pronunciation guides were seen in the summary.

ABorbaWMF subscribed.

Tested on an iPhone 7+ with iOS 10.3 and an iPhone 5s with iOS 10.2 on Beta App 5.5.0 (1138)

Also tested Lutefisk and a few other explore feed items. No pronunciations are appearing.

JMinor removed a project: RESTBase-API.

Resolved on the client. See T164100 for moving this upstream to the API.