Page MenuHomePhabricator

[Regression] exclude pronunciation guides from article extracts
Closed, ResolvedPublic

Description

Previously we stripped out the phonetic pronunciation guide when presenting extracts in the app (recommendations, featured article, etc). This information, while helpful, often takes up a lot of horizontal space and generally doensn't fit the purspose of extracts (which is to help the user determine if an article is interest/relevant by providing a short informative introduction.

I am open to leaving as is, but we should make a deliberate choice, rather than change this due to regression.

Related Objects

Event Timeline

JMinor created this task.Jan 17 2017, 10:41 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 17 2017, 10:41 PM
JMinor triaged this task as Low priority.Jan 17 2017, 10:41 PM
JMinor moved this task from Needs Triage to Bug Backlog on the Wikipedia-iOS-App-Backlog board.
Mhurd added a subscriber: Mhurd.Apr 13 2017, 12:40 AM

'enwiki > Lutefisk' is a good example.

I've underlined its parenthetical and pronunciation guides which bleed through in its extract:

Mhurd added a comment.Apr 13 2017, 1:06 AM

...would be best if this could be fixed upstream.

@Fjalapeno do you recall if we had a separate ticket for this for fixing it upstream?

@Mhurd @JMinor We are addressing some pronunciation issues in the new MCS API, but the iOS content is from the MediaWiki API.

Couple questions:

  1. Is this a regression?
  2. What is the desired behavior? It is stripped at the top? Is it visible anywhere else? Is it collapsable?
  3. Does either the Android app or the mobile web have the behavior you are looking for?
Mhurd added a comment.Apr 20 2017, 7:03 PM

I think these methods will strip:

wmf_stringByRecursivelyRemovingParenthesizedContent
wmf_stringByRemovingBracketedContent

Fjalapeno added a subscriber: bearND.

@bearND did any changes occur in MCS to cause this regression?

Is this about page previews or article content? The screenshot from mhurd looks like an article view to me but the title of this task seems to imply page previews. Is iOS getting preview data from mediawiki or from Restbase summary endpoint?

JoeWalsh updated the task description. (Show Details)Apr 20 2017, 8:16 PM

@bearND this is about page previews - the extract value returned from the summary endpoint as well as rest_v1/feed/featured.

Examples:

https://en.wikipedia.org/api/rest_v1/feed/featured/2017/04/19

and

https://en.wikipedia.org/api/rest_v1/page/summary/Donnchadh,_Earl_of_Carrick

both have

"extract":"Donnchadh (Latin: Duncanus; English: Duncan) (pronounced /ˈt̪ˠon̪ˠəxə/)..."

If this didn't include the pronunciation before then I must be a regression in TextExtracts. Here's the equivalent MW API query RESTBase makes to get the data for the summary endpoint: https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&redirects=true&prop=info%7Cextracts%7Cpageimages%7Crevisions%7Cpageterms%7Ccoordinates&exsentences=5&explaintext=true&piprop=thumbnail%7Coriginal&pithumbsize=320&pilicense=any&rvprop=timestamp%7Cids&wbptterms=description&titles=Donnchadh,_Earl_of_Carrick.

It doesn't show up in the Android app, though. That's because it strips contents inside parenthesis.

Removing MCS since MCS is not involved in this at all. The summary endpoint is implemented in RESTBase directly. If there is desire to change this server side then I would tag it with TextExtracts, while keeping in mind that this would also affect the mobile web team since they are heavy users of both summary endpoint and TextExtracts.

Fjalapeno added subscribers: Jdlrobson, Dbrant.

@Dbrant @Jdlrobson would you prefer to remove the pronunciation from the summaries (text extracts) in RESTBase?

I do know that iOS would like to strip them server side. And it looks like Android is stripping them locally already.

Jdlrobson claimed this task.

We've never stripped inside TextExtracts as we have to support all our languages. We need to explore all these related issues and whether it is worth investing time in TextExtract enhancements.

For the time being id recommend the ios app uses REST endpoint

Just to be clear, I don't think this was an upstream regression, or should 'necessarily' happen at the endpoint. We thought if all clients are stripping, it would be worth consolidating.

Feel free to make this an ioS specific fix by untagging TextExtracts and Reading-Web-Backlog. I should note you'll probably want to read T91344 first where we track many of the problems with stripping parenthesis..

For the time being id recommend the ios app uses REST endpoint

BTW, the RESTBase page/summary endpoint implementation uses the output from TextExtracts directly.

JMinor claimed this task.Apr 28 2017, 5:55 PM
Pchelolo added a subscriber: Pchelolo.

Would it make sense to strip it in RESTBase for the summary endpoint since it seems like none of the clients actually need it.

Also I think there was an idea to create an enhanced summary service using the sentence separation code that ContentTranslation has. If that's still a plan maybe we could address that together with that project?

Hey all I've created a separate task to discuss and work on upstream/API solutions. This task was originally for the iOS app's bug, and I'd like to return it to that purpose.

Mhurd added a comment.EditedMay 18 2017, 11:41 PM

Testing criteria:

  • Load the Explore feed
  • Scroll around and ensure you don't see any pronunciation guides (as specified in the ticket description - you can see one such pronunciation guide in the ticket screenshot)

Tested this with Lutefisk in the 'Because you read' card. No pronunciation guides were seen in the summary.

ABorbaWMF added a subscriber: ABorbaWMF.

Tested on an iPhone 7+ with iOS 10.3 and an iPhone 5s with iOS 10.2 on Beta App 5.5.0 (1138)

Also tested Lutefisk and a few other explore feed items. No pronunciations are appearing.

JMinor closed this task as Resolved.May 23 2017, 9:20 PM
JMinor removed a project: RESTBase-API.

Resolved on the client. See T164100 for moving this upstream to the API.