[Regression] exclude pronunciation guides from article extracts
Closed, ResolvedPublic

Assigned To

Authored By

	• JMinor
	Jan 17 2017, 10:41 PM

Description

Simulator Screen Shot Apr 20, 2017, 4.16.08 PM.png (1×750 px, 524 KB)

Previously we stripped out the phonetic pronunciation guide when presenting extracts in the app (recommendations, featured article, etc.). This information, while helpful, often takes up a lot of horizontal space and generally doesn't fit the purpose of extracts, which is to help the user determine whether an article is interesting or relevant by providing a short, informative introduction.

I am open to leaving as is, but we should make a deliberate choice, rather than change this due to regression.

Related Objects

Status	Assigned	Task
Resolved	None	T169242 Develop Page Content Service for Reading Clients
Resolved	None	T177425 Develop General Layer of PCS
Resolved	• Jhernandez	T177426 Develop structured JSON APIs for general consumption
Resolved	• Mholloway	T177431 Develop a Summary JSON API
Resolved	Dereckson	T68374 Enable Hovercards on se.wikimedia.org (Swedish chapter wiki)
Resolved	Jdlrobson	T70860 [GOAL] Graduate Page Previews feature (Popups extension) out of Beta Feature
Resolved	ovasileva	T154635 [EPIC] Deploy page previews to English and German Wikipedia
Resolved	ovasileva	T192622 [EPIC] Page previews post-deploy cleanup
Resolved	Jdlrobson	T173952 Remove A/B testing instrumentation code
Duplicate	None	T167433 Switch all projects to the new (and yet to be built) summary-html endpoint for page previews
Duplicate	None	T167429 Make enwiki and dewiki fetch previews from the summary-html RESTBase endpoint
Resolved	ovasileva	T165018 Page previews can consume new summary-HTML endpoint
Declined	Jdlrobson	T111329 [GOAL] Page previews on mobileweb
Resolved	Jdlrobson	T164010 [EPIC] Strengthen the APIs we provide in reading web maintained extensions
Resolved	ovasileva	T113094 [EPIC] The Page Summary API needs to provide useful content for the majority of articles
Resolved	Mhurd	T155573 [Regression] exclude pronunciation guides from article extracts
Declined	• JMinor	T164100 Consider excluding pronunciation guides from TextExtracts

Event Timeline

• JMinor created this task. Jan 17 2017, 10:41 PM

Restricted Application added a subscriber: Aklapper. Jan 17 2017, 10:41 PM

• JMinor triaged this task as Low priority. Jan 17 2017, 10:41 PM

• JMinor moved this task from Needs Triage to Bug Backlog on the Wikipedia-iOS-App-Backlog board.

• NHarateh_WMF added a project: iOS-app-v5.5.0-Snake-On-A-Magic-Towel. Apr 3 2017, 7:42 PM

• JMinor moved this task from Tasks from Product Backlog to Ready for Development on the iOS-app-v5.5.0-Snake-On-A-Magic-Towel board. Apr 3 2017, 10:03 PM

'enwiki > Lutefisk' is a good example.

I've underlined its parenthetical and pronunciation guides which bleed through in its extract:

Screen Shot 2017-04-12 at 5.36.49 PM.png (1×862 px, 582 KB)

...would be best if this could be fixed upstream.

@Fjalapeno do you recall if we had a separate ticket for this for fixing it upstream?

@Mhurd @JMinor We are addressing some pronunciation issues in the new MCS API, but the iOS content is from the MediaWiki API.

Couple questions:

Is this a regression?
What is the desired behavior? Is it stripped at the top? Is it visible anywhere else? Is it collapsible?
Does either the Android app or the mobile web have the behavior you are looking for?

I think these methods will strip:

wmf_stringByRecursivelyRemovingParenthesizedContent
wmf_stringByRemovingBracketedContent
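The implementation of these two iOS methods is not shown in this thread. As a rough illustration of the technique their names suggest, here is a minimal Python sketch that removes parenthesized and bracketed groups, handling nesting by repeatedly deleting the innermost group. The function names `remove_parenthesized`, `remove_bracketed`, and `clean_extract` are hypothetical and are not taken from the app's code.

```python
import re


def _strip_delimited(text: str, open_ch: str, close_ch: str) -> str:
    """Remove every open_ch...close_ch group, including nested ones."""
    o, c = re.escape(open_ch), re.escape(close_ch)
    # An innermost group has no delimiters inside it; also consume any
    # whitespace directly before the group so no double spaces remain.
    innermost = re.compile(rf"\s*{o}[^{o}{c}]*{c}")
    previous = None
    # Repeat until a pass changes nothing, which peels nested groups
    # from the inside out. Unbalanced delimiters are left untouched.
    while previous != text:
        previous = text
        text = innermost.sub("", text)
    return text


def remove_parenthesized(text: str) -> str:
    """Hypothetical counterpart to the parenthesized-content stripper."""
    return _strip_delimited(text, "(", ")")


def remove_bracketed(text: str) -> str:
    """Hypothetical counterpart to the bracketed-content stripper."""
    return _strip_delimited(text, "[", "]")


def clean_extract(extract: str) -> str:
    """Apply both strippers to an extract string."""
    return remove_bracketed(remove_parenthesized(extract))


if __name__ == "__main__":
    sample = "Donnchadh (Latin: Duncanus; English: Duncan) (pronounced /ˈt̪ˠon̪ˠəxə/)..."
    print(clean_extract(sample))
```

Note that this approach removes every parenthetical, not only pronunciation guides, so it can also drop content a reader might want.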

@bearND did any changes occur in MCS to cause this regression?

Is this about page previews or article content? The screenshot from mhurd looks like an article view to me, but the title of this task seems to imply page previews. Is iOS getting preview data from MediaWiki or from the RESTBase summary endpoint?

JoeWalsh updated the task description. Apr 20 2017, 8:16 PM

@bearND this is about page previews - the extract value returned from the summary endpoint as well as rest_v1/feed/featured.

Examples:

https://en.wikipedia.org/api/rest_v1/feed/featured/2017/04/19

and

https://en.wikipedia.org/api/rest_v1/page/summary/Donnchadh,_Earl_of_Carrick

both have

"extract":"Donnchadh (Latin: Duncanus; English: Duncan) (pronounced /ˈt̪ˠon̪ˠəxə/)..."

If this didn't include the pronunciation before then it must be a regression in TextExtracts. Here's the equivalent MW API query RESTBase makes to get the data for the summary endpoint: https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&redirects=true&prop=info%7Cextracts%7Cpageimages%7Crevisions%7Cpageterms%7Ccoordinates&exsentences=5&explaintext=true&piprop=thumbnail%7Coriginal&pithumbsize=320&pilicense=any&rvprop=timestamp%7Cids&wbptterms=description&titles=Donnchadh,_Earl_of_Carrick.
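For readability, the query string in that URL can be broken out into its individual parameters. The Python sketch below rebuilds the same URL from a dictionary using only the standard library; the parameter names and values are copied from the URL above, while the function name `build_extract_query` and its `host` argument are assumptions made for illustration. It only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode


def build_extract_query(title: str, host: str = "en.wikipedia.org") -> str:
    """Build the MediaWiki API URL quoted in the comment above."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": 2,
        "redirects": "true",
        # Multi-value parameters are pipe-separated; urlencode turns "|" into %7C.
        "prop": "|".join(
            ["info", "extracts", "pageimages", "revisions", "pageterms", "coordinates"]
        ),
        "exsentences": 5,        # limit the extract to five sentences
        "explaintext": "true",   # plain text rather than HTML
        "piprop": "thumbnail|original",
        "pithumbsize": 320,
        "pilicense": "any",
        "rvprop": "timestamp|ids",
        "wbptterms": "description",
        "titles": title,
    }
    # safe="," keeps the comma in the title literal, as in the quoted URL.
    return f"https://{host}/w/api.php?{urlencode(params, safe=',')}"


if __name__ == "__main__":
    print(build_extract_query("Donnchadh,_Earl_of_Carrick"))
```

The `extracts` entry in `prop`, together with `exsentences` and `explaintext`, is the part of the request that TextExtracts answers, which is why the pronunciation guide appears in the summary output unchanged.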

It doesn't show up in the Android app, though. That's because it strips content inside parentheses.

device-2017-04-20-144012.png (1×1 px, 226 KB)

device-2017-04-20-144041-lutefisk.png (1×1 px, 783 KB)

Removing MCS since MCS is not involved in this at all. The summary endpoint is implemented in RESTBase directly. If there is desire to change this server side then I would tag it with TextExtracts, while keeping in mind that this would also affect the mobile web team since they are heavy users of both summary endpoint and TextExtracts.

@Dbrant @Jdlrobson would you prefer to remove the pronunciation from the summaries (text extracts) in RESTBase?

I do know that iOS would like to strip them server side. And it looks like Android is stripping them locally already.

• Fjalapeno mentioned this in T113094: [EPIC] The Page Summary API needs to provide useful content for the majority of articles. Apr 21 2017, 9:34 PM

• Jhernandez added a project: Web-Team-Backlog. Apr 24 2017, 11:29 AM

Jdlrobson claimed this task. Apr 24 2017, 3:42 PM

Jdlrobson moved this task from Incoming to Needs Prioritization on the Web-Team-Backlog board.

We've never stripped inside TextExtracts as we have to support all our languages. We need to explore all these related issues and whether it is worth investing time in TextExtract enhancements.

For the time being I'd recommend that the iOS app use the REST endpoint.

Jdlrobson added a parent task: T113094: [EPIC] The Page Summary API needs to provide useful content for the majority of articles. Apr 24 2017, 8:11 PM

Just to be clear, I don't think this was an upstream regression, or should 'necessarily' happen at the endpoint. We thought if all clients are stripping, it would be worth consolidating.

Feel free to make this an iOS-specific fix by untagging TextExtracts and Reading-Web-Backlog. I should note you'll probably want to read T91344 first, where we track many of the problems with stripping parentheses.

• JMinor removed projects: Web-Team-Backlog, TextExtracts. Apr 24 2017, 8:23 PM

In T155573#3207745, @Jdlrobson wrote:

For the time being I'd recommend that the iOS app use the REST endpoint.

BTW, the RESTBase page/summary endpoint implementation uses the output from TextExtracts directly.

Mhurd moved this task from Ready for Development to Doing on the iOS-app-v5.5.0-Snake-On-A-Magic-Towel board. Apr 27 2017, 1:11 AM

Mhurd moved this task from Doing to Blocked or Waiting on the iOS-app-v5.5.0-Snake-On-A-Magic-Towel board.

Mhurd moved this task from Blocked or Waiting to Doing on the iOS-app-v5.5.0-Snake-On-A-Magic-Towel board.

Mhurd moved this task from Doing to Blocked or Waiting on the iOS-app-v5.5.0-Snake-On-A-Magic-Towel board. Apr 27 2017, 1:14 AM

• JMinor claimed this task. Apr 28 2017, 5:55 PM

Would it make sense to strip it in RESTBase for the summary endpoint, since it seems like none of the clients actually need it?

Also I think there was an idea to create an enhanced summary service using the sentence separation code that ContentTranslation has. If that's still a plan maybe we could address that together with that project?

• JMinor created subtask T164100: Consider excluding pronunciation guides from TextExtracts. Apr 28 2017, 6:01 PM

Hey all, I've created a separate task to discuss and work on upstream/API solutions. This task was originally for the iOS app's bug, and I'd like to return it to that purpose.

• NHarateh_WMF moved this task from Blocked or Waiting to Ready for Development on the iOS-app-v5.5.0-Snake-On-A-Magic-Towel board. May 9 2017, 6:38 PM

Mhurd claimed this task. May 9 2017, 8:06 PM

Mhurd moved this task from Ready for Development to Doing on the iOS-app-v5.5.0-Snake-On-A-Magic-Towel board.

Diffusion mentioned this in rAPIOSc7d801c3cbaa: Merge branch 'develop' into regression/remove-pronunciation/T155573. May 10 2017, 7:02 PM

Diffusion mentioned this in rAPIOS4c86d13100e8: Merge branch 'develop' into regression/remove-pronunciation/T155573. May 10 2017, 9:30 PM

Mhurd moved this task from Doing to Needs Code Review on the iOS-app-v5.5.0-Snake-On-A-Magic-Towel board. May 10 2017, 9:39 PM

JoeWalsh mentioned this in rAPIOS19806f653006: Merge pull request #1419 from wikimedia/regression/remove-pronunciation/T155573. May 10 2017, 9:40 PM

Mhurd moved this task from Needs Code Review to Needs Testing Criteria on the iOS-app-v5.5.0-Snake-On-A-Magic-Towel board. May 10 2017, 9:48 PM

https://github.com/wikimedia/wikipedia-ios/pull/1419

Testing criteria:

Load the Explore feed
Scroll around and ensure you don't see any pronunciation guides (as specified in the ticket description - you can see one such pronunciation guide in the ticket screenshot)

Mhurd moved this task from Needs Testing Criteria to Needs Design Review on the iOS-app-v5.5.0-Snake-On-A-Magic-Towel board. May 18 2017, 11:42 PM

Tested this with Lutefisk in the 'Because you read' card. No pronunciation guides were seen in the summary.

Tested on an iPhone 7+ with iOS 10.3 and an iPhone 5s with iOS 10.2 on Beta App 5.5.0 (1138)

Also tested Lutefisk and a few other explore feed items. No pronunciations are appearing.

Resolved on the client. See T164100 for moving this upstream to the API.

Jdlrobson closed subtask T164100: Consider excluding pronunciation guides from TextExtracts as Declined. Jul 13 2017, 6:55 PM

	F7673123: device-2017-04-20-144041-lutefisk.png
	Apr 20 2017, 8:48 PM

	F7672665: Simulator Screen Shot Apr 20, 2017, 4.16.08 PM.png
	Apr 20 2017, 8:16 PM

	F7515252: Screen Shot 2017-04-12 at 5.36.49 PM.png
	Apr 13 2017, 12:40 AM
