Consider excluding pronunciation guides from TextExtracts
Closed, Declined · Public

Assigned To

Authored By

	• JMinor
	Apr 28 2017, 6:01 PM

Description

The app clients currently use custom handling to remove pronunciation guides. In theory, it might be better to do this once, upstream. See the parent task for background.

An example of an article that includes such a pronunciation guide (outside of parentheses) is https://en.wikipedia.org/wiki/Abugida.

Related Objects

Status	Assigned	Task
Resolved	None	T169242 Develop Page Content Service for Reading Clients
Resolved	None	T177425 Develop General Layer of PCS
Resolved	• Jhernandez	T177426 Develop structured JSON APIs for general consumption
Resolved	• Mholloway	T177431 Develop a Summary JSON API
Resolved	Dereckson	T68374 Enable Hovercards on se.wikimedia.org (Swedish chapter wiki)
Resolved	Jdlrobson	T70860 [GOAL] Graduate Page Previews feature (Popups extension) out of Beta Feature
Resolved	ovasileva	T154635 [EPIC] Deploy page previews to English and German Wikipedia
Resolved	ovasileva	T192622 [EPIC] Page previews post-deploy cleanup
Resolved	Jdlrobson	T173952 Remove A/B testing instrumentation code
Duplicate	None	T167433 Switch all projects to the new (and yet to be built) summary-html endpoint for page previews
Duplicate	None	T167429 Make enwiki and dewiki fetch previews from the summary-html RESTBase endpoint
Resolved	ovasileva	T165018 Page previews can consume new summary-HTML endpoint
Declined	Jdlrobson	T111329 [GOAL] Page previews on mobileweb
Resolved	Jdlrobson	T164010 [EPIC] Strengthen the APIs we provide in reading web maintained extensions
Resolved	ovasileva	T113094 [EPIC] The Page Summary API needs to provide useful content for the majority of articles
Resolved	Mhurd	T155573 [Regression] exclude pronunciation guides from article extracts
Declined	• JMinor	T164100 Consider excluding pronunciation guides from TextExtracts

Event Timeline

• JMinor created this task. Apr 28 2017, 6:01 PM

Jdlrobson added a parent task: T113094: [EPIC] The Page Summary API needs to provide useful content for the majority of articles. Apr 28 2017, 6:24 PM

• Pchelolo added a project: Services (designing). Apr 28 2017, 6:25 PM

So, we could port this Android code to RESTBase and strip the pronunciation guides while generating the summary response. The question is whether any client needs this information. If not, the solution is quite straightforward.

Long term, the Reading Web team is planning to rethink how we do TextExtracts, as we have lots of problems with the existing usages. My guess is that we will want to put more transformations inside the RESTBase endpoint and switch to HTML as the default for all extracts (possibly avoiding the TextExtracts extension altogether). The epic is T113094, and we've started to flesh out requirements.

@Pchelolo @Jdlrobson

So, short term, what do we want to do here?

If we strip pronunciations from summaries, this affects page previews. @Jdlrobson will that negatively affect Mobile Web?

If not, let's just strip it.

If it will affect page previews, then let's do one of the following:

Have the clients do this client side

Strip it, but version the change

Explicitly mark it up to make it easier for clients to strip

• Fjalapeno mentioned this in T113094: [EPIC] The Page Summary API needs to provide useful content for the majority of articles. May 1 2017, 2:36 PM

@Jdlrobson will that negatively affect Mobile Web?

This will only impact page previews. Given the issues we've seen with this in MCS, I'd appreciate it if this were stripped on the client for the time being.

The problem with doing it in MCS is that we'd be operating on plain text. The way I'd like to see us address T113094 is to create a new endpoint that doesn't rely on TextExtracts and returns the extract as HTML (given we want to support math). Would the apps want to continue using plain text? If not, we'll need to start thinking about how we ship both or provide both options in most of our endpoints.

I think this task is a dupe of T91344 which looks at the bigger problem we have with parentheses.

The pronunciation templates already output a class called nopopups specifically for the purpose of stripping the content from navigation popups. I'm surprised Hovercards isn't already honoring this class. TextExtracts offers a similar class, noexcerpt, so special-casing pronunciation guides shouldn't be necessary. In this case, the community is saying "We want pronunciations in TextExtracts, but we don't want them in Popups." We just need to honor the classes that are already available here, not create new code for handling a special case.

I'm going to boldly edit the title and description to clarify the correct way to solve this. If I'm overstepping, feel free to revert.

kaldari renamed this task from Consider excluding pronunciation guides from TextExtracts to Hovercards (and the apps) should honor the nopopups class within page previews. May 1 2017, 7:34 PM

kaldari edited projects, added Page-Previews, Android-app-Bugs, iOS-app-Bugs; removed Services (designing), RESTBase-API.

kaldari updated the task description.

Restricted Application added projects: Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog. May 1 2017, 7:34 PM

kaldari renamed this task from Hovercards (and the apps) should honor the nopopups class within page previews to Hovercards and the apps should honor the nopopups class within page previews. May 1 2017, 7:34 PM

Jdlrobson added a project: Web-Team-Backlog. May 1 2017, 10:33 PM

Dbrant moved this task from Needs Triage to Tracking on the Wikipedia-Android-App-Backlog board. May 2 2017, 2:57 PM

Page previews use the RESTBase summary endpoint, which emits the plain-text extract. Potentially we can change that to the HTML extract, but we need to verify this with all the known clients of the endpoint. Under the hood it uses the TextExtracts extension, but I don't see it emitting the classes in the HTML. @kaldari how do I fetch the HTML with the nopopups class from the TextExtracts API?
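For anyone wanting to compare the two outputs, here is a minimal sketch that only builds the request URLs for the plain-text and HTML variants of the intro extract. It assumes the standard `prop=extracts` parameters (`exintro`, `explaintext`); the helper name `extract_url` is made up for illustration, and whether the HTML variant retains any template classes is exactly the open question above, so nothing here answers it.

```python
from urllib.parse import urlencode

# English Wikipedia's Action API entry point.
API = "https://en.wikipedia.org/w/api.php"


def extract_url(title: str, plain_text: bool) -> str:
    """Build a TextExtracts query URL for the intro of `title`.

    `extract_url` is a hypothetical helper; it performs no network request.
    """
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,        # lead section only
        "titles": title,
        "format": "json",
    }
    if plain_text:
        params["explaintext"] = 1  # omit this to get the limited-HTML extract
    return f"{API}?{urlencode(params)}"


html_url = extract_url("Abugida", plain_text=False)
text_url = extract_url("Abugida", plain_text=True)
print(html_url)
print(text_url)
```

Opening both URLs side by side should make it easy to see which markup, if any, survives in the HTML extract.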

What is the history of this nopopups class? Where is it documented?
Typically use of classes in this way has been very brittle and poorly encapsulated in code.

@Jdlrobson: Use of nopopups dates to the earliest implementation of navigation popups back in 2007. It's basically used for exactly the task described here: preventing pronunciation guides from showing up in popup previews. As far as I know there is no documentation for it, however.

@Pchelolo: I see what you mean. The best approach might be for us to add nopopups to $wgExtractsRemoveClasses, but I'm wondering if this will cause weird artifacts, like "Blah (;;)". If so, maybe my implementation idea isn't feasible after all.
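To make the concern about artifacts concrete, here is a rough sketch of class-based removal on a made-up lead sentence. It is written in Python with the standard `html.parser` module rather than the extension's own code, and the `ClassStripper` class, the `strip_classes` helper, and the sample markup are all assumptions for illustration, not real template output.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "wbr", "input", "meta", "link"}


class ClassStripper(HTMLParser):
    """Collect text, skipping any element that carries one of the given classes."""

    def __init__(self, remove_classes):
        super().__init__()
        self.remove_classes = set(remove_classes)
        self.skip_depth = 0  # > 0 while inside a removed element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or self.remove_classes & set(classes):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_classes(html, remove_classes):
    parser = ClassStripper(remove_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical, simplified markup: only the phonetic text carries the class.
sample = (
    '<p><b>Blah</b> (<span class="nopopups">/blɑː/</span>; '
    '<span class="nopopups">[bla]</span>) is a thing.</p>'
)
result = strip_classes(sample, ["nopopups"])
print(result)  # Blah (; ) is a thing.
```

The tagged spans disappear, but the surrounding parentheses and separators are ordinary text outside those spans, so they are left behind.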

• JMinor moved this task from Needs Triage to Tracking on the Wikipedia-iOS-App-Backlog board. May 2 2017, 11:42 PM

Kaartic subscribed. May 4 2017, 11:57 AM

As a passerby, I would like to add something odd that I noticed about the content to which the nopopups class is being added. In the Albert Einstein article, I noticed the following in the wikitext:

({{IPAc-en|ˈ|aɪ|n|s|t|aɪ|n}};<ref>{{cite book|last=Wells|first=John|authorlink=John C. Wells|title=Longman Pronunciation Dictionary|publisher=Pearson Longman|edition=3rd|date=April 3, 2008|isbn=1-4058-8118-6}}</ref> {{IPA-de|ˈalbɛɐ̯t ˈaɪnʃtaɪn|lang|Albert Einstein german.ogg}};

But the generated HTML looks something like this:

Screenshot from 2017-05-04 18-16-49.png (408×1 px, 49 KB)

I noticed that the nopopups class is being added for the phonetic text alone. Shouldn't it be added to all the content that gets generated as a result of it (the audio file and the word "German")? Is there a reason it is done this way?

In T164100#3235126, @Kaartic wrote:

I noticed that the nopopups class is being added for the phonetic text alone. Shouldn't it be added to all the content that gets generated as a result of it (the audio file and the word "German")? Is there a reason it is done this way?

OK, I was partially wrong in my comment. It seems that the nopopups class is being added to the content generated by the IPAc-en template but not to the content generated by the IPA-de template. Is there any reason for this?

@Kaartic: I think navigation popups also strips everything in tags, so that takes care of the audio file link and the "German:" label. Here's a screenshot of how it renders in navigation popups:

Screen Shot 2017-05-04 at 11.51.16 AM.png (211×463 px, 79 KB)

It does output "( ; ; " as I feared, so adding nopopups to $wgExtractsRemoveClasses probably wouldn't be a good solution. The ideal solution, IMO, would be to have Hovercards and the apps strip out everything inside parenthetical phrases and also anything with class nopopups or noexcerpt or inside tags, but it seems there is no clear path to doing that. I think I'm going to change this task back to its original title and description so that it can move forward.

kaldari renamed this task from Hovercards and the apps should honor the nopopups class within page previews to Consider excluding pronunciation guides from TextExtracts. May 4 2017, 7:00 PM

kaldari updated the task description.

So one possible solution on the TextExtracts side would be to add a new mode (controlled by an API parameter) that strips all content delimited by parentheses, brackets, or slashes.
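A rough sketch of what such a stripping mode might do to a plain-text extract, written in Python rather than the extension's PHP. The function name `strip_delimited` and the slash heuristic are assumptions for illustration, not existing TextExtracts behaviour, and the sample sentences are invented.

```python
import re

# Opening delimiter -> matching closing delimiter.
PAIRS = {"(": ")", "[": "]"}


def strip_delimited(text: str) -> str:
    """Drop (...) and [...] spans (nesting allowed) and /.../ spans from plain text.

    Caveat: an unbalanced opening delimiter swallows the rest of the text.
    """
    out = []
    stack = []  # closing delimiters we are still waiting for
    for ch in text:
        if ch in PAIRS:
            stack.append(PAIRS[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
        elif not stack:
            out.append(ch)
    stripped = "".join(out)

    # Assumed heuristic: a /.../ span with no inner whitespace, not glued to a
    # word on either side, is a phonemic transcription ("and/or" is left alone).
    stripped = re.sub(r"(?<!\w)/[^/\s]+/(?!\w)", "", stripped)

    # Tidy the whitespace the removals leave behind.
    stripped = re.sub(r"\s+", " ", stripped)
    stripped = re.sub(r"\s+([,.;:])", r"\1", stripped)
    return stripped.strip()


print(strip_delimited(
    "Albert Einstein (/ˈaɪnstaɪn/; German: [ˈalbɛɐ̯t]; 14 March 1879 – "
    "18 April 1955) was a physicist."
))  # Albert Einstein was a physicist.
print(strip_delimited("An abugida /ɑːbuːˈɡiːdə/ is a writing system."))
# An abugida is a writing system.
```

Note the trade-off: dropping every parenthetical also drops useful content such as birth and death dates, which is one reason this would need to be an opt-in mode.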

In T164100#3236635, @kaldari wrote:

@Kaartic: I think navigation popups also strips everything in tags, so that takes care of the audio file link and the "German:" label.

Ignoring TextExtracts and speaking generally, shouldn't the content generated by the IPA / IPAc templates be tagged with the IPA class, and shouldn't the IPA text itself be tagged with something more specific (IPA-text)?

This may help implementations that don't use TextExtracts to easily identify the IPA and related text. I guess the TextExtracts implementation itself would be simplified as a result.

@kaldari

So one possible solution on the TextExtracts side would be to add a new mode (controlled by an API parameter) that strips all content delimited by parentheses, brackets, or slashes.

^ that would be great! (we're having to strip "(...)", "[...]" and "\...\" manually in the iOS app)

In T164100#3236635, @kaldari wrote:

@Kaartic: I think navigation popups also strips everything in tags, so that takes care of the audio file link and the "German:" label. Here's a screenshot of how it renders in navigation popups:

It does output "( ; ; " as I feared, so adding nopopups to $wgExtractsRemoveClasses probably wouldn't be a good solution. The ideal solution, IMO, would be to have Hovercards and the apps strip out everything inside parenthetical phrases and also anything with class nopopups or noexcerpt or inside tags, but it seems there is no clear path to doing that. I think I'm going to change this task back to its original title and description so that it can move forward.

By default, Navpopups will not show any templates at all. It also doesn't parse templates. I believe this is primarily to fix:

Hatnotes
Maintenance banners
(Infoboxes/taxoboxes are already excluded via Previewmaker.prototype.killBoxTemplates = function ())

I believe editors who use navpopups would generally be happy if they had an option to show parsed templates in the lead sentences, whilst still excluding the 2 items above. This probably just means a coordinated effort to standardize the CSS classes everywhere needed?

If you change the setting to show templates, it looks something like this (I have other tweaks in mine):

• JMinor mentioned this in T155573: [Regression] exclude pronunciation guides from article extracts. May 23 2017, 9:20 PM

Jdlrobson moved this task from Incoming to 2014-15 Q4 on the Web-Team-Backlog board. Jun 5 2017, 6:08 PM

Jdlrobson moved this task from 2014-15 Q4 to Tracking on the Web-Team-Backlog board. Jun 29 2017, 9:08 PM

Jdlrobson edited projects, added Web-Team-Backlog (Tracking); removed Web-Team-Backlog.

Jdlrobson removed a project: Web-Team-Backlog (Tracking).

Jdlrobson removed a project: TextExtracts. Jun 29 2017, 9:49 PM

Jdlrobson added projects: TextExtracts, Web-Team-Backlog.

Jdlrobson moved this task from Incoming to Tracking on the Web-Team-Backlog board. Jul 7 2017, 8:13 PM

Jdlrobson edited projects, added Web-Team-Backlog (Tracking); removed Web-Team-Backlog.

Jdlrobson moved this task from Untriaged to Move to Backlog on the Web-Team-Backlog (Tracking) board.

@phuedx and I have agreed that TextExtracts is not the right place to do this. It should be kept as dumb as possible to cater for all possible use cases, and doing this there is not as easy as in RESTBase. For the purpose of summaries we will create a new endpoint.

Thus, we will not be doing this in the TextExtracts API, but you are more than welcome to propose it as a candidate for inclusion in the new REST API endpoint we plan to build (T113094).

	F7913878: Screen Shot 2017-05-04 at 11.51.16 AM.png
	May 4 2017, 6:58 PM

	F7909512: Screenshot from 2017-05-04 18-16-49.png
	May 4 2017, 12:52 PM

	F8042749: Selection_032.png
	May 11 2017, 8:30 PM
