Page MenuHomePhabricator

BUG: Snippets of styling code for certain East-Asian text styles are shown in the app
Closed, ResolvedPublic

Description

Certain article text styles in Japanese are showing up with css code snippets visible in the app (but not on the web version of the same article).

Steps to reproduce
  1. Go to this article in the app https://ja.m.wikipedia.org/wiki/江戸川コナン (Note: the English version of the article is "Jimmy Kudo" - if this makes it easier to search in app and then select the Japanese language of the article)
  2. Navigate to the "能力と推理スタイル" (4th) section of the article

Expected

The text in brackets in the first sentence appears normally (per the web version of the article):

Artboard.png (1×1 px, 788 KB)

Actual

Part of what appears to be CSS or other styling code is shown:

Artboard Copy.png (1×1 px, 607 KB)

FWIW, this bug does not occur the iOS app.

Occurring on

First reported via OTRS: https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom;TicketID=10049861
Reproduced on Pixel (7.1.2) in Wikipedia version 2.5.198-alpha-2017-06-16

Event Timeline

RHo renamed this task from BUG: CSS code for certain East-Asian text styles are showing as code in the app to BUG: Snippets of styling code for certain East-Asian text styles are shown in the app .Jun 20 2017, 2:37 PM
bearND added a subscriber: bearND.

Seems like an artifact coming from Parsoid: https://ja.wikipedia.org/api/rest_v1/page/html/%E6%B1%9F%E6%88%B8%E5%B7%9D%E3%82%B3%E3%83%8A%E3%83%B3 shows the same:

Screen Shot 2017-06-27 at 2.40.07 PM.png (173×860 px, 57 KB)

頭脳明晰(<rt style=" style="display:inline; white-space:normal; font-size:inherit; font-variant-east-asian:normal">ずのうめいせき)

頭脳明晰(<rt style=" style="display:inline; white-space:normal; font-size:inherit; font-variant-east-asian:normal">ずのうめいせき)

Looks like a syntax error in the html tag that if fixed, should all be good?

In this template https://ja.wikipedia.org/wiki/Template:%E8%AA%AD%E3%81%BF%E4%BB%AE%E5%90%8D somewhere? Hmm, maybe somewhere else. I have a hard time finding any of the substrings there. The PHP-parsed version doesn't seem to have an issue with that, though.

The PHP-parsed version doesn't seem to have an issue with that, though.

Right, we aren't syntax-error-compatible in all scenarios, and we don't aspire to be. We only attempt make sure we handle syntax error recovery identically for the most common errors.

Here's a diff of the HTML in DevTools comparing the PHP-parsed version with the Parsoid parsed versions:

Screen Shot 2017-06-27 at 3.26.45 PM.png (454×1 px, 271 KB)

There is an extra " in the Parsoid version (on the right hand side).

There is an extra " in the Parsoid version (on the right hand side).

See this template for the syntax error. If that is fixed, all will be good. :-)

Arlolra added a subscriber: Arlolra.

Right, we aren't syntax-error-compatible in all scenarios, and we don't aspire to be. We only attempt make sure we handle syntax error recovery identically for the most common errors.

I think Parsoid should still interpret it as a tag though, even if the attributes parse differently (which ideally they wouldn't).

<rt style=" style="display:inline; white-space:normal; font-size:inherit; font-variant-east-asian:normal">{{{2}}}</rt>

Right, we aren't syntax-error-compatible in all scenarios, and we don't aspire to be. We only attempt make sure we handle syntax error recovery identically for the most common errors.

I think Parsoid should still interpret it as a tag though, even if the attributes parse differently (which ideally they wouldn't).

<rt style=" style="display:inline; white-space:normal; font-size:inherit; font-variant-east-asian:normal">{{{2}}}</rt>

Ideally yes. There are always going to be edge cases - I am just cautioning against assuming we'll be able to match error-recovery behavior perfectly. The long tail not always be worth the effort when it is quicker and simpler to fix the markup error.

Change 361916 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] More permissive attribute name parsing

https://gerrit.wikimedia.org/r/361916

Arlolra triaged this task as Medium priority.Jun 28 2017, 8:17 PM
Arlolra raised the priority of this task from Medium to Medium.

Change 361916 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] More permissive attribute name parsing

https://gerrit.wikimedia.org/r/361916

Just so there isn't any confusion, note that with this change, Parsoid now parses the tag correctly. But, that is still going to be different from how PHP parser parses it. Parsoid will drop the style attributes. So, to get identical rendering behavior, that template still needs fixing.

Change 361916 merged by jenkins-bot:
[mediawiki/services/parsoid@master] More permissive attribute name parsing

https://gerrit.wikimedia.org/r/361916