Broken wikitext with bold inside and outside a label wrongly interpreted as correct by Parsoid
Closed, Resolved (Public)

Description

Author: jmichrina

Description:
Example:
Line from "International_Air_Transport_Association_airport_code"

DFW for Dallas–Fort Worth, DTW for Detroit–Wayne County, RDU for Raleigh–Durham, MSP for Minneapolis–St. Paul and LBA for Leeds Bradford (Airport).

Line appears correctly formatted in editor [caps represent bold]:
dfw for Dallas-Fort Worth, dtw for DetroiT-Wayne county, rdu for Raleigh-DUrham, msp for Minneapolis-St. Paul and lba for Leeds Bradford (Airport).

Line appears incorrect on page [caps represent bold]:
dfw for DALLAS-FORT WORTH, dtw for dEtROIT-wAYNE COUNTY, rdu for RALEIGH-DURHAM, msp for mINNEAPOLIS-sT. pAUL and lba for LEEDS BRADFORD (AIRPORT).

Note that the correct version appears in the editor even after it's saved.

Possibly related to bug 53208, but I didn't have time to verify.


Version: unspecified
Severity: major

Details

Reference
bz54454

Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 2:10 AM
bzimport added a project: Parsoid.
bzimport set Reference to bz54454.

I have fixed the particular problem with this edit: https://en.wikipedia.org/w/index.php?title=International_Air_Transport_Association_airport_code&diff=574406042&oldid=574202912

I believe this is caused by a bug in Parsoid, which wrongly interprets invalid wikitext as if it were valid. Specifically, it renders:

'''[[Foo|F'''oo '''B'''ar]]

… as …

<b><a href=Foo>F</a></b><a href=Foo>oo </a><b><a href=Foo>B</a></b><a href=Foo>ar</a>

… whereas it should result in broken HTML, per the PHP parser (post-Tidy), as:

<b><a href=Foo>F<b>oo </b>B<b>ar</b></a>
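To make the difference between the two outputs above easier to see, here is a toy model in Python. It is only a sketch and not how either parser is implemented: `flat_model` and `scoped_model` are hypothetical names, and the assumption that the link label is scanned for ''' separately from the surrounding text is inferred from the PHP parser output quoted above. The flat model toggles bold at every ''' regardless of link boundaries, which gives the same bold runs as the Parsoid output; the scoped model gives the all-bold label seen on the rendered page.

```python
import re

# Matches a piped wikilink such as [[Foo|label]]; group 2 is the label.
LINK_RE = re.compile(r"\[\[([^|\]]+)\|([^\]]*)\]\]")

SNIPPET = "'''[[Foo|F'''oo '''B'''ar]]"


def toggle_bold(text, start_bold=False):
    """Split text on ''' and toggle bold at each marker.

    Returns (runs, final_state), where runs is a list of (chunk, is_bold).
    """
    runs = []
    bold = start_bold
    for i, chunk in enumerate(text.split("'''")):
        if i > 0:
            bold = not bold  # every ''' flips the bold state
        if chunk:
            runs.append((chunk, bold))
    return runs, bold


def flat_model(wikitext):
    """Toy model: ignore link boundaries and toggle across the whole line."""
    without_links = LINK_RE.sub(lambda m: m.group(2), wikitext)
    runs, _ = toggle_bold(without_links)
    return runs


def scoped_model(wikitext):
    """Toy model (assumption): the link label is toggled on its own, and an
    unclosed bold from the surrounding text still applies to the whole link."""
    runs = []
    outer_bold = False
    pos = 0
    for m in LINK_RE.finditer(wikitext):
        before, outer_bold = toggle_bold(wikitext[pos:m.start()], outer_bold)
        runs.extend(before)
        label_runs, _ = toggle_bold(m.group(2))
        runs.extend((chunk, outer_bold or inner) for chunk, inner in label_runs)
        pos = m.end()
    after, _ = toggle_bold(wikitext[pos:], outer_bold)
    runs.extend(after)
    return runs


for name, model in (("flat", flat_model), ("scoped", scoped_model)):
    print(name, model(SNIPPET))
```

In the flat model only "F" and "B" come out bold; in the scoped model every part of the label is bold, because the opening ''' before the link is never closed at the outer level.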

The broken wikitext was added in https://en.wikipedia.org/w/index.php?title=International_Air_Transport_Association_airport_code&diff=568156811&oldid=568156261 which isn't tagged as a VisualEditor edit, but possibly could have been one (a secondary bug might mean the tag is missing); I will investigate that separately.

Tidy should result in non-broken HTML (unless there is a bug in Tidy). I just checked: Tidy fixes the PHP parser's broken HTML and generates: <p><b><a href="/wiki/Foo" title="Foo" class="mw-redirect">F<b>oo</b> B<b>ar</b></a></b></p> (This can be verified at https://en.wikipedia.org/wiki/User:Ssastry/sandbox)

As for Parsoid, yes, this kind of broken wikitext was not being handled properly so far, but that is set to change with https://gerrit.wikimedia.org/r/#/c/83216/ which is awaiting review.

That patch generates the following HTML for the snippet, which is similar to what Tidy generates:

<p data-parsoid='{"dsr":[0,27,0,0]}'><b data-parsoid='{"autoInsertedEnd":1,"dsr":[0,27,3,0]}'><a rel="mw:WikiLink" href="./Foo" data-parsoid='{"stx":"piped","a":{"href":"./Foo"},"sa":{"href":"Foo"},"dsr":[3,27,6,2]}'>F<b data-parsoid='{"dsr":[10,19,3,3]}'>oo </b>B<b data-parsoid='{"autoInsertedEnd":1,"dsr":[20,25,3,0]}'>ar</b></a></b></p>
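For anyone checking the output above against the source, the following Python sketch pulls the data-parsoid attributes out of that HTML and slices the original snippet with each dsr value. It assumes the four dsr numbers mean [start offset, end offset, opening-markup width, closing-markup width]; that reading is consistent with every element in this example, but it is an interpretation and not something stated in this report. `DsrCollector` is a hypothetical helper name.

```python
import json
from html.parser import HTMLParser

WIKITEXT = "'''[[Foo|F'''oo '''B'''ar]]"

# The HTML quoted above, split across lines for readability only.
HTML = (
    """<p data-parsoid='{"dsr":[0,27,0,0]}'>"""
    """<b data-parsoid='{"autoInsertedEnd":1,"dsr":[0,27,3,0]}'>"""
    """<a rel="mw:WikiLink" href="./Foo" data-parsoid='{"stx":"piped","a":{"href":"./Foo"},"sa":{"href":"Foo"},"dsr":[3,27,6,2]}'>"""
    """F<b data-parsoid='{"dsr":[10,19,3,3]}'>oo </b>"""
    """B<b data-parsoid='{"autoInsertedEnd":1,"dsr":[20,25,3,0]}'>ar</b></a></b></p>"""
)


class DsrCollector(HTMLParser):
    """Collects (tag, dsr, auto_inserted_end) for each element with data-parsoid."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-parsoid":
                info = json.loads(value)
                auto_end = bool(info.get("autoInsertedEnd"))
                self.found.append((tag, info["dsr"], auto_end))


collector = DsrCollector()
collector.feed(HTML)

# Source text covered by each element, assuming dsr[0:2] are character offsets.
slices = [(tag, WIKITEXT[dsr[0]:dsr[1]], auto_end)
          for tag, dsr, auto_end in collector.found]

for tag, source, auto_end in slices:
    note = " (end tag auto-inserted)" if auto_end else ""
    print(f"<{tag}> covers {source!r}{note}")
```

Under that reading, the two b elements flagged with autoInsertedEnd are exactly the ones whose closing width is 0, that is, the bold runs that were opened in the wikitext but never closed.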

This has long been merged, so closing as fixed. Please reopen if there is still an issue.