
"Error while publishing - parsoidserver" when diacritic in target and references are added
Closed, Resolved (Public)

Description

Example URL: http://en.wikipedia.beta.wmflabs.org/wiki/Special:ContentTranslation?page=Santos+Col%C3%B3n&from=es&to=pt&debug=true

Note that this mostly happens when References are added; otherwise the page is published. This seems to be related to the Parsoid service.


Version: master
Severity: major

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 3:56 AM
bzimport set Reference to bz73119.
bzimport added a subscriber: Unknown Object (MLST).

Could you dump the HTML you are sending to Parsoid?

Niklas: on T75121, I added a comment indicating that adding the about and data-mw attributes should make this work. The DOM spec has also been updated to make this clear.

Can someone confirm whether this is still a problem once you fix the HTML that you send to Parsoid?

Arrbee added a project: LE-Sprint-79.

Subbu, Santhosh is taking a look at this later today. He can confirm in some time. Thanks.

Change 177189 had a related patch set uploaded (by Santhosh):
Keep data-mw attributes for references to avoid parsoid error

https://gerrit.wikimedia.org/r/177189

Patch-For-Review

@ssastry, I added data-mw to the references and it fixed the error. I am able to publish Santos_Colón from es after translation.

The data-mw and data-parsoid attributes were removed before sending the HTML to the machine translation engines, to give them minimal HTML to work with. After machine translation, we now restore data-mw for references.
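
A minimal sketch of the strip-and-restore flow described above, for illustration only. It is not the code from change 177189. It assumes that reference nodes are marked with typeof="mw:Extension/ref", that each one carries an id that survives machine translation, and that the segment is a well-formed fragment. The function names strip_for_mt and restore_reference_data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes removed so the MT engine sees minimal HTML.
STRIPPED_ATTRS = ("data-mw", "data-parsoid")
# Assumed marker for reference nodes in the Parsoid output.
REF_TYPEOF = "mw:Extension/ref"


def strip_for_mt(fragment):
    """Drop Parsoid attributes and remember data-mw of references by id."""
    root = ET.fromstring(fragment)
    saved = {}
    for el in root.iter():
        is_ref = el.get("typeof") == REF_TYPEOF
        if is_ref and el.get("id") and el.get("data-mw"):
            saved[el.get("id")] = el.get("data-mw")
        for name in STRIPPED_ATTRS:
            el.attrib.pop(name, None)
    return ET.tostring(root, encoding="unicode"), saved


def restore_reference_data(fragment, saved):
    """Put the saved data-mw back on references, matched by id."""
    root = ET.fromstring(fragment)
    for el in root.iter():
        ref_id = el.get("id")
        if ref_id in saved:
            el.set("data-mw", saved[ref_id])
    return ET.tostring(root, encoding="unicode")


# Hypothetical source segment with one reference.
source = (
    '<p id="s1" data-parsoid="{}">Santos Colón fue un cantante.'
    '<span id="cite_ref-1" about="#mwt2" typeof="mw:Extension/ref" '
    "data-mw='{\"name\":\"ref\"}'>[1]</span></p>"
)

stripped, saved = strip_for_mt(source)
# Stand-in for the machine translation step.
translated = stripped.replace("fue un cantante", "foi um cantor")
restored = restore_reference_data(translated, saved)
print(restored)
```

Only references get data-mw back here; other nodes stay stripped, which matches the comment above but may differ from what the merged patch actually does.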

Change 177189 merged by jenkins-bot:
Keep data-mw attributes for references to avoid parsoid error

https://gerrit.wikimedia.org/r/177189

santhosh moved this task from In Review to Done on the LE-Sprint-79 board.