
Reference span without reference list crashes parsoid during HTML to Wikitext conversion
Closed, Resolved · Public

Description

Problematic HTML Content: http://etherpad.wikimedia.org/p/parsoid-crash

This causes a 503 at http://parsoid-lb.eqiad.wikimedia.org during HTML to wikitext conversion.
The same conversion succeeds at http://parsoid.wmflabs.org/ and returns wikitext; as I understand it, that is an unmaintained Parsoid instance.

Content Translation tries to include the reference list (or references) in the translation whenever a reference is added to the translation, but this is not yet guaranteed to happen. For example, http://en.wikipedia.beta.wmflabs.org/wiki/Special:ContentTranslation?page=Dong+Qichang&from=fr&to=es&targettitle=Dong+Qichang&debug=1 is one instance where our reference inclusion fails. We will investigate this separately.

Still, would it make sense to avoid 503s in cases like this?

Event Timeline

santhosh raised the priority of this task from to High.
santhosh updated the task description.
santhosh added a subscriber: santhosh.
Restricted Application added a subscriber: Aklapper. Apr 16 2015, 10:06 AM
santhosh renamed this task from HTML snippet crashes parsoid during HTML to Wikitext conversion to Reference span without reference list crashes parsoid during HTML to Wikitext conversion. Apr 16 2015, 10:38 AM
santhosh updated the task description.
santhosh set Security to None.

Change 204487 had a related patch set uploaded (by Santhosh):
Make sure references templates not getting removed from source

https://gerrit.wikimedia.org/r/204487

I think this came up earlier in another context.

However, if you use data-mw.body.id in <ref>s when sending HTML to Parsoid, you also have to include the <references/> section output, since the HTML that actually gets serialized lives in that section and the <ref> cannot be serialized in isolation.

If you want to serialize a <ref> in isolation, you have to provide data-mw.body.html.

That said, it looks like our debugging API hasn't been updated to return proper error messages. We definitely shouldn't return a 503 here; we should return a useful error message instead.
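
For clients that want to catch this before sending HTML to Parsoid, here is a minimal sketch of a pre-flight check. It is not Parsoid or Content Translation code. It assumes that the `data-mw` attribute of a <ref> is a JSON object whose `body` holds either an `id` (pointing at an element in the <references/> output) or inline `html`, as described above; the function name `find_unresolvable_refs` and the sample ids in the usage note are made up for illustration.

```python
import json
from html.parser import HTMLParser


class _RefScanner(HTMLParser):
    """Collects every element id and every data-mw.body.id seen in the markup."""

    def __init__(self):
        super().__init__()
        self.element_ids = set()
        self.referenced_body_ids = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id"):
            self.element_ids.add(attr_map["id"])

        raw = attr_map.get("data-mw")
        if not raw:
            return
        try:
            data_mw = json.loads(raw)
        except ValueError:
            # Not valid JSON; nothing to check for this element.
            return

        body = data_mw.get("body") if isinstance(data_mw, dict) else None
        # Assumption: a body with inline "html" can be serialized on its own,
        # while a body with only "id" needs the element carrying that id.
        if isinstance(body, dict) and "id" in body and "html" not in body:
            self.referenced_body_ids.append(body["id"])


def find_unresolvable_refs(html):
    """Return the data-mw.body.id values that match no element id in `html`."""
    scanner = _RefScanner()
    scanner.feed(html)
    scanner.close()
    return [
        body_id
        for body_id in scanner.referenced_body_ids
        if body_id not in scanner.element_ids
    ]
```

A caller could run this on the HTML it is about to submit and, if the returned list is not empty, either append the reference list or switch those <ref>s to data-mw.body.html. For example, a lone `<span data-mw='{"name":"ref","body":{"id":"ref-text-1"}}'>` with no element carrying `id="ref-text-1"` would be reported.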

ssastry moved this task from Backlog to Non-Parsoid Tasks on the Parsoid board. Apr 21 2015, 3:49 AM

Change 204487 merged by jenkins-bot:
Make sure <references> not getting removed from source

https://gerrit.wikimedia.org/r/204487

What's the status of this? Does this still happen?

Amire80 moved this task from Needs Triage to CX6 on the ContentTranslation board. Jun 23 2015, 8:35 AM
santhosh closed this task as Resolved. Jun 25 2015, 3:04 AM
santhosh claimed this task.