
Reference span without reference list crashes parsoid during HTML to Wikitext conversion
Closed, Resolved, Public

Description

Problematic HTML Content: http://etherpad.wikimedia.org/p/parsoid-crash

Causes a 503 at http://parsoid-lb.eqiad.wikimedia.org when converting HTML to wikitext.
It succeeds at http://parsoid.wmflabs.org/ and returns wikitext, though I understand that is an unmaintained Parsoid instance.

Content Translation tries to include the reference list (or references) in the translation whenever a reference is added to it. Right now, however, this is not guaranteed. For example, http://en.wikipedia.beta.wmflabs.org/wiki/Special:ContentTranslation?page=Dong+Qichang&from=fr&to=es&targettitle=Dong+Qichang&debug=1 is one case where including the references fails. We will investigate this separately.

But does it make sense for Parsoid to avoid returning a 503 in cases like this?

Event Timeline

santhosh raised the priority of this task from to High.
santhosh updated the task description.
santhosh subscribed.
santhosh renamed this task from HTML snippet crashes parsoid during HTML to Wikitext conversion to Reference span without reference list crashes parsoid during HTML to Wikitext conversion. (Apr 16 2015, 10:38 AM)
santhosh updated the task description.
santhosh set Security to None.

Change 204487 had a related patch set uploaded (by Santhosh):
Make sure references templates not getting removed from source

https://gerrit.wikimedia.org/r/204487

I think this came up earlier in another context.

But if you are going to use data-mw.body.id in <ref>s when sending them to Parsoid, then you have to include the <references/> section output as well, since the actual HTML for serialization is present in that section and the <ref> cannot be serialized in isolation.
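
A client such as Content Translation could check for this situation before posting HTML to Parsoid. The sketch below is a hypothetical pre-flight check, not Parsoid or Content Translation code. It assumes Parsoid-style markup where a <ref> is an element whose typeof includes "mw:Extension/ref" and whose data-mw JSON may carry {"body": {"id": ...}} naming an element id inside the <references/> output; it reports every such id that is missing from the document.

```python
import json
from html.parser import HTMLParser


class RefBodyIdChecker(HTMLParser):
    """Collects ref body ids referenced via data-mw and all element ids present.

    Assumption: refs are marked with typeof="mw:Extension/ref" and their
    data-mw JSON may contain {"body": {"id": "<id of the note content>"}}.
    """

    def __init__(self):
        super().__init__()
        self.referenced_ids = []   # body.id values found on refs, in order
        self.present_ids = set()   # every id attribute seen in the document

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("id"):
            self.present_ids.add(attr_map["id"])
        typeof = (attr_map.get("typeof") or "").split()
        if "mw:Extension/ref" in typeof and attr_map.get("data-mw"):
            data_mw = json.loads(attr_map["data-mw"])
            body = data_mw.get("body", {})
            if "id" in body:
                self.referenced_ids.append(body["id"])


def dangling_ref_body_ids(html):
    """Return body.id values that point at ids not present in the HTML."""
    checker = RefBodyIdChecker()
    checker.feed(html)
    checker.close()
    return [i for i in checker.referenced_ids if i not in checker.present_ids]
```

A non-empty result means the <references/> output was dropped (as in this task) and the request would need either that section or inline body.html.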

If you want to serialize a <ref> in isolation, you have to provide data-mw.body.html.
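
As a rough illustration of that alternative, a client could swap body.id for body.html using note content it extracted earlier. This is a minimal sketch that operates on already-parsed data-mw dictionaries; the reference_html_by_id mapping and the function name are hypothetical, and the exact data-mw layout is an assumption based on the field names mentioned above.

```python
import copy


def inline_ref_body(data_mw, reference_html_by_id):
    """Return a copy of a ref's data-mw with body.id replaced by body.html.

    reference_html_by_id is a hypothetical mapping from the id that body.id
    points at to the inner HTML of that element, e.g. extracted from the
    <references/> output before it was dropped.
    """
    result = copy.deepcopy(data_mw)  # never mutate the caller's dict
    body = result.get("body", {})
    if "id" in body:
        ref_id = body.pop("id")
        if ref_id not in reference_html_by_id:
            # Without the content, the ref still cannot be serialized alone.
            raise KeyError(f"no reference content found for {ref_id!r}")
        body["html"] = reference_html_by_id[ref_id]
    return result
```

Refs that already carry body.html pass through unchanged, so the function is safe to apply to every ref in a document.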

But it looks like our debugging API hasn't been updated to return proper error messages. We definitely shouldn't return a 503; we should return a useful error message instead.
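
One way to get that behavior is to treat this failure as bad input rather than service unavailability. The sketch below is hypothetical and not Parsoid's actual HTTP handler: MissingReferenceContent, Response and html_to_wikitext_endpoint are illustrative names, and choosing 400 as the status is an assumption. It maps the known "ref cannot be serialized in isolation" error to a client error carrying a readable message, while letting unexpected errors propagate.

```python
from dataclasses import dataclass


class MissingReferenceContent(Exception):
    """Hypothetical error: a <ref> uses data-mw.body.id but its content is absent."""


@dataclass
class Response:
    status: int
    body: str


def html_to_wikitext_endpoint(html, serialize):
    """Wrap a serializer so known input problems become 400s instead of 503s."""
    try:
        return Response(200, serialize(html))
    except MissingReferenceContent as err:
        # The request itself is incomplete, so report it as a client error.
        return Response(400, f"Cannot serialize <ref> in isolation: {err}")
```

Other exceptions are deliberately not caught here, so genuine server faults still surface as server errors.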

Change 204487 merged by jenkins-bot:
Make sure <references> not getting removed from source

https://gerrit.wikimedia.org/r/204487

What's the status of this? Does this still happen?

santhosh claimed this task.