Page MenuHomePhabricator

Round-trip test of itwiki/Campionato_mondiale_Supersport_2004 never finishes
Closed, ResolvedPublic

Description

node roundtrip-test.js --prefix it Campionato_mondiale_Supersport_2004

hasn't finished after 2 hours of CPU time. Its memoryn usage is stable between 650 and 730MB.


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=57069

Details

Reference
bz57071

Event Timeline

bzimport raised the priority of this task from to Needs Triage.
bzimport set Reference to bz57071.
marcoil created this task.Nov 14 2013, 7:30 PM

After 4 hours, it finished with

FATAL ERROR: JS Allocation failed - process out of memory

so it may be related to bug #57069.

I noticed that _rt in the API handles these pages just fine. The difference being that env.conf.parsoid.editMode = true there.

Setting that in roundtrip-test.js seems to solve the problem.

Not really sure what editMode is though.

Oh .. interesting .. and also odd. I rememeber clearly gabriel or I turning on edit mode for round trip testing as well. Did that regress? But, also indicates some bug when editMode is not set.

editMode indicates that we are pretending as if the document will be edited and the HTML should be serialized assuming it was edited (even though, in reality in all our commandline and roundtrip testing, it never is). editMode effectively prevents us from using certain kinds of source information about wikitext fixes that the parser makes (ex: if a closing tag was added automatically, that information is recorded in the autoInserted* flags in data-parsoid. If editMode=false, we can use that information to acccurately serialize the HTML. But if editMode=true, we cannot use that information unconditionally).

Or maybe not .. I think I was confused with turning it on for parse.js and parserTests. We know that setting editMode to true for roundtrip-test.js will generate a ton of noisy diffs which are irrelevant in production where selser suppresses all the noise.

The output of Util.parse() is ~3x (for this test case). The difference seems to be that, in dom.cleanup.js, "mw:Placeholder/StrippedTag" metas aren't being removed when editMode is false.

Is that desirable?

3x! wow. that is a lot. That page must have a lot of stray closing tags then .. or we have a bug in our markBuilderCorrectedTags code.

And yes, with editMode false, we want them around so we can rerender the stray closing tag in html2wt mode.

(In reply to comment #6)

3x! wow. that is a lot. That page must have a lot of stray closing tags then
..
or we have a bug in our markBuilderCorrectedTags code.

It's not the quantity that's the problem, it's that the data-parsoid for some of them are getting stuffed with the entire page source. That's bloating the dom and reeking havoc on the algo. in jsDiff.

Change 95747 had a related patch set uploaded by Arlolra:
tagId was incorrectly used as tsr info in TreeBuilderFixups

https://gerrit.wikimedia.org/r/95747

Change 95747 merged by jenkins-bot:
tagId was incorrectly used as tsr info in TreeBuilderFixups

https://gerrit.wikimedia.org/r/95747

Change 101300 had a related patch set uploaded by GWicke:
tagId was incorrectly used as tsr info in TreeBuilderFixups

https://gerrit.wikimedia.org/r/101300

Change 101300 merged by GWicke:
tagId was incorrectly used as tsr info in TreeBuilderFixups

https://gerrit.wikimedia.org/r/101300