Reply posted on wrong page on pl.wp, content from transcluded subpages duplicated
Closed, Resolved (Public)

Event Timeline

Parsoid is incorrectly marking up the transclusions: https://pl.wikipedia.org/api/rest_v1/page/html/Wikipedia%3APoczekalnia%2FZgłoszenia?redirect=false&stash=true

There is an about="#mwt2" about-group, which covers most of the page, but there is no initial element with about="#mwt2" typeof="mw:Transclusion" that would describe it.
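A rough way to spot this kind of about-group in a saved copy of the HTML is sketched below. It is only an illustration built on Python's standard html.parser, not anything Parsoid or DiscussionTools ships; the names AboutGroupScanner and orphaned_about_ids are made up for the example. It also flags every about-group whose first element is not a transclusion, so legitimately non-transclusion groups (extension output, for instance) would show up as well.

```python
from html.parser import HTMLParser


class AboutGroupScanner(HTMLParser):
    """Remember the typeof of the first element seen for each about id."""

    def __init__(self):
        super().__init__()
        self.first_typeof = {}  # about id -> typeof of its first element ("" if none)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        about = attr.get("about")
        if about is not None and about not in self.first_typeof:
            self.first_typeof[about] = attr.get("typeof") or ""


def orphaned_about_ids(html):
    """Return about ids whose first element is not typeof="mw:Transclusion"."""
    scanner = AboutGroupScanner()
    scanner.feed(html)
    scanner.close()
    return [
        about
        for about, typeof in scanner.first_typeof.items()
        if "mw:Transclusion" not in typeof.split()
    ]
```

Feeding it the REST API output linked above should list "#mwt2" if the description here is right; I have not included an expected result because the live page has changed since.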

After section wrapping, the transclusion info is hoisted to a section wrapper, and the section wrapping code seems to have improperly reassigned about ids on some DOM children. That effectively clobbers the transclusion info once section wrappers are removed (which VE does right now, and presumably DiscussionTools does as well).

Parsoid tries to preserve the correctness of transclusion info even if section wrappers are stripped, but clearly there is a bug here.

Try php bin/parse.php --domain pl.wikipedia.org --pageName 'Wikipedia:Poczekalnia/Zgłoszenia' --wrapSections < /dev/null and again without the --wrapSections option if you want to check this locally.
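If you want to script the two runs, something like the following might do. It only uses the flags quoted in the command above; the helper names, the parsoid_dir parameter, and the assumption that parse.php prints the generated HTML to stdout are mine.

```python
import subprocess


def parse_command(page_name, wrap_sections, domain="pl.wikipedia.org"):
    """Build the argv for bin/parse.php using only the flags quoted above."""
    cmd = ["php", "bin/parse.php", "--domain", domain, "--pageName", page_name]
    if wrap_sections:
        cmd.append("--wrapSections")
    return cmd


def run_parse(page_name, wrap_sections, parsoid_dir="."):
    """Run parse.php with stdin at /dev/null and return what it prints.

    Assumption: parsoid_dir is a Parsoid checkout and the HTML goes to stdout.
    """
    result = subprocess.run(
        parse_command(page_name, wrap_sections),
        cwd=parsoid_dir,
        stdin=subprocess.DEVNULL,  # same effect as "< /dev/null"
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling run_parse once with wrap_sections=True and once with False gives two HTML strings whose about attributes can then be compared.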

I made edits on the subpages to fix the messed-up heading structure on that page, but they did not affect this problem (edits 1, 2, 3). It might make this easier to debug, though.

The affected transclusion is from https://pl.wikipedia.org/wiki/Wikipedia:Poczekalnia/biografie. I looked there, and a single section on that page is also not marked correctly as a transclusion:

(Three screenshots from VE were attached here.)

Copy of the HTML for reference:
https://pl.wikipedia.org/api/rest_v1/page/html/Wikipedia%3APoczekalnia%2Fbiografie?redirect=false&stash=true

And some HTML snippets from that. Several things look wrong here. I don't know enough about Parsoid to guess the issue, but it might offer some hints for you:

  • The first transclusion about="#mwt4" contains both the page Wikipedia:Poczekalnia/biografie/2020:10:15:Lucyna_Kulińska, and "{{Wikipedia:Poczekalnia/biografie/2020:10:13:Karol Wilhelm Seeliger}}" as wikitext??? Presumably the first one must have some unclosed tags or something, but why is the second one transcluded as plain wikitext rather than a page?
  • Then there's a second transclusion with about="#mwt4" containing Wikipedia:Poczekalnia/biografie/2020:10:13:Karol_Wilhelm_Seeliger properly
  • Other about-grouped nodes that should reference it instead refer to about="#mwt5", which does not exist
  • The remaining sections on the page do not have these problems, e.g. about="#mwt6" looks perfectly normal
<span about="#mwt4" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"Wikipedia:Poczekalnia/biografie/2020:10:15:Lucyna Kulińska","href":"./Wikipedia:Poczekalnia/biografie/2020:10:15:Lucyna_Kulińska"},"params":{},"i":0}},"\n{{Wikipedia:Poczekalnia/biografie/2020:10:13:Karol Wilhelm Seeliger}}"]}' id="mwDQ">
</span>

<section data-mw-section-id="-1" about="#mwt24" typeof="mw:Transclusion" data-mw='{"parts":["",{"template":{"target":{"wt":"Wikipedia:Poczekalnia/biografie/2020:10:13:Karol Wilhelm Seeliger","href":"./Wikipedia:Poczekalnia/biografie/2020:10:13:Karol_Wilhelm_Seeliger"},"params":{},"i":0}},"\n{{Wikipedia:Poczekalnia/biografie/2020:10:13:Delfina Ortega Díaz}}"]}' id="mwDg">
	
	<h3 about="#mwt4" id="Lucyna_Kulińska">...</h3>
	
	...
	
	<span about="#mwt4" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"Wikipedia:Poczekalnia/biografie/2020:10:13:Karol Wilhelm Seeliger","href":"./Wikipedia:Poczekalnia/biografie/2020:10:13:Karol_Wilhelm_Seeliger"},"params":{},"i":0}}]}' id="mwRQ">
	</span>

</section>

<section data-mw-section-id="-1" about="#mwt25" typeof="mw:Transclusion" data-mw='{"parts":["",{"template":{"target":{"wt":"Wikipedia:Poczekalnia/biografie/2020:10:13:Delfina Ortega Díaz","href":"./Wikipedia:Poczekalnia/biografie/2020:10:13:Delfina_Ortega_Díaz"},"params":{},"i":0}},"\n{{Wikipedia:Poczekalnia/biografie/2020:10:13:Mikołaj Czarnecki (kapłan)}}"]}' id="mwRg">
	
	<h3 about="#mwt5" id="Karol_Wilhelm_Seeliger">...</h3>
	
	...
	
	<span about="#mwt6" typeof="mw:Transclusion" data-mw='{"parts":[{"template":{"target":{"wt":"Wikipedia:Poczekalnia/biografie/2020:10:13:Delfina Ortega Díaz","href":"./Wikipedia:Poczekalnia/biografie/2020:10:13:Delfina_Ortega_Díaz"},"params":{},"i":0}}]}' id="mwbg">
	</span>

</section>

<section data-mw-section-id="-1" about="#mwt26" typeof="mw:Transclusion" data-mw='{"parts":["",{"template":{"target":{"wt":"Wikipedia:Poczekalnia/biografie/2020:10:13:Mikołaj Czarnecki (kapłan)","href":"./Wikipedia:Poczekalnia/biografie/2020:10:13:Mikołaj_Czarnecki_(kapłan)"},"params":{},"i":0}},"\n{{Wikipedia:Poczekalnia/biografie/2020:10:12:Margot}}"]}' id="mwbw">
	
	<h3 about="#mwt6" id="Delfina_Ortega_Díaz">...</h3>
	
	...
	
</section>
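The first oddity in the list above (a template call left as plain wikitext inside another transclusion's data-mw) can be searched for mechanically. The sketch below is just one way to do it with Python's standard library; DataMwCollector and raw_template_parts are names invented for the example, and it assumes data-mw has the {"parts": [...]} shape seen in the snippets, with string parts holding raw wikitext.

```python
import json
from html.parser import HTMLParser


class DataMwCollector(HTMLParser):
    """Collect (about id, parsed data-mw) for every mw:Transclusion element."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_transclusion = "mw:Transclusion" in (attr.get("typeof") or "").split()
        if is_transclusion and attr.get("data-mw"):
            self.found.append((attr.get("about"), json.loads(attr["data-mw"])))


def raw_template_parts(html):
    """Return (about id, wikitext) for string parts that contain '{{'."""
    collector = DataMwCollector()
    collector.feed(html)
    collector.close()
    hits = []
    for about, data_mw in collector.found:
        for part in data_mw.get("parts", []):
            # String parts are wikitext that was not expanded as a template.
            if isinstance(part, str) and "{{" in part:
                hits.append((about, part.strip()))
    return hits
```

On the snippets above this would report the trailing "{{Wikipedia:Poczekalnia/biografie/...}}" strings for the first span and for each of the section wrappers.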

Try php bin/parse.php --domain pl.wikipedia.org --pageName 'Wikipedia:Poczekalnia/Zgłoszenia' --wrapSections < /dev/null and again without the --wrapSections option if you want to check this locally.

Thanks, this is neat. I tried it with the 'Wikipedia:Poczekalnia/biografie' page instead, since it has the same problem but should be simpler.

I can reproduce the incorrect behavior locally with wrapSections, and it doesn't occur without it.

Thanks for the simpler page. Will look later today.

This is a nasty edge case (bug) in the section-template conflict resolution code, caused by newlines at the top of templates (e.g. https://pl.wikipedia.org/wiki/Wikipedia:Poczekalnia/biografie?action=edit&veswitched=1 will have 2 newlines after the noincludes are processed). As a result, the template output starts with a <span> wrapping newlines, followed by a heading / section. That produces template-section conflicts, and since there are several of them at the top level, the effects cascade.

Effectively, the conflict resolution code clobbers previous about-id annotations instead of expanding the range of template-affected content to include the original annotation. So, once the section wrapping code is fixed, most of the page will effectively end up wrapped in a single template wrapper. For correctness reasons, we should of course still fix that bug in section wrapping.
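To make the cascade concrete, here is a toy model; it is not Parsoid's code. Top-level nodes are numbered, templates and sections are inclusive (start, end) index ranges, and a conflict is an overlap without nesting. The function names and the example indices are invented for illustration.

```python
def overlaps(a, b):
    """True if inclusive ranges a and b share at least one index."""
    return a[0] <= b[1] and b[0] <= a[1]


def nested(a, b):
    """True if one inclusive range lies entirely inside the other."""
    return (b[0] <= a[0] and a[1] <= b[1]) or (a[0] <= b[0] and b[1] <= a[1])


def expand_template_range(template, sections):
    """Grow a template range until it no longer partially overlaps any section."""
    start, end = template
    changed = True
    while changed:
        changed = False
        for section in sections:
            current = (start, end)
            if overlaps(current, section) and not nested(current, section):
                start, end = min(start, section[0]), max(end, section[1])
                changed = True
    return (start, end)


def expand_and_merge(templates, sections):
    """Expand every template range, then merge ranges that now overlap."""
    expanded = sorted(expand_template_range(t, sections) for t in templates)
    merged = []
    for rng in expanded:
        if merged and rng[0] <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], rng[1]))
        else:
            merged.append(rng)
    return merged
```

With two templates at (0, 4) and (5, 9), each starting with a newline span, and sections at (1, 5) and (6, 9) starting at the headings, expand_and_merge returns a single range (0, 9): the expanded ranges run into each other and collapse into one wrapper, which is the "most of the page in a single template wrapper" outcome described above.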

But maybe we need smarter section wrapping code that relaxes some of the constraints around what a section is, i.e. instead of requiring sections to always start with headings, we could include any preceding span wrappers that wrap newlines.

If we want to be even more radical, we could align section boundaries at template boundaries, but that will break compatibility with core parser output.

So maybe for now, we can relax the section definition to include any leading newline-wrapping spans. This will effectively eliminate the section-template conflict, because the boundaries then nest properly instead of overlapping.
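As a sketch of what the relaxed definition means, assuming a flat list of (tag, text) pairs as a stand-in for the top-level DOM nodes (the real code works on the DOM, and section_starts is a hypothetical name):

```python
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def is_newline_span(node):
    """True for a span whose text is non-empty and consists only of newlines."""
    tag, text = node
    return tag == "span" and text != "" and text.strip("\n") == ""


def section_starts(nodes, relaxed=True):
    """Return the index at which each section starts.

    Strict mode: a section starts at its heading.
    Relaxed mode: it also absorbs newline-only spans directly before the heading.
    """
    starts = []
    for i, (tag, _text) in enumerate(nodes):
        if tag not in HEADINGS:
            continue
        start = i
        if relaxed:
            while start > 0 and is_newline_span(nodes[start - 1]):
                start -= 1
        starts.append(start)
    return starts
```

For nodes like [span "\n\n", h3, p, span "\n", h3, p], strict mode starts the sections at 1 and 4, one node after each template's leading span, so the ranges overlap. Relaxed mode starts them at 0 and 3, which lines up with where the templates begin.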

Change 634672 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: Section Wrapping: Relax section start constraints

https://gerrit.wikimedia.org/r/634672

ssastry triaged this task as Medium priority. Oct 19 2020, 7:29 PM
ssastry moved this task from Needs Triage to Current & Upcoming Work on the Parsoid board.

Potentially similar problem on pt.wp: https://pt.wikipedia.org/?diff=59606108

Also, some time ago (I noticed this when looking for a diff for another bug): https://pt.wikipedia.org/?diff=59557215

Change 637887 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: Redo template-section boundary conflict code

https://gerrit.wikimedia.org/r/637887

Change 637887 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Redo template-section boundary conflict code

https://gerrit.wikimedia.org/r/637887

ssastry claimed this task.

Will go out next time we roll out new Parsoid code.

Change 634672 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Section Wrapping: Relax section start constraints

https://gerrit.wikimedia.org/r/634672

Change 646892 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a19

https://gerrit.wikimedia.org/r/646892

Change 646892 merged by jenkins-bot:
[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.13.0-a19

https://gerrit.wikimedia.org/r/646892

Change 646771 had a related patch set uploaded (by C. Scott Ananian; owner: Subramanya Sastry):
[mediawiki/vendor@wmf/1.36.0-wmf.21] Bump wikimedia/parsoid to 0.13.0-a19

https://gerrit.wikimedia.org/r/646771

Change 646771 merged by jenkins-bot:
[mediawiki/vendor@wmf/1.36.0-wmf.21] Bump wikimedia/parsoid to 0.13.0-a19

https://gerrit.wikimedia.org/r/646771