Problems with adding <div class="mw-translate-fuzzy"> around headings
Open, Medium, Public

Assigned To
None
Authored By
matmarex
Dec 21 2023, 9:41 AM
Referenced Files
F41615774: image.png
Dec 21 2023, 9:41 AM
F41615761: image.png
Dec 21 2023, 9:41 AM
F41615759: image.png
Dec 21 2023, 9:41 AM
F41615757: image.png
Dec 21 2023, 9:41 AM

Description

Adding any wrapper elements around headings in MediaWiki interferes with its handling of sections, and the <div class="mw-translate-fuzzy"> wrappers for outdated content – and <div lang="en" dir="ltr" class="mw-content-ltr"> for untranslated content – are no exception. The issues aren't too serious, but seemed worth writing up.

(Also, wrapper elements around headings complicate the code for adding <section> wrappers around sections in Parsoid, sometimes leading to weird effects like the table of contents displaying in the wrong place, so I wouldn't want to recommend that.)

You may want to chat with Language-Team folks about this, because this is exactly what Translate does when the translation of a heading is outdated (e.g. https://test.wikipedia.org/wiki/Wikipedia:Requests/Tools/hi). It doesn’t look broken with Parsoid, though.

I set up a test page with a fuzzy heading: https://test.wikipedia.org/wiki/Translate_headings/pl

image.png (2×3 px, 389 KB)

It looks okay with the default skin and settings, but there are bugs in other cases:

1. With the old Vector skin (or any other skin that places the TOC inside the body), the TOC ends up directly before the first heading, and therefore inside the fuzzy marker: https://test.wikipedia.org/w/index.php?title=Translate_headings/pl&useparsoid=0&useskin=vector

image.png (2×3 px, 422 KB)
image.png (2×3 px, 737 KB)

For some reason this doesn't happen with Parsoid, but I think that's just a lucky coincidence.

2. With Parsoid, the section wrappers are all messed up, marking up multiple uneditable sections and pseudo-sections (https://www.mediawiki.org/wiki/Parsing/Notes/Section_Wrapping#Pseudo-sections) indicated by data-mw-section-id="-1" and data-mw-section-id="-2". https://test.wikipedia.org/w/index.php?title=Translate_headings/pl&useparsoid=1

This is not a problem right now, since section editing for translation pages is not available anyway, but it could cause problems in the future with any tool trying to make sense of the section wrapping.

<section data-mw-section-id="-1" id="mwAQ"><p id="mwAg"><link typeof="mw:Extension/languages" about="#mwt2" data-mw="{&quot;name&quot;:&quot;languages&quot;,&quot;attrs&quot;:{}}" id="mwAw"></p>
<div lang="en" dir="ltr" class="mw-content-ltr" id="mwBA">
<p id="mwBQ">Lede</p>
</div>

<p id="mwBg"><span id="A_2"></span></p>
<meta property="mw:PageProp/toc" data-mw="{&quot;autoGenerated&quot;:true}" id="mwBw"></section><section data-mw-section-id="-2" id="mwCA"><div class="mw-translate-fuzzy" id="mwCQ">
<section data-mw-section-id="-1" id="mwCg"><h2 id="A_pl">A pl</h2>
</section></div>

<p id="mwCw"><span id="B"></span></p>
</section><section data-mw-section-id="2" id="mwDA"><h2 id="B_pl">B pl</h2>

<p id="mwDQ"><span id="C"></span></p>
</section><section data-mw-section-id="-1" id="mwDg"><h2 id="C_pl">C pl</h2>

</section><section data-mw-section-id="-1" id="mwDw"><div lang="en" dir="ltr" class="mw-content-ltr" id="mwEA">
<section data-mw-section-id="-1" id="mwEQ"><h2 id="D">D</h2>
</section></div>

</section><section data-mw-section-id="-2" id="mwEg"><div lang="en" dir="ltr" class="mw-content-ltr" id="mwEw">
<section data-mw-section-id="-1" id="mwFA"><h2 id="E">E</h2>
</section></div></section>

3. On the mobile site, the sections are not collapsible. The untranslated sections have the same problem. https://test.m.wikipedia.org/wiki/Translate_headings/pl

image.png (2×3 px, 232 KB)

(4. I also remember that it used to cause issues with TOC highlighting on Vector 2022, but I can't reproduce that now – either it's fixed or a more complex page is needed.)
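
The nesting described in point 2 can be checked mechanically. Below is a minimal Python sketch, not part of Parsoid or Translate, that lists every `<section>` in a chunk of HTML with its `data-mw-section-id` and nesting depth. The names `SectionAudit`, `audit_sections` and `suspicious_sections` are made up for illustration, and treating "nested or negative id" as suspicious is an assumption based on the excerpt above.

```python
from html.parser import HTMLParser


class SectionAudit(HTMLParser):
    """Record each <section> with its data-mw-section-id and nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of currently open <section> elements
        self.sections = []  # (section_id, depth) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            # Assumption: a missing or empty attribute is treated as id 0.
            raw_id = dict(attrs).get("data-mw-section-id") or "0"
            self.sections.append((int(raw_id), self.depth))
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth -= 1


def audit_sections(html):
    """Return (section_id, depth) for every <section> in the HTML string."""
    parser = SectionAudit()
    parser.feed(html)
    parser.close()
    return parser.sections


def suspicious_sections(html):
    """Return sections that are nested or carry a negative section id."""
    return [(sid, depth) for sid, depth in audit_sections(html)
            if depth > 0 or sid < 0]


# Reduced version of the "A pl" / "B pl" part of the excerpt above.
sample = (
    '<section data-mw-section-id="-2"><div class="mw-translate-fuzzy">'
    '<section data-mw-section-id="-1"><h2 id="A_pl">A pl</h2></section>'
    '</div></section>'
    '<section data-mw-section-id="2"><h2 id="B_pl">B pl</h2></section>'
)
print(suspicious_sections(sample))
```

On the reduced sample, the wrapped heading shows up as a `-2` section containing a nested `-1` section, while the unwrapped "B pl" heading gets an ordinary section id.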


Most of these problems would probably be fixed by placing the markup inside the <h2> etc. instead.

Event Timeline

Traditionally Translate, similarly to the good old legacy parser, works with chunks of text. It knows nothing about the DOM, and very little about wikitext, giving translation administrators plenty of room to split pages into translation units however they like (and to shoot themselves in the foot by, for example, putting the first half of a huge wikitable in one translation unit and the other half in the next one – no <div> wrapping will be able to handle this). Luckily, one of the few things it knows about wikitext (since T62544) is what headings look like (for which it partly reimplements the core parser). However, if your advice is followed and the markup is inside the <h2>, there are two possibilities:

  • Use a <span>. This looks bad: now only the text itself is highlighted, not the whole heading.
  • Use a <div>. This probably looks the same as before (tested on Vector 2010, Vector 2022 and Timeless), but means that the <div> is a child element of a <span> generated by the core parser, i.e. a block element is within an inline element.

Maybe the class should be put on the heading element rather than inside it, resulting in <h2 class="mw-translate-fuzzy">. That is probably difficult and error-prone to do entirely in the extension; calling out to Parsoid would be safer.
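
To make the placements discussed so far easier to compare, here is a small Python sketch that emits the HTML for each one. It is purely illustrative: `fuzzy_heading` and its `placement` values are invented names, not Translate or Parsoid APIs, and the output is simplified (the real core parser wraps the heading text in its own `<span>`, which is omitted here).

```python
from html import escape

FUZZY_CLASS = "mw-translate-fuzzy"


def fuzzy_heading(text, level=2, placement="on-heading"):
    """Return simplified heading HTML with the fuzzy marker at one position."""
    body = escape(text)
    tag = f"h{level}"
    if placement == "around":
        # Wrapper around the heading, as described in the task.
        return f'<div class="{FUZZY_CLASS}"><{tag}>{body}</{tag}></div>'
    if placement == "inside-span":
        # Only the heading text ends up highlighted.
        return f'<{tag}><span class="{FUZZY_CLASS}">{body}</span></{tag}>'
    if placement == "inside-div":
        # Block element inside the heading.
        return f'<{tag}><div class="{FUZZY_CLASS}">{body}</div></{tag}>'
    if placement == "on-heading":
        # Class directly on the heading element.
        return f'<{tag} class="{FUZZY_CLASS}">{body}</{tag}>'
    raise ValueError(f"unknown placement: {placement}")


for placement in ("around", "inside-span", "inside-div", "on-heading"):
    print(placement, fuzzy_heading("A pl", placement=placement))
```

Only the "around" variant puts an element between the heading and its parent, which is the case that interferes with section handling.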

Also, all this assumes that the wikitext looks the way it should. Of course, it won’t. There are many ways for it to deviate (e.g. by involving templates), but a pretty common (although not recommended) pattern is the following:

<translate>
<!--T:1-->
== Lorem ipsum dolor sit amet ==
Nobis libero iure eveniet ad sunt qui minima et. Excepturi consequatur cupiditate non dolorum esse. Quaerat saepe consequatur assumenda deleniti exercitationem voluptatem placeat. Non quaerat aut qui quod. Ducimus quia est et ea ut dolore corrupti. Quae ipsam eveniet non. Sed autem et placeat. Qui numquam optio quasi assumenda error explicabo sint et. Laborum laboriosam sint distinctio id et. Ipsam vero beatae eum enim aut qui. In dolores sit sint dignissimos commodi quis optio. Itaque et voluptas explicabo qui.

<!--T:2-->
Ad et consequatur facilis veritatis et. Maxime maiores esse aut voluptas ut doloremque. Rerum odio harum et autem asperiores. Id doloremque sint et ut laborum est et. Sit aliquam error excepturi facilis similique. Neque perspiciatis similique rerum dolorem. Explicabo et omnis cumque consectetur quisquam animi nihil ut.
</translate>

That is, the heading and the first paragraph form a single translation unit. In this case, since the <div> wraps a single translation unit, we can’t do anything but put the opening <div> before the heading.


Speaking of T62544, another thing I noticed in your HTML excerpt is that the <span>s introduced by that task are parts of the preceding <section>s. This is probably why opening https://test.m.wikipedia.org/wiki/Translate_headings/pl#C uncollapses B pl instead of C pl.

This almost feels like we need a wikitext extension to produce more complicated headers. Perhaps a parser function where we can specify additional classes? Although with some hacky code we could also produce <h2 class="mw-translate-fuzzy">foo</h2> from Translate, I am not sure whether it would be treated as a proper, normal header by the parser or Parsoid.

Regarding the poor markup example from @Tacsipacsi, I think we could make our translation section parser more aggressive and force a header and the following paragraph to be separate units. That would remove some flexibility from users, but it might be for the better.
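
A rough idea of what forcing that separation could look like, sketched in Python rather than in Translate's own code. The function name `split_heading_units` and the regular expression are assumptions for illustration; the pattern only accepts balanced `=` runs and ignores the `<!--T:n-->` unit markers, so it is much simpler than what the core parser or Translate actually accept.

```python
import re

# Simplified heading line: the same run of 1-6 '=' on both sides of the text.
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$")


def split_heading_units(unit_text):
    """Split one unit's text so that each heading line becomes its own unit."""
    units = []
    buffer = []  # non-heading lines collected since the last heading

    def flush():
        # Emit the buffered lines as one unit unless they are all blank.
        if any(line.strip() for line in buffer):
            units.append("\n".join(buffer).strip("\n"))
        buffer.clear()

    for line in unit_text.split("\n"):
        if HEADING_RE.match(line):
            flush()
            units.append(line)
        else:
            buffer.append(line)
    flush()
    return units


unit = "== Lorem ipsum dolor sit amet ==\nNobis libero iure eveniet ad sunt."
print(split_heading_units(unit))
```

With the heading in a unit of its own, a fuzzy marker for the paragraph would no longer need to open before the heading.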

I also thought about a {{#HEADING-ANCHOR|}} parser function for T62544 and followups, but that would require a new core hook to edit the heading.
We may want to mix both parser functions with something like {{#HEADING-EDIT|anchor=|class=}}.