Handle duplicate heading names when generating anchors
Open, Medium, Public

Description

As part of T62544, the Translate extension generates anchors for the source language definition on translation pages.

If there are two headings with the same content in the source translatable page, then the extension generates two spans with the same ID in the translation page as well.

The parser, by contrast, appends a numeric suffix to the ID of a repeated heading to avoid this duplication.
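
The suffixing idea can be sketched as follows. This is a minimal illustration in Python, not the parser's or the extension's actual code: the helper name `dedupe_anchor_ids` is hypothetical, and the `_N` suffix format is an assumption modelled on the `Foo_2` ID that appears in the discussion below.

```python
from collections import Counter


def dedupe_anchor_ids(ids):
    """Append a numeric suffix to repeated IDs (hypothetical helper).

    The first occurrence keeps its ID; later occurrences become
    "<id>_2", "<id>_3", and so on. Assumed format, for illustration only.
    """
    seen = Counter()
    result = []
    for anchor_id in ids:
        seen[anchor_id] += 1
        count = seen[anchor_id]
        result.append(anchor_id if count == 1 else f"{anchor_id}_{count}")
    return result


print(dedupe_anchor_ids(["Foo", "Bar", "Foo"]))  # ['Foo', 'Bar', 'Foo_2']
```

Note that this naive version only counts repeats of identical input IDs. It does not check whether a suffixed ID it produces is already used by another anchor.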

Event Timeline

Actually, there are three possibilities:

  • Two normal ==Heading==s conflict – handled by the parser.
  • Two <span>s generated by us conflict – this is what you scoped this task to.
  • A normal ==Heading== conflicts with a <span> generated by us – what about this?

Even worse, it may happen that an ID generated by the parser to resolve a conflict ends up conflicting with our ID, like in the following case:

Original
<translate>
== Foo 2 == <!--T:1-->

== Foo == <!--T:2-->
</translate>
Translation
<span id="Foo_2"></span>
<div class="mw-translate-fuzzy">
== Foo ==
</div>

<div class="mw-content-ltr" lang="en" dir="ltr">
== Foo ==
</div>

Since ==Foo== appears twice in the translation, the parser-generated ID of the second one will be Foo_2, just like the ID we generate for the first one. This even breaks the TOC, as it’ll contain a link to #Foo_2 for the second heading, but the browser will (likely) jump to the first anchor it finds with the given ID.
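
The collision can be reproduced with a small model. This is a sketch under stated assumptions, not MediaWiki's real ID algorithm: it assumes the parser only counts headings when suffixing duplicates and that the generated `<span>` IDs are emitted verbatim without any check. The names `anchors` and `assign_ids` are hypothetical.

```python
from collections import Counter

# Anchors of the translation above, in document order.
# ("span", id): anchor emitted by Translate.
# ("heading", text): wikitext heading whose ID is derived from its text.
anchors = [("span", "Foo_2"), ("heading", "Foo"), ("heading", "Foo")]


def assign_ids(anchors):
    """Assumed model: only headings are counted when suffixing duplicates."""
    heading_counts = Counter()
    final_ids = []
    for kind, value in anchors:
        if kind == "span":
            # Emitted as-is; never compared against heading IDs.
            final_ids.append(value)
        else:
            heading_counts[value] += 1
            n = heading_counts[value]
            final_ids.append(value if n == 1 else f"{value}_{n}")
    return final_ids


final_ids = assign_ids(anchors)
# IDs that end up on more than one element of the rendered page.
collisions = sorted(i for i, n in Counter(final_ids).items() if n > 1)

print(final_ids)   # ['Foo_2', 'Foo', 'Foo_2']
print(collisions)  # ['Foo_2']
```

In this model, the span and the second heading both end up with `Foo_2`, which matches the broken TOC link described above.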

(I tried to create a somewhat realistic example. The story behind it: T:1 used to be ==Foo==, but then the heading was changed to ==Foo 2==, fuzzying the translation. Later a new heading (T:2) was added with the text ==Foo==, but since many languages had already updated their translation of T:1 to match ==Foo 2==, the translation admin decided to add a new translation unit for ==Foo== rather than for ==Foo 2==. The translation we’re looking at, however, hasn’t been updated in the last six years, so it still contains ==Foo== as the translation of ==Foo 2==.)