Section editing on a translatable page adds a blank line after the <translate> tag
Open, Needs Triage, Public, BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • Open a translatable page
  • Click "edit" (or "edit source") link next to a subheading
  • Click "save"

What happens?:
Editing "Section 1" adds a blank line after the <translate> tag of "Section 2".

<translate>
== Section 1== <!--T:1-->
</translate>
<translate><!--T:2--> Test1</translate>

{{anchor|Section 2}}
<translate>

== Section 2 == <!--T:3-->
</translate>

What should have happened instead?:
No blank lines are added.

<translate>
== Section 1== <!--T:1-->
</translate>
<translate><!--T:2--> Test1</translate>

{{anchor|Section 2}}
<translate>
== Section 2 == <!--T:3-->
</translate>

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc:

Event Timeline

Nikerabbit subscribed.

The Translate extension has no control over this. It sounds like the edit section code automatically adds an empty line before the next section, if missing?

Tacsipacsi subscribed.

I think it’s rMW includes/parser/Parser.php:5756 (at e6ca2a56cd78). Adding MediaWiki-Parser so that the Parsing team has a chance of finding this task.
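
To make the suspected mechanism concrete, here is a toy Python model of a section save. It is not the code in Parser.php. The heading regex, the names `section_bounds` and `replace_section`, and the rule "strip trailing whitespace from the edited section, then put a blank line before the next heading" are all simplifying assumptions based on the behavior described in this task.

```python
import re

# Simplified heading matcher: a line such as "== Title ==", optionally
# followed by an HTML comment like the Translate unit marker <!--T:1-->.
HEADING = re.compile(r"^=+.*=+[ \t]*(?:<!--.*?-->[ \t]*)?$", re.MULTILINE)


def section_bounds(page: str, index: int) -> tuple[int, int]:
    """Return (start, end) offsets of the index-th section (1-based)."""
    starts = [m.start() for m in HEADING.finditer(page)]
    bounds = starts + [len(page)]
    return bounds[index - 1], bounds[index]


def replace_section(page: str, index: int, new_text: str) -> str:
    """Toy model: swap in new_text and normalize the gap before the next heading."""
    start, end = section_bounds(page, index)
    replacement = new_text.rstrip()
    if end < len(page):
        # Assumed normalization: always leave a blank line before the next section.
        replacement += "\n\n"
    return page[:start] + replacement + page[end:]


PAGE = (
    "<translate>\n"
    "== Section 1== <!--T:1-->\n"
    "</translate>\n"
    "<translate><!--T:2--> Test1</translate>\n"
    "\n"
    "{{anchor|Section 2}}\n"
    "<translate>\n"
    "== Section 2 == <!--T:3-->\n"
    "</translate>\n"
)

# "Edit" Section 1 and save it without changing anything.
start, end = section_bounds(PAGE, 1)
saved = replace_section(PAGE, 1, PAGE[start:end])
print(saved)
```

In this model, "Section 1" ends with the `<translate>` line that opens the "Section 2" unit, so the added blank line lands between that tag and the heading, as in the report.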

Just wanted to +1 this task and say that I encounter this issue a lot

Normalizing section spacing by making sure sections have an empty line between them has been there since the early days and saves a lot of effort. The optimal strategy here needs some deliberation. Could it be to normalize <translate> tags instead by using this format?

Text
<translate>

== Section==
</translate>

(Note: "Text" in this case should not have an empty line after it, to avoid an empty paragraph following it.)

Alternatively, could the parser's behavior here be adjusted by some extraneous factor? E.g. Wiktionary also has an established style in nearly 1 million entries where they have no empty line after the lead section.

As mentioned above, there are cases where Template:anchor is placed before a section heading.

{{anchor|section 2}}
<translate>

== Section 2 == <!--T:3-->
</translate>

If there is a blank line, a new section may be added there, which can cause a discrepancy with the anchor (this has actually happened).

{{anchor|section 2}}
<translate>
== Section 1.5 ==

Text 1.5

== Section 2 == <!--T:3-->
</translate>

@Shirayuki I assume the argument is that people will be even less likely to notice it with an empty line? Well, adding a section anchor to the end of the previous section is a hack in itself. I think people use it to avoid cluttering the section heading with a template, which would end up in edit summaries (T69068). Incidentally, this doesn't even happen on pages with Translate: MediaWiki can't handle HTML comments like <!-- T:1 --> (T62123), so section titles don't get into edit summaries at all.

There should be a way to have static section links without resorting to this, especially if, AFAIU, this is essential to pages with Translate, where you don't have static headlines because each language has its own? If yes, then, to be honest, placing {{anchor}} at all seems like a hack that would be better solved by correct section redirecting from Special:MyLanguage/ links, based on the English translation or something. Correct me if I'm missing something.

Yes, {{anchor}} was mainly used as a workaround of T62544, but it is still used because of followup bug T333914.

There should be a way to have static section links without resorting to this, especially if, AFAIU, this is essential to pages with Translate where you don't have static headlines because each language has its own? If yes, then, to be honest, the {{anchor}} placing at all seems like a hack which should better be solved by correct section redirecting from Special:MyLanguage/ links, based on the English translation or something. Correct me if I'm missing something.

Especially on MediaWiki.org, because it is linked from various external sites, it is necessary to retain previous section names and abbreviations with {{anchor}}.
Since the Translate extension now automatically adds anchors for section names in the English version, most {{anchor}}s are unrelated to translations.

the Translate extension now automatically adds anchors for section names in the English version

As far as I can see from this example, it adds them above the heading markup just like anchors are added, e.g.

<span id="Features"></span>
== Funktionen ==

So if "Features" is the first section on the page and I follow a link to it with old Vector as my skin, the link targets the table of contents instead. This feels like a huge hack, this time on the part of the Translate extension. It in fact creates an empty paragraph (which is simply unsemantic HTML) in the wrong place and puts an anchor in it.

It's hard to say what the best way to address this is. I was actually thinking of using the backend's knowledge of section correspondence between languages to take care of redirects from Special:MyLanguage/. For example:

  1. I get a link to https://commons.wikimedia.org/wiki/Special:MyLanguage/User:Jack_who_built_the_house/Convenient_Discussions#Features
  2. Special:MyLanguage fetches the name of the "Features" section in my language, for example German.
  3. I'm redirected to the correct anchor name, i.e. https://commons.wikimedia.org/wiki/User:Jack_who_built_the_house/Convenient_Discussions/de#Funktionen.

I'm not sure how easy it would be for the backend to do that, but it would definitely be less hacky than adding HTML anchors to the content of pages. And leaving things as they are will just increase technical debt for the future.
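
The lookup in step 2 could be sketched roughly as follows, wherever it ends up running. This is only an illustration: the `SECTION_ANCHORS` table and the `translated_section_url` function are hypothetical, and nothing here reflects how Translate actually stores section correspondence.

```python
# Hypothetical lookup table:
# (page title, language code) -> {source-language anchor: translated anchor}
SECTION_ANCHORS = {
    ("User:Jack_who_built_the_house/Convenient_Discussions", "de"): {
        "Features": "Funktionen",
    },
}


def translated_section_url(base: str, title: str, lang: str, anchor: str) -> str:
    """Build the URL of the translated page with the matching section anchor."""
    mapping = SECTION_ANCHORS.get((title, lang), {})
    # Fall back to the source anchor when no translated name is known.
    target_anchor = mapping.get(anchor, anchor).replace(" ", "_")
    return f"{base}/wiki/{title}/{lang}#{target_anchor}"


print(
    translated_section_url(
        "https://commons.wikimedia.org",
        "User:Jack_who_built_the_house/Convenient_Discussions",
        "de",
        "Features",
    )
)
```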

As for {{anchor}}s: one could argue that even adding them to the section markup, i.e. == {{anchor|section1}} Section 1 ==, is suboptimal and that some better way to ensure static section linking in MediaWiki should be invented (some extension of the markup, e.g. == Section 1 == #section1?). But I doubt such a task would be prioritized, so I would aim for more near-term solutions. I presume the reason markup like == {{anchor|Features}} Funktionen == is not used is to have clean lines for translators? Maybe templates should be bypassed by the extension...

Wait, the Translate extension has its own tags, right? Then it's unclear to me what the problem would be with introducing a tag like <tanchor> and using it like this:

== <tanchor="section1">Section 1</tanchor> ==

or

== <tanchor="section1" />Section 1 ==

Translators will get "Section 1" to translate, and the extension will add <span id="section1">...</span> in a proper place.
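
A rough Python sketch of that idea, for the self-closing form only. The <tanchor> tag is just the hypothetical tag proposed above, the function names are made up, and putting an empty <span> at the start of the heading text is only one possible placement.

```python
import re

# <tanchor> is the hypothetical tag from this comment, not an existing Translate tag.
TANCHOR = re.compile(r'<tanchor="([^"]+)"\s*/>')
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1$")


def split_heading(line: str) -> tuple[int, list[str], str]:
    """Split a heading line into (level, anchor ids, text shown to translators)."""
    m = HEADING.match(line.strip())
    if m is None:
        raise ValueError("not a heading line")
    level, inner = len(m.group(1)), m.group(2)
    ids = TANCHOR.findall(inner)
    text = TANCHOR.sub("", inner).strip()
    return level, ids, text


def render_heading(level: int, ids: list[str], translated: str) -> str:
    """Re-attach the remembered ids to the translated heading text."""
    marks = "=" * level
    spans = "".join(f'<span id="{i}"></span>' for i in ids)
    return f"{marks} {spans}{translated} {marks}"


level, ids, text = split_heading('== <tanchor="section1" />Section 1 ==')
print(text)                                        # what the translator sees
print(render_heading(level, ids, "Abschnitt 1"))   # generated translated heading
```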

  1. Special:MyLanguage fetches the name of the "Features" section in my language, for example German.
  2. I'm redirected to the correct anchor name, i.e. https://commons.wikimedia.org/wiki/User:Jack_who_built_the_house/Convenient_Discussions/de#Funktionen.

I'm not sure how easy it would be for the backend to do that

Impossible. Section links are a client-side thing; when you open https://commons.wikimedia.org/wiki/Special:MyLanguage/User:Jack_who_built_the_house/Convenient_Discussions#Features, the browser starts the HTTP request with

GET /wiki/Special:MyLanguage/User:Jack_who_built_the_house/Convenient_Discussions HTTP/2

i.e. the anchor is entirely omitted. It would be possible to check the anchor in the URL after load using JavaScript (JS does have access to the anchor), but that wouldn’t work for users without JavaScript.
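
The split can be illustrated with Python's standard `urllib.parse`; the helper name `request_target` is only for this example.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path (plus query, if any) that goes into the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


URL = (
    "https://commons.wikimedia.org/wiki/Special:MyLanguage/"
    "User:Jack_who_built_the_house/Convenient_Discussions#Features"
)

print(request_target(URL))     # the part the server receives
print(urlsplit(URL).fragment)  # the part that stays in the browser
```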

Translators will get "Section 1" to translate, and the extension will add <span id="section1">...</span> in a proper place.

The problem is what the “proper place” is.

  • Currently there are no constraints on what the translation looks like. Of course, the reasonable translation of == Example == into German is == Beispiel ==, but nothing stops the translator from translating it as {{Lorem ipsum}}. Where would the appropriate place be?
  • What if the English page looks like == Ex<tanchor="example" />ample ==?
  • Even if everything looks like you expect it, i.e. == <tanchor="example" />Example == translated as == Beispiel ==, where would that appropriate place be? I don’t think it’s appropriate within the anchor itself, as it may confuse tools parsing the content of the <span class="mw-headline">.

The latter two points may be solved by inventing a new attribute for <h#>s: e.g. one would write <h2 data-extra-ids="example">"Example"</h2> on the English page, and the German translation would be generated as <h2 data-extra-ids="&quot;Example&quot; example">„Beispiel“</h2>, resulting in <span>s with IDs "Example", example, .E2.80.9EBeispiel.E2.80.9C and „Beispiel“ (and maybe also .22Example.22). This is probably something for when Parsoid is the only parser, though, and I’m also not sure what the translation interface would look like. (The best solution would be if translators didn’t see any heading syntax, yet Translate would be aware of it being a heading, including that it continued to format it like a heading in the “page” view of Special:Translate.)
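
For illustration, the ID list from this example could be derived roughly like this. The dot-encoding below is only an approximation of the legacy ID form (percent-encode the UTF-8 bytes, then swap "%" for "."), and `legacy_anchor` and `heading_ids` are made-up names. The data-extra-ids attribute itself is just the proposal above.

```python
from urllib.parse import quote


def legacy_anchor(text: str) -> str:
    """Approximate the dot-encoded legacy ID, e.g. .E2.80.9E for U+201E."""
    return quote(text.replace(" ", "_"), safe="").replace("%", ".")


def heading_ids(translated: str, extra_ids: list[str]) -> list[str]:
    """Collect the extra ids plus the legacy and HTML5 ids of the translated heading."""
    ids: list[str] = []
    candidates = [*extra_ids, legacy_anchor(translated), translated.replace(" ", "_")]
    for candidate in candidates:
        if candidate not in ids:  # keep order, drop duplicates
            ids.append(candidate)
    return ids


print(heading_ids("„Beispiel“", ['"Example"', "example"]))
print(legacy_anchor('"Example"'))
```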

Impossible. Section links are a client-side thing; when you open https://commons.wikimedia.org/wiki/Special:MyLanguage/User:Jack_who_built_the_house/Convenient_Discussions#Features, the browser starts the HTTP request with

GET /wiki/Special:MyLanguage/User:Jack_who_built_the_house/Convenient_Discussions HTTP/2

i.e. the anchor is entirely omitted. It would be possible to check the anchor in the URL after load using JavaScript (JS does have access to the anchor), but that wouldn’t work for users without JavaScript.

Right, right. OK, let's think about the best placement of anchors.

  • Currently there are no constraints on what the translation looks like. Of course, the reasonable translation of == Example == into German is == Beispiel ==, but nothing stops the translator from translating it as {{Lorem ipsum}}. Where would the appropriate place be?

(As far as I understand, this freedom of translating section headings is unintentional?)

  • What if the English page looks like == Ex<tanchor="example" />ample ==?

That should be irrelevant, since tags should just be removed from the translated part (and translation) and just remembered to be there?

  • Even if everything looks like you expect it, i.e. == <tanchor="example" />Example == translated as == Beispiel ==, where would that appropriate place be? I don’t think it’s appropriate within the anchor itself, as it may confuse tools parsing the content of the <span class="mw-headline">.

Well, if we had access to the backend of section heading rendering, then this is already done for compatibility with old section IDs after HTML5 section IDs arrived (T152540), for example https://en.wikipedia.org/wiki/2024_New_Caledonia_unrest#%22Frozen%22_electorate:

But I do think that putting the anchor inside <span class="mw-headline"> is a huge improvement already. As a developer of a tool which occasionally suffers from changes in the composition of .mw-headline, I think tools should be generally OK with a stray tag without any textual content – if they weren't used to such already. E.g. go to https://en.wikipedia.org/wiki/Project:Manual_of_Style and run $('span[id] span[id]') in the browser console.

The latter two points may be solved by inventing a new attribute for <h#>s: e.g. one would write <h2 data-extra-ids="example">"Example"</h2> on the English page, and the German translation would be generated as <h2 data-extra-ids="&quot;Example&quot; example">„Beispiel“</h2>, resulting in <span>s with IDs "Example", example, .E2.80.9EBeispiel.E2.80.9C and „Beispiel“ (and maybe also .22Example.22). This is probably something for when Parsoid is the only parser, though

(I already saw a similar suggestion in T353489#9408364)

{{#h|<num>|contents|attrib=value|attrib=value}}

It is probably too broad to change parsing behavior in core just for this bug.

Perhaps the fix here is for the Translate extension to use a hook, read the $_POSTed wikitext after core transforms it, check if $isSectionEdit && $wikitextEndsInTranslateTag (could use a regex such as /<translate>\s+$/), and if so run trim(). The hook would need to do this after the MediaWiki Parser does its line break additions but before the wikitext is saved to the database. I think this algorithm would handle the most common use case on mediawiki.org.
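
A minimal sketch of that check, written in Python rather than as an actual MediaWiki hook. The function name and the `is_section_edit` flag are placeholders, and it assumes the hook sees only the edited section's text. One deviation from a plain trim(): it keeps a single trailing newline, on the assumption that the next heading must still start on its own line.

```python
import re

# Same idea as /<translate>\s+$/ : a <translate> tag followed only by whitespace
# at the very end of the section text.
TRAILING_TRANSLATE = re.compile(r"<translate>\s+\Z")


def fix_section_text(wikitext: str, is_section_edit: bool) -> str:
    """Collapse whitespace after a trailing <translate> tag on section edits."""
    if is_section_edit and TRAILING_TRANSLATE.search(wikitext):
        return wikitext.rstrip() + "\n"
    return wikitext


section = "== Section 1== <!--T:1-->\n</translate>\n\n{{anchor|Section 2}}\n<translate>\n\n"
print(repr(fix_section_text(section, is_section_edit=True)))
```

Sections that do not end in a <translate> tag, and full-page edits, pass through unchanged, so the usual blank-line normalization is untouched elsewhere.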