
Page view in non-default user interface should be translatable by Firefox (<html lang> attribute)
Open, Low · Public · Bug Report

Assigned To
None
Authored By
sweil
Sep 28 2023, 10:09 AM
Referenced Files
F41547298: Screenshot 2023-11-29 at 17.20.00.png
Nov 29 2023, 11:20 PM

Description

Steps to replicate the issue (include links if applicable):

What happens?:

Now the HTML code includes unwanted attributes lang="de", for example in the html tag for the whole page, but also <div class="vector-sticky-header-context-bar-primary" aria-hidden="true" lang="de" dir="ltr">Bienvenue Stefan Weil</div>

What should have happened instead?:

The English / French / Italian / ... Wikipedia is still mainly English / French / Italian, even when I choose a different language in my user preferences. Therefore lang="de" should only be used where appropriate.

Other information (browser name/version, screenshots, etc.):

Tested with latest Firefox which now offers translations. Because of the wrong language settings in Wikipedia's HTML code, that does not work as expected.

Event Timeline

Jdlrobson changed the task status from Open to Stalled. Oct 2 2023, 6:33 PM
Jdlrobson subscribed.

The user preference language is typically set globally and only overridden for the actual content, e.g. #mw-content-text.
When you set a preference you are mixing languages, which is why Google Translate and Firefox cannot translate the page. I'm not sure what you mean by "appropriate" in "Therefore lang="de" should only be used where appropriate." The translation doesn't work because there are 2 languages in the page, so I am not sure what appropriate would be here.

The main content of the page uses the language of the selected Wikipedia, French for fr.wikipedia.org, German for de.wikipedia.org and so on. As long as the HTML tag specifies that language, translation programs will translate that content.

The user preference language is only used for the navigation menus. These are the appropriate parts that can use DIV tags with the LANG attribute of the user preference language.

That would be a significant change that would impact all skins and localization (not just Vector 2022). Tagging MediaWiki platform team who are probably best positioned to think about this.

Right, but the change would only affect users who are logged in. So it is a fix for Wikipedia authors, not for the majority of "normal" users.

Func changed the task status from Stalled to Open. Oct 3 2023, 1:57 PM

for example in the html tag for the whole page

That is subjective (is it a page in German with a content block in French, or a page in French with various menu blocks in German?). The task description offers no reason why one interpretation would be more useful than the other, and changing the code would be a major effort (instead of tagging the content block(s) with the content language we'd have to tag all the non-content blocks with the UI language) so I'd decline this.

but also <div class="vector-sticky-header-context-bar-primary" aria-hidden="true" lang="de" dir="ltr">Bienvenue Stefan Weil</div>

That seems like a bug, and probably trivial to fix. Should be a separate task though.

The task description offers no reason why one interpretation would be more useful than the other

Sorry I missed the very last sentence which actually does offer a reason:

Tested with latest Firefox which now offers translations. ... that does not work as expected.

I'd recommend filing a separate task about this, or reworking the current task so it's about this, instead of a suggested fix (see: XY problem).

When you set a preference you are mixing languages which is why Google Translate and Firefox cannot translate the page.

Google Translate, at least, can translate mixed-language pages fine, but it sometimes identifies the interface language as the page language and tries to translate from that language. You can use the "Page is not in <language>" option from the ellipsis menu to fix that, but it's annoying, and options for improvement would be worth looking into. But given that it doesn't always assume the page is in the UI language, it probably does something more complicated than looking at the top-level lang attribute.

That is subjective (is it a page in German with a content block in French, or a page in French with various menu blocks in German?). The task description offers no reason why one interpretation would be more useful than the other, and changing the code would be a major effort (instead of tagging the content block(s) with the content language we'd have to tag all the non-content blocks with the UI language) so I'd decline this.

I don't think that it is subjective whether a page in the French Wikipedia with a user selected language for menu, side notes and footnote is French or something else. The relevant and largest part of the text will always be French. And I disagree also that it would require a major effort. It might be acceptable to change no language attributes at all when a user selects a different language. And even if a separate language attribute should be set for the affected regions, the number of such regions is very small (is it more than 3?).

Krinkle renamed this task from Wrong language settings in Wikipedia HTML code when user is logged in to Page view in non-default user interface should be translatable by Firefox (<html lang> attribute). Nov 1 2023, 2:04 PM

Regarding vector-sticky-header-context-bar-primary, as Gergo indicates, that's a regression specific to the Vector 2022 skin and best covered in its own task.

Regarding the <html lang> attribute, I've reframed this task to focus on a problem (Firefox Translate support) rather than a presupposed solution that must happen for its own sake.

From a quick glance, there does not appear to be anything inherently wrong, invalid, or un-semantic about the status quo. The lang="" attribute indicates the language of a particular element and is inherited by default from the parent element, and the HTML specification explicitly allows for nesting and overriding it for portions of the page (as interpreted by e.g. accessibility tech, the CSS :lang selector, and the DOM's reflection of any element's language).
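To illustrate the inheritance rule described above, here is a minimal Python sketch that resolves the effective language of each piece of text by taking the lang value of the nearest ancestor that sets one. The sample markup is a simplified stand-in for a page with a German UI and French content, not actual skin output, and LangResolver and text_by_language are hypothetical names. The sketch assumes well-nested markup and ignores details such as xml:lang.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LangResolver(HTMLParser):
    """Collect (effective_language, text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.lang_stack = []  # effective language of each open element
        self.chunks = []      # (language, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        inherited = self.lang_stack[-1] if self.lang_stack else None
        own = dict(attrs).get("lang")
        # An element's own lang attribute overrides the inherited value.
        self.lang_stack.append(own if own is not None else inherited)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.lang_stack:
            self.lang_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            lang = self.lang_stack[-1] if self.lang_stack else None
            self.chunks.append((lang, text))


def text_by_language(markup):
    """Return the document's text chunks with their effective languages."""
    parser = LangResolver()
    parser.feed(markup)
    parser.close()
    return parser.chunks


# Simplified, hypothetical page: German UI shell around French content.
SAMPLE = (
    '<html lang="de"><body>'
    '<nav>Hauptmenü</nav>'
    '<div id="mw-content-text" lang="fr">Bienvenue</div>'
    '<footer>Datenschutz</footer>'
    '</body></html>'
)

print(text_by_language(SAMPLE))
# → [('de', 'Hauptmenü'), ('fr', 'Bienvenue'), ('de', 'Datenschutz')]
```

The menu and footer text inherit "de" from the root element, while the content block overrides it with "fr".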

From a technical perspective, the status quo is significantly easier to implement due to the content area being very confined, whereas interface elements outside of that have no pre-existing contract that requires them to be annotated. It is the skin's responsibility today to mark up the document in the user interface language, and the ParserOutput takes care of taking over for its sub-tree with the content language attribute and direction.

For the stated use case of Firefox Translate, it appears this bug is specific to Firefox and how it decides to offer or not offer translations. I can't think of a reason why they wouldn't offer translation if the page contains significant, or indeed any, amount of text in a given language.

As a point of comparison, if we use Google Translate with the following URL:

https://fr.wikipedia.org/wiki/Archag_Tchobanian?uselang=en

.. which contains French content wrapped in an English-language user interface container, and ask it to translate to Dutch, it will do so, and it will translate both the content from French and the UI from English, both to Dutch:

https://fr-m-wikipedia-org.translate.goog/wiki/Archag_Tchobanian?uselang=en&_x_tr_sl=auto&_x_tr_tl=nl

Unless we can find other notable software that is unable to interpret this kind of standard HTML document, I suggest starting by reporting this to Firefox as a bug and seeing what they say. Taking into account that this Firefox feature is less than a year old and has only been out of beta for a few months, it's quite likely they have not encountered this use case until now. For all we know it's a trivial thing to fix!

Upstream: https://bugzilla.mozilla.org/buglist.cgi?product=Firefox&component=Translation&bug_status=__open__

Then let me rephrase my bug report: Wikipedia uses incorrect language attributes if a non-default user interface is selected.

I see several bugs on the Wikipedia side:

  1. The language attribute is wrong for the vector-sticky-header-context-bar-primary case.
  2. The language attribute is wrong for the menus, which typically contain a mixture of translated and untranslated text. For example, the main menu of the main page of the French Wikipedia with German user interface https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Accueil_principal?uselang=de has some German texts, but also "Thematic portals", "contacts" (English) and "Contribuer" (French). Maybe the real cause is missing translations, but as long as translations are missing, claiming that the text is German (or another selected language) is obviously wrong.
  3. I was surprised to see that my user preference (German user interface) and using uselang=de (which I did not know up to now) give different results. The first one shows some really bad "German" texts ("Hauptmen.", "Umschalt") while the second variant does not have those errors.

Regarding Firefox, they decided to offer translations based on the main language attribute (the one for the html tag). In addition Firefox has a menu entry which always offers translations, but requires the user to select source and target language. Their current implementation does not support translations from several source languages to a single target language.
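The behaviour described here, where the default source language is taken from the lang attribute of the html tag, can be approximated with the following Python sketch. It only mirrors the behaviour as reported in this task and is not Firefox's actual implementation. RootLangParser and default_source_language are hypothetical names, and the two markup strings are simplified stand-ins for the same French page viewed without and with a German UI language.

```python
from html.parser import HTMLParser


class RootLangParser(HTMLParser):
    """Record the lang attribute of the first <html> start tag, if any."""

    def __init__(self):
        super().__init__()
        self.seen_root = False
        self.root_lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self.seen_root:
            self.seen_root = True
            self.root_lang = dict(attrs).get("lang")


def default_source_language(markup):
    """Return the language a root-attribute-based translator would assume."""
    parser = RootLangParser()
    parser.feed(markup)
    parser.close()
    return parser.root_lang


# Hypothetical, simplified markup for the same French content.
DEFAULT_UI = '<html lang="fr"><body><div lang="fr">Bienvenue</div></body></html>'
GERMAN_UI = '<html lang="de"><body><div lang="fr">Bienvenue</div></body></html>'

print(default_source_language(DEFAULT_UI))  # → fr
print(default_source_language(GERMAN_UI))   # → de
```

Under this assumption the French content is identical in both cases, but only the root attribute changes the proposed source language.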

Krinkle removed Krinkle as the assignee of this task. Edited Nov 29 2023, 11:20 PM
Krinkle triaged this task as Low priority.
Krinkle subscribed.

I understand they appear related from an end-user perspective, but you're raising about 5 different observations that have little to no technical relation between them from a software perspective. I assigned myself to this task to understand and investigate it. I am now un-assigning myself, as this investigation is done. (Prioritising and fixing is up to someone else, although per my next comment, this may not be possible.) This task is (to me) only about the newly reported issue around Firefox translation. The other points are known issues that are already (or can be) reported as their own tasks.

My conclusion is that MediaWiki is meeting the expectations of the HTML standard, and up until the recently launched Firefox Translate, all other web translation software I've seen handles this without issue. Firefox's built-in translation is very new, and it's not surprising that some less common scenarios are not yet handled correctly. In this case, the "uncommon" scenario is for someone to 1) browse a Wikipedia edition in a language that you want translation for, 2) also have an account on that same wiki, and 3) also have a non-default language setting in the account preference. All three need to be true to exercise this bug in Firefox.

Having said that, I am unable to reproduce this bug. In a clean profile of Firefox 119 on macOS, I tried the following:

  1. https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Accueil_principal
  2. Click translate icon in the address bar. Presents "from" and "to" choices, defaulting as French to English. Works as expected (UI and content are localised).
  3. https://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Accueil_principal?uselang=de
  4. Click translate icon in address bar. Default choice is now German to English. Works as expected (UI is localised, content unchanged)
  5. Click the same translate icon, set "from" to French. Works as expected (content is localised, UI unchanged).

My understanding from the previous comments was that, to be able to translate the French content at the uselang=de URL, one would need to go into advanced settings, or enable "Always translate", or otherwise do something outside the "translate icon" that is promoted by Firefox in the address bar and the simple "from" and "to" menus presented there. However, at least today, this appears not to be the case.

[Screenshot: Screenshot 2023-11-29 at 17.20.00.png (651 KB); embedded file F41547305]

Is this different from your experience?

If it were trivial to change this in MediaWiki, or if it didn't have significant other benefits in its current form, I would recommend a change in MediaWiki core to set a different HTML attribute. But, I can't recommend that currently. Doing so would break two decades of assumptions about our software architecture and the expectations and responsibilities of specific components and MediaWiki skin/extension developers, indicating that it is safe to output UI text in the outer shell of the document, and that it is safe to output content text in the content area of the document.

The document reflects the language that the outer ("first" or "skin") layer of information is localised in, and the content area is a child of that, and is correctly annotated as such when the two languages differ. I suppose one "hack" we could try is to set the content language on the <html> element but then immediately undo that on the <body> element to preserve correct interpretation of all the skin and UI information.
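As a rough illustration of that "hack", the following Python sketch builds a page skeleton with the content language on the html element and the UI language restored on the body element. It is not MediaWiki code; render_shell and its parameters are hypothetical, the markup is heavily simplified, and direction (dir) handling is left out.

```python
from html import escape


def render_shell(ui_lang, content_lang, ui_text, content_text):
    """Build a skeleton page: content language on <html>, UI language on <body>."""
    return (
        # The root element advertises the content language to tools
        # that only look at the top-level attribute.
        f'<html lang="{escape(content_lang)}">'
        # The body immediately restores the UI language for the skin.
        f'<body lang="{escape(ui_lang)}">'
        f'<nav>{escape(ui_text)}</nav>'
        # The content area keeps its own explicit annotation.
        f'<div id="mw-content-text" lang="{escape(content_lang)}">'
        f'{escape(content_text)}</div>'
        '</body></html>'
    )


print(render_shell("de", "fr", "Hauptmenü", "Bienvenue"))
```

One caveat of this layout is that anything outside the body, such as the contents of the head element, would then inherit the content language rather than the UI language.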

However, I'd prefer to hear from upstream Mozilla first what their expectations are in this regard. I was about to create an upstream bug report on bugzilla.mozilla.org, but then I noticed in trying to reproduce the issue today, it appears to work as expected (see screenshots in previous comment).

My conclusion is that MediaWiki is meeting the expectations of the HTML standard, and up until the recently launched Firefox Translate, all other web translation software I've seen handles this without issue.

As noted above, the issue definitely exists in Google Translate, although it feels somewhat random and I'm pretty sure it's not (or not only) based on the top-level language attribute. Someone could experiment by changing the lang property with a gadget and seeing how that affects translations.

Firefox's built-in translation is very new, and it's not surprising that some less common scenarios are not yet handled correctly.

I'm not sure there's a clear "correct" and "incorrect" here. If a page has menus and other skinning in language A and content in language B, and the user requests translation to C, should the browser apply an A->C or B->C translation? And how should it know what's menu and what's content? (Semantic HTML could help but ours isn't.) I guess the ideal version would be translating every piece that's in a different language separately, so you end up with both menus and content in language C, but I don't think any translation system capable of that exists today.

In this case, the "uncommon" scenario is for someone to 1) browse a Wikipedia edition in a language that you want translation for, 2) also have an account on that same wiki, and 3) also have a non-default language setting in the account preference.

It's not actually that uncommon for Wikipedia editors; I encounter this issue regularly with Google Translate. You'd set a global language preference because navigating (especially on wikis with an unfamiliar script) is just very hard without that, but then you want to use a translator for the content.

FWIW there's a translate attribute which can disallow translation of part of a document; we could maybe try to apply that to the menus when they aren't in the content language and see if it helps (my guess is still that browsers use AI language detection, so which language has more translatable text matters). But I definitely agree that talking to the people who built the translator should be the first step.
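A minimal sketch of that idea, in Python for illustration only: a hypothetical helper, menu_attributes, that adds translate="no" to a menu container whenever the UI language differs from the content language. The translate attribute itself is a standard HTML global attribute; whether a given translation tool honours it, and whether this would help with language detection, is untested here.

```python
from html import escape


def menu_attributes(ui_lang, content_lang):
    """Return the attribute string for a UI menu container."""
    attrs = {"lang": ui_lang}
    if ui_lang != content_lang:
        # Ask translation tools to leave the menu alone when it is
        # not written in the content language.
        attrs["translate"] = "no"
    return " ".join(f'{name}="{escape(value)}"' for name, value in attrs.items())


print(f'<nav {menu_attributes("de", "fr")}>Hauptmenü</nav>')
# → <nav lang="de" translate="no">Hauptmenü</nav>
print(f'<nav {menu_attributes("fr", "fr")}>Menu principal</nav>')
# → <nav lang="fr">Menu principal</nav>
```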