
Language and direction of first heading should depend on page content language instead of user interface language
Open, Medium, Public

Description

Currently, the language and direction of the first heading depend on the user interface language. The first heading is part of the page content, so it should depend on the page content language:

<h1 id="firstHeading" class="firstHeading mw-content-{dir}" dir="{dir}" lang="{lang}">
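A minimal standalone sketch of the proposed markup, assuming a hard-coded list of RTL codes. The names contentDir() and firstHeading() and the RTL list are hypothetical. In MediaWiki the code and direction would come from the page's Language object (for example $title->getPageViewLanguage()). The point is that only the page content language feeds the attributes; the user interface language is not an input at all:

```php
<?php
// Hypothetical sketch, not MediaWiki code: the first heading takes lang/dir
// from the page content language instead of the user interface language.

// Illustrative subset of RTL language codes (assumption, not exhaustive).
const RTL_LANGS = ['ar', 'fa', 'he', 'ur'];

function contentDir(string $lang): string
{
    return in_array($lang, RTL_LANGS, true) ? 'rtl' : 'ltr';
}

// $titleHtml must already be escaped HTML.
function firstHeading(string $titleHtml, string $contentLang): string
{
    $dir = contentDir($contentLang);
    $lang = htmlspecialchars($contentLang, ENT_QUOTES);
    return "<h1 id=\"firstHeading\" class=\"firstHeading mw-content-$dir\""
        . " dir=\"$dir\" lang=\"$lang\">$titleHtml</h1>";
}

echo firstHeading('مكة', 'ar'), "\n";
echo firstHeading('Berlin', 'de'), "\n";
```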


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=2865
https://bugzilla.wikimedia.org/show_bug.cgi?id=63880

Details

Reference
bz34514

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 12:12 AM
bzimport set Reference to bz34514.
bzimport added a subscriber: Unknown Object (MLST).
Fomafix created this task. Feb 19 2012, 2:21 PM

While your proposal was more or less my initial idea and makes sense, we chose the following:
<h1 id="firstHeading" class="firstHeading"><span dir="auto">
First, this keeps the title at a consistent place, i.e. on the side determined by the user interface language direction. Secondly, we did not use dir="ltr/rtl" because for some pages the title is in a different language (or mixed) than the page content language (e.g. MediaWiki namespace pages, usernames, ...). Instead, we (Amir in r105854 / r105870) use dir="auto", which detects the direction. It is currently only supported by Chrome, but it will soon be supported in other browsers such as Firefox.

Because it is at a consistent place (based on the UI), it can be flipped normally, so it doesn't need mw-content-{dir}.

So we could WONTFIX this bug, if you agree.

(Relevant bugs: bug 34470, bug 31738, bug 32403)

I think the first heading is part of the content and should have the same language and the same direction as the page content language. Do you agree?

As a consequence, the first heading "MediaWiki:Help/de" on
https://translatewiki.net/wiki/MediaWiki:Help/de?uselang=en
should be left-aligned and the first heading "MediaWiki:Help/ar" on
https://translatewiki.net/wiki/MediaWiki:Help/ar?uselang=en
should be right-aligned, just like the body.

When there is a foreign RTL title on an LTR wiki or a foreign LTR title on an RTL wiki, the correct direction can be set with DISPLAYTITLE.

Example: If you want to write an article on arwiki about the alternative metal band from Northern Ireland [[Therapy?]] with the original title "Therapy?", you should use:
{{DISPLAYTITLE:<span lang="en" dir="ltr">Therapy?</span>}}

The first heading is right-aligned because the page content language of all pages on arwiki is RTL, but the text itself is marked as LTR, so it is not rendered as "?Therapy", regardless of the user interface language:
<h1 id="firstHeading" class="firstHeading mw-content-rtl" dir="rtl"
lang="ar"><span lang="en" dir="ltr">Therapy?</span></h1>

This solution also solves Bug 31738.

The first heading in the MediaWiki namespace is in neither the page content language nor the user interface language, so an automatic DISPLAYTITLE should be generated. For example, [[MediaWiki:Help/ar]] should contain:
<h1 id="firstHeading" class="firstHeading mw-content-rtl" dir="rtl"
lang="ar"><span lang="en" dir="ltr">MediaWiki:Help/ar</span></h1>
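The automatic DISPLAYTITLE described above can be sketched as follows. This is a standalone illustration, not MediaWiki code: pageTitleLang() and headingFor() are hypothetical names, the RTL list is an illustrative subset, and the rule that MediaWiki-namespace page names are English is an assumption taken from the example above. The inner container is only emitted when the page title language differs from the page content language:

```php
<?php
// Hypothetical sketch of an automatic DISPLAYTITLE: the heading carries the
// page content language; the title text gets its own lang/dir container
// only when the page title language differs from it.

// Illustrative subset of RTL language codes (assumption, not exhaustive).
const RTL_LANGS = ['ar', 'fa', 'he', 'ur'];

function dirOf(string $lang): string
{
    return in_array($lang, RTL_LANGS, true) ? 'rtl' : 'ltr';
}

// Assumption for illustration: page names in the MediaWiki namespace are
// English; every other title is in the page content language.
function pageTitleLang(string $prefixedText, string $contentLang): string
{
    return strpos($prefixedText, 'MediaWiki:') === 0 ? 'en' : $contentLang;
}

// Language codes are assumed to be trusted tokens; the title text is escaped.
function headingFor(string $prefixedText, string $contentLang): string
{
    $contentDir = dirOf($contentLang);
    $titleLang = pageTitleLang($prefixedText, $contentLang);
    $html = htmlspecialchars($prefixedText, ENT_QUOTES);
    if ($titleLang !== $contentLang) {
        $html = '<span lang="' . $titleLang . '" dir="' . dirOf($titleLang) . '">'
            . $html . '</span>';
    }
    return '<h1 id="firstHeading" class="firstHeading mw-content-' . $contentDir
        . '" dir="' . $contentDir . '" lang="' . $contentLang . '">' . $html . '</h1>';
}

echo headingFor('MediaWiki:Help/ar', 'ar'), "\n"; // nested English container
echo headingFor('مكة', 'ar'), "\n";               // no inner container needed
```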

When the first heading contains more than the title, nested language and direction containers are necessary:

https://translatewiki.net/w/i.php?title=MediaWiki:Help/ar&action=edit&uselang=de

should contain:

<h1 class="firstHeading mw-content-rtl" id="firstHeading" lang="ar" dir="rtl"><span lang="de" dir="ltr">Quelltext von Seite <span dir="ltr" lang="en">MediaWiki:Help/ar</span> ansehen</span></h1>

https://ar.wikipedia.org/w/index.php?title=%D9%85%D9%83%D8%A9&action=edit&uselang=de

should contain:

<h1 class="firstHeading mw-content-rtl" id="firstHeading" lang="ar" dir="rtl"><span lang="de" dir="ltr">Bearbeiten von „<span lang="ar" dir="rtl">مكة</span>“</span></h1>

For the fictitious article [[Therapy?]] on arwiki with the original English title "Therapy?" and
{{DISPLAYTITLE:<span lang="en" dir="ltr">Therapy?</span>}}, the page

https://ar.wikipedia.org/w/index.php?title=Therapy%3F&action=submit&uselang=de

should contain:

<h1 class="firstHeading mw-content-rtl" id="firstHeading" lang="ar" dir="rtl"><span lang="de" dir="ltr">Bearbeiten von „<span lang="ar" dir="rtl"><span lang="en" dir="ltr">Therapy?</span></span>“</span></h1>
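The nesting in these examples follows one pattern: heading in the page content language, then the message in the user interface language, then the page title in the page content language again. A standalone sketch of that pattern, assuming the message is available as text with a $1 placeholder (the German text is taken from the examples above). The names wrap(), dirOf() and nestedHeading() and the RTL list are hypothetical:

```php
<?php
// Hypothetical sketch of the nesting proposed above:
// h1 (page content language) > span (user interface language)
//   > message text with $1 replaced by span (page content language) > title.
// Attribute order follows the examples above.

// Illustrative subset of RTL language codes (assumption, not exhaustive).
const RTL_LANGS = ['ar', 'fa', 'he', 'ur'];

function dirOf(string $lang): string
{
    return in_array($lang, RTL_LANGS, true) ? 'rtl' : 'ltr';
}

function wrap(string $html, string $lang): string
{
    return '<span lang="' . $lang . '" dir="' . dirOf($lang) . '">' . $html . '</span>';
}

// $messageText: plain message text in the user interface language with a $1
// placeholder. $titleHtml: escaped (or DISPLAYTITLE-sanitized) title HTML.
function nestedHeading(string $messageText, string $uiLang, string $titleHtml, string $contentLang): string
{
    $contentDir = dirOf($contentLang);
    $inner = str_replace('$1', wrap($titleHtml, $contentLang), htmlspecialchars($messageText, ENT_NOQUOTES));
    return '<h1 class="firstHeading mw-content-' . $contentDir . '" id="firstHeading" lang="'
        . $contentLang . '" dir="' . $contentDir . '">' . wrap($inner, $uiLang) . '</h1>';
}

echo nestedHeading('Bearbeiten von „$1“', 'de', 'مكة', 'ar'), "\n";
echo nestedHeading('Bearbeiten von „$1“', 'de', '<span lang="en" dir="ltr">Therapy?</span>', 'ar'), "\n";
```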

Assigning "LOWEST" priority since SPQRobin thinks this shouldn't be done.

This needs to be done. The content language varies on multilingual wikis, so titles should be considered part of the content. Example problem cases:
Viewing hi-wp's main page with uselang=hi (the default) [1] and with uselang=en [2] clearly shows the difference: the diacritics get cut off. This means that the fix for Bug 32826 depends on uselang and remains essentially unimplementable on multilingual wikis like oldWS or Commons (see Bug 35430). On single-language wikis it won't work properly unless the lang attribute is specified explicitly in the title.

[1]: http://hi.wikipedia.org/wiki/%E0%A4%AE%E0%A5%81%E0%A4%96%E0%A4%AA%E0%A5%83%E0%A4%B7%E0%A5%8D%E0%A4%A0
[2]: http://hi.wikipedia.org/wiki/%E0%A4%AE%E0%A5%81%E0%A4%96%E0%A4%AA%E0%A5%83%E0%A4%B7%E0%A5%8D%E0%A4%A0?uselang=en

Increasing importance, since not being able to read titles in some languages on multilingual wikis is very annoying.

Siddhartha Ghai is quite right, and to generalize it, I'd say that what is really needed is fine-grained control over the HTML language attribute (lang) on every content element of the page. This affects the line-height of Indic languages [1] and directionality, and it may also affect glyph shaping, quoting, hyphenation and other things for some languages.

Currently, the heading is more or less a blob that can hardly be controlled. This should definitely be done, and it will probably be easy in the VisualEditor era, but thinking about it should start as early as possible.

[1] Actually it shouldn't affect line-height, but imperfections in implementations of font rendering force us to use CSS tricks to fix it.

A possible solution is to remove the <span dir="auto"> from r105854/r105870 in skin/* again and add lang and dir to every message for setPageTitle() when necessary:

  • The title contains only a message in the user interface language. Use: setPageTitle( wfMessage( 'message' ) ) This generates <h1 id="firstHeading" class="firstHeading">Nachricht</h1> The title has the alignment, direction and language of the user interface.
  • The title contains only the page title. Use: $pageLang = $title->getPageViewLanguage(); setPageTitle( Html::element( 'div', array( 'lang' => $pageLang->getHtmlCode(), 'dir' => $pageLang->getDir(), 'class' => 'mw-content-'.$pageLang->getDir() ), $title->getPrefixedText() ) ); This generates <h1 id="firstHeading" class="firstHeading"><div lang="ar" dir="rtl" class="mw-content-rtl">مكة</div></h1> The title has the alignment, direction and language of the page content.
  • The title contains the page title inside a user interface message. Use: $pageLang = $title->getPageViewLanguage(); setPageTitle( wfMessage( 'editing' )->rawParams( Html::element( 'span', array( 'lang' => $pageLang->getHtmlCode(), 'dir' => $pageLang->getDir(), 'class' => 'mw-content-'.$pageLang->getDir() ), $title->getPrefixedText() ) ) ); This generates <h1 id="firstHeading" class="firstHeading">Bearbeiten von „<span lang="ar" dir="rtl" class="mw-content-rtl">مكة</span>“</h1> The heading has the alignment, direction and language of the rest of the user interface, while the page title itself has the direction and language of the page content.

To improve this, there should be a central function to get the page title language and direction and to generate the div/span container.
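A standalone sketch of such a central function, covering the three cases listed above. titleContainer() is a hypothetical name and the RTL list is an illustrative subset; in MediaWiki the code and direction would come from $title->getPageViewLanguage() (getHtmlCode(), getDir()) and the markup from Html::element(). It leaves out the mw-content-ltr/rtl class and defaults to span, since a div is not valid inside an h1:

```php
<?php
// Hypothetical central helper (illustrative names, not MediaWiki API):
// wraps the page title in a container carrying the page's language and
// direction, so callers never build this markup by hand.

// Illustrative subset of RTL language codes (assumption, not exhaustive).
const RTL_LANGS = ['ar', 'fa', 'he', 'ur'];

function titleContainer(string $prefixedText, string $pageLang, string $element = 'span'): string
{
    $dir = in_array($pageLang, RTL_LANGS, true) ? 'rtl' : 'ltr';
    // The title is plain text, so it is escaped here.
    return "<$element lang=\"$pageLang\" dir=\"$dir\">"
        . htmlspecialchars($prefixedText, ENT_QUOTES) . "</$element>";
}

// Case 1: only a user interface message; no container is needed.
$case1 = 'Nachricht';
// Case 2: only the page title.
$case2 = titleContainer('مكة', 'ar');
// Case 3: the page title inside a user interface message ($1 placeholder).
$case3 = str_replace('$1', titleContainer('مكة', 'ar'), 'Bearbeiten von „$1“');

foreach ([$case1, $case2, $case3] as $html) {
    echo "<h1 id=\"firstHeading\" class=\"firstHeading\">$html</h1>\n";
}
```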

Incomplete patch

This is a working but incomplete patch that demonstrates the alignment, language and direction of the title based on the page content language.

attachment patch ignored as obsolete

Fomafix: Thanks for your patch!

You are welcome to use Developer access

https://www.mediawiki.org/wiki/Developer_access

to submit this as a Git branch directly into Gerrit:

https://www.mediawiki.org/wiki/Git/Tutorial

Putting your branch in Git makes it easier for us to review it quickly. Please also mention there that it's incomplete and needs improvement.
Thanks again! We appreciate your contribution.

As you said, there should indeed be a central function to get the page title language and direction and generate the div/span container. Otherwise the code, as it currently is in the patch, would become unmaintainable.

The title does not need mw-content-ltr/rtl classes since that class is intended for the *content* (list alignment, etc) and is useless for titles.

I am also not sure if we should remove span dir="auto".

We definitely should remove dir="auto", but only after we have fine-grained indication of the language of the title and the page content.

See also bug 2865. I submitted a patch for it, but Jenkins says that it fails the tests for some unclear reason.

Revised patch

Revised patch:

  • Use <span> instead of <bdi> because Internet Explorer 8 doesn't support <bdi>
  • Fix Bug 35430
  • Remove mw-content-ltr/mw-content-rtl

Missing things:

  • Central function for generating <span>/<div> container.
  • Central function for getting page title language. Page title language may differ from page content language for example in NS_MEDIAWIKI.

Change-Id: I55dd392dbf0d4f768344b66e6ceb3a023b1d5c9b is the wrong solution because it generates incorrect language tags for system messages in the user interface language: https://bugzilla.wikimedia.org/show_bug.cgi?id=2865#c8

attachment bug34514.patch ignored as obsolete

Revised patch

  • New function getHtmlPrefixedText to generate an HTML container for the page title
  • Add more positions where the page title is set
  • Remove !important


The patch also solves Bug 32403.

I guess that somebody needs to transfer the patch to Gerrit so it can get a review...
Fomafix: Too much hassle to get a Gerrit account, or what are the reasons?

(In reply to comment #7)

Siddhartha Ghai is quite right, and to generalize it, i'd say that what is
really needed is fine-grained control on the HTML language attribute (lang)
on every content element of the page...

I agree with you.

This relates to another problem that I found a while ago with i18n-ed messages from the MediaWiki namespace: when fallback language messages are used, they appear as if they were in the wiki or user language. Although this is a minor problem most of the time, it is technically incorrect.

(In reply to comment #11)

As you said, there should indeed be a central function to get the page title
language and direction and generate the div/span container.

div/span containers with proper attributes are needed in various places, so they are a very basic sort of functionality.

I wonder whether we should make them available in the wikitext context as well. Many wikis already have something like {{lang|CODE|content}} as a template.

Having only one central place for making those containers has advantages:

  • less effort for editors
  • easy maintenance
  • unnecessary inner containers can be avoided when the language context is known at runtime, while text writers and template coders may be unable to know it.

Shall we open an extra bug for this?

(In reply to Fomafix from comment #15)

Created attachment 11406 [details]
Revised patch

  • New function getHtmlPrefixedText to generate a HTML container for the page title

  • Add more positions where the page title is set
  • Remove !important

Tried to upload it via http://tools.wmflabs.org/gerrit-patch-uploader/ but the patch no longer applies cleanly:

1 out of 1 hunk ignored
patching file includes/Title.php
Hunk #1 succeeded at 1288 (offset 132 lines).
patching file includes/actions/HistoryAction.php
Hunk #1 succeeded at 53 (offset 5 lines).
patching file includes/actions/InfoAction.php
Hunk #1 succeeded at 758 (offset 165 lines).
patching file includes/diff/DifferenceEngine.php
Hunk #1 succeeded at 273 (offset 5 lines).
Hunk #2 FAILED at 280.
1 out of 2 hunks FAILED -- saving rejects to file includes/diff/DifferenceEngine.php.rej
patching file includes/specials/SpecialMovepage.php
Hunk #1 succeeded at 118 (offset 2 lines).
patching file includes/specials/SpecialRecentchangeslinked.php
Hunk #1 succeeded at 59 (offset -17 lines).
patching file includes/specials/SpecialWhatlinkshere.php
Hunk #1 succeeded at 86 with fuzz 2.
patching file skins/CologneBlue.php
Hunk #1 FAILED at 302.
1 out of 1 hunk FAILED -- saving rejects to file skins/CologneBlue.php.rej
patching file skins/Modern.php
Hunk #1 FAILED at 65.
1 out of 1 hunk FAILED -- saving rejects to file skins/Modern.php.rej
patching file skins/MonoBook.php
Hunk #1 FAILED at 84.
1 out of 1 hunk FAILED -- saving rejects to file skins/MonoBook.php.rej
patching file skins/Vector.php
Hunk #1 FAILED at 165.
1 out of 1 hunk FAILED -- saving rejects to file skins/Vector.php.rej
patching file skins/common/shared.css
Hunk #1 FAILED at 820.
1 out of 1 hunk FAILED -- saving rejects to file skins/common/shared.css.rej


It seems firstHeading is marked as in page language now. Is this bug fixed?

See also bug 63880.

(In reply to Liangent from comment #21)

It seems firstHeading is marked as in page language now. Is this bug fixed?
See also bug 63880.

As far as the issues I outlined in comment 6 are concerned:

  • The issue of the fix for Bug 32826 not working on hi.wp seems to be resolved (the title's lang attribute now seems to be set to the content language).
  • However, the issue with the language not being set properly at oldWS remains. It seems the entire page content has the lang attribute set as en.

Example: [1] should have lang set to hi, but it is en. I'm not sure whether this can be fixed by using DISPLAYTITLE and wrapping the entire page in a div with the proper lang attribute.
Anyway, this problem seems to be out of scope for this bug and within the scope of bug 35430.

[1]: https://wikisource.org/wiki/%E0%A4%B9%E0%A4%A8%E0%A5%81%E0%A4%AE%E0%A4%BE%E0%A4%A8%E0%A4%9A%E0%A4%BE%E0%A4%B2%E0%A5%80%E0%A4%B8%E0%A4%BE

Ebraminio moved this task from Backlog to MediaWiki-core on the RTL board. Aug 9 2015, 12:04 PM
Amire80 moved this task from Untriaged to RTL on the I18n board. Feb 28 2018, 12:34 PM

Change 419405 had a related patch set uploaded (by Fomafix; owner: Fomafix):
[mediawiki/core@master] [WIP] Wrap title with an element with class, lang and dir attributes

https://gerrit.wikimedia.org/r/419405

Huji added a subscriber: Huji. Mar 14 2018, 2:34 PM

@Fomafix can you please post screenshots of before and after your patch, so we have it clearly documented what was the problem and how we are fixing it?

This change can solve bidi problems like T159267: Preferences in RTL: Username with parentheses shown wrong in some WebKit based browsers. Using right language codes also helps screen readers.

https://fa.wikipedia.org/wiki/3GP_%D9%88_3G2?action=edit&veswitched=1&uselang=de has currently:

<!DOCTYPE html>
<html lang="de" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>Bearbeiten von „3GP و 3G2“ – ویکی‌پدیا</title>
</head>
<body>
<h1 lang="fa">Bearbeiten von „3GP و 3G2“</h1>
</body>
</html>

Expected content:

<!DOCTYPE html>
<html lang="de" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>Bearbeiten von „&#x2067;3GP و 3G2&#x2069;“ – ویکی‌پدیا</title>
</head>
<body>
<h1>Bearbeiten von „<bdi dir="rtl" lang="fa">3GP و 3G2</bdi>“</h1>
</body>
</html>

This would also fix the incorrect bidi formatting in the HTML title.

Browsers, and especially operating systems, have poor support for the Explicit Directional Isolate Formatting Characters (LRI, RLI, FSI, PDI) introduced in Unicode 6.3:

  • Windows 10 does not support the Explicit Directional Isolate Formatting Characters and shows replacement characters for the Directional Formatting Characters.
  • Firefox and Chrome support the Explicit Directional Isolate Formatting Characters. However, in tooltips on Windows 10, Chrome shows replacement characters.
  • Edge ignores Explicit Directional Isolate Formatting Characters.
  • Internet Explorer ignores Explicit Directional Isolate Formatting Characters and shows replacement characters.
  • Linux (Ubuntu 16.10) ignores the Explicit Directional Isolate Formatting Characters and hides them.

Showing replacement characters is not acceptable. Because of the poor behavior on Windows 10, the Explicit Directional Isolate Formatting Characters cannot be used in the HTML title by default.
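Given these constraints, one option is to make the isolates opt-in when building the HTML title. The following standalone sketch is hypothetical (htmlTitle() is not a MediaWiki function, and the RTL list is an illustrative subset). With the flag enabled it reproduces the expected title text above: RLI (&#x2067;) or LRI (&#x2066;) before the page title and PDI (&#x2069;) after it. By default it emits the plain title, so no replacement characters can appear:

```php
<?php
// Hypothetical sketch (not MediaWiki code): the HTML <title> for a page title
// embedded in a user interface message. Isolates are opt-in because of the
// platform problems listed above.

// Illustrative subset of RTL language codes (assumption, not exhaustive).
const RTL_LANGS = ['ar', 'fa', 'he', 'ur'];

// $messageText: already-escaped message in the user interface language,
// with $1 as the placeholder for the page title.
function htmlTitle(
    string $messageText,
    string $pageTitle,
    string $pageLang,
    string $siteName,
    bool $useIsolates = false
): string {
    $title = htmlspecialchars($pageTitle, ENT_QUOTES);
    if ($useIsolates) {
        // RLI (U+2067) or LRI (U+2066) opens the isolate, PDI (U+2069) closes it.
        $open = in_array($pageLang, RTL_LANGS, true) ? '&#x2067;' : '&#x2066;';
        $title = $open . $title . '&#x2069;';
    }
    return str_replace('$1', $title, $messageText) . ' – ' . $siteName;
}

// Default: no isolates, so no replacement characters on Windows 10.
echo htmlTitle('Bearbeiten von „$1“', '3GP و 3G2', 'fa', 'ویکی‌پدیا'), "\n";
// Opt-in: reproduces the expected <title> text from the example above.
echo htmlTitle('Bearbeiten von „$1“', '3GP و 3G2', 'fa', 'ویکی‌پدیا', true), "\n";
```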