The <bdi> tag is sanitized
Closed, Resolved (Public)

Description

HTML5 has several new features designed to help web designers create pages that show RTL text properly. Among them is the new <bdi> element. It stands for "bi-directional isolation", and it basically means that the text inside it doesn't mix up with adjacent punctuation marks and numbers, which have ambiguous directionality properties.

Once it is widely supported in browsers, this element will become very useful in Wikimedia projects. Probably the most common use case for it is showing the left-to-right name in an article about a person in a right-to-left Wikipedia; this name is usually written in the opening paragraph next to the birth date, which consists of numbers that get misplaced because of the bidi algorithm. Currently this is solved by using Unicode control characters (RLM); <bdi> is a more elegant solution.
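
To see why the numbers and punctuation get pulled around, it can help to look at the bidi classes of the characters involved. This is only a small sketch using Python's standard unicodedata module; the sample characters are chosen for illustration and are not taken from any actual article.

```python
import unicodedata

# Sample characters (illustrative choices, not from a real page).
samples = {
    "Latin letter": "a",
    "Hebrew letter": "\u05d7",   # HET, first letter of the Hebrew name of Haifa
    "digit": "1",
    "comma": ",",
    "parenthesis": "(",
    "RLM": "\u200f",             # RIGHT-TO-LEFT MARK
}

# Look up the bidi class of each sample in the Unicode Character Database.
bidi_classes = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, cls in bidi_classes.items():
    print(f"{label}: {cls}")
```

Only the letters and RLM have a strong direction (L or R). The digit (EN), comma (CS) and parenthesis (ON) are weak or neutral, so where they end up depends on their neighbours. That is why an invisible RLM works as a workaround, and why isolating the name with <bdi> avoids the problem without extra control characters.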

This tag is currently sanitized, so it is impossible to use it in MediaWiki pages. It shouldn't be sanitized. To allow it, I suppose that it should be added to htmlpairsStatic in includes/Sanitizer.php, but I'm not so familiar with that class, so maybe something else is needed.
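
As a rough illustration of what an allow-list sanitizer does with a tag it doesn't know, here is a toy sketch in Python. It is not the code in includes/Sanitizer.php: the ALLOWED_TAGS set, the TAG_RE pattern and the sanitize function are all made up for this example, and it ignores attributes and text outside tags.

```python
import html
import re

# Hypothetical allow-list for illustration only; the real list lives in
# includes/Sanitizer.php and is much longer.
ALLOWED_TAGS = {"b", "i", "span", "sup", "bdo"}

# Matches an opening or closing tag and captures the tag name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^<>]*>")

def sanitize(text, allowed=ALLOWED_TAGS):
    """Escape every tag whose name is not allowed; pass allowed tags through."""
    def replace(match):
        name = match.group(1).lower()
        if name in allowed:
            return match.group(0)
        # Unknown tag: show it literally instead of letting it act as markup.
        return html.escape(match.group(0), quote=False)
    return TAG_RE.sub(replace, text)

before = sanitize("<bdi>text</bdi>")
after = sanitize("<bdi>text</bdi>", ALLOWED_TAGS | {"bdi"})
```

With the allow-list as written, `before` comes out with the tag escaped, so the reader sees the literal tag text in the page. Once "bdi" is added to the list, `after` is passed through untouched.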

When will it be widely supported in browsers? It was just added to Mozilla, and note that even the source file name is similar:
https://hg.mozilla.org/mozilla-central/rev/6f03f6a821c0

AFAIK, it is also about to be added to Chrome.


Version: unspecified
Severity: enhancement

Details

Reference
bz31817

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 11:56 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz31817.

Damn, that sounds like it'll actually be useful. Wish we had it in HTML 4! ;)

http://dev.w3.org/html5/spec/text-level-semantics.html#the-bdi-element

Once we pass it through, is there any CSS we could do to simulate it on existing browsers that wouldn't interfere (like unicode-bidi: embed? or would that also require explicitly knowing what direction to add), or would it just have to be a progressive enhancement?

I'm not sure that there's a proper way to simulate it, except maybe trying to play with RLMs. I'm going to try to build a template that does something like it, but I can't promise anything.

Created attachment 9262
A demo of possible use cases for the <bdi> tag.

The attachment gives a couple of use cases for the <bdi> tag. Both can use the RLM character instead, but <bdi> is more elegant.

You can test it in the Mozilla Nightly version that was published today. Fresh!

Untagging milestone 1.19. This has not been resolved, and trunk is closed to new features for 1.19.

This feature has been released in Firefox 10. https://developer.mozilla.org/en/Firefox_10_for_developers

"The new HTML5 <bdi> element, bi-directional isolation, allowing isolation of parts of text with a different directionality has been implemented."

(In reply to comment #1)

> Damn, that sounds like it'll actually be useful. Wish we had it in HTML 4! ;)
>
> http://dev.w3.org/html5/spec/text-level-semantics.html#the-bdi-element
>
> Once we pass it through, is there any CSS we could do to simulate it on
> existing browsers that wouldn't interfere (like unicode-bidi: embed? or would
> that also require explicitly knowing what direction to add), or would it just
> have to be a progressive enhancement?

There's no perfect way to emulate it, but that shouldn't be a reason not to enable <bdi> now, at least to start testing it in some less essential environments where failures on older browsers won't be too harmful.

Reviewed, added parser test case, added to attribute whitelist to ensure that 'lang' attribute works, and merged.

https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=commit;h=fe0b0270fdb242c1aed6b39c1be4782792f323a5

The fix appears to be merged and deployed. It works as I expect on my local test wiki. But if I try it on the English Wikipedia, the <bdi> tag disappears completely.

See an example here:
https://en.wikipedia.org/wiki/User:Amire80/bdi

Before the fix the tag would appear escaped, so you would just see the <bdi>text</bdi> in the rendered page. Now it seems to disappear. When I look at the source of the rendered page, I see:

The Hebrew name of Haifa is חיפה.<sup id="cite_ref-0" class="reference"><a href="#cite_note-0"><span>[</span>1<span>]</span></a></sup>

On my local wiki I see: The Hebrew name of Haifa is חיפה.<bdi><sup id="cite_ref-0" class="reference"><a href="#cite_note-0">[1]</a></sup></bdi>

I thought maybe it wasn't working on en.wikipedia.org due to wgHtml5=false there. But I tested on mediawiki.org (where wgHtml5=true) and it doesn't work there either:

https://www.mediawiki.org/w/index.php?title=Project:Sandbox&oldid=537524

<bdi> is stripped.

Maybe caused by W3C Tidy?

It works on the foundation wiki (I used http://wikimediafoundation.org/wiki/Special:ExpandTemplates, since I'm not an editor there), which has Tidy disabled, so it could be Tidy indeed.

We need to upgrade Tidy to a version that knows about the latest HTML5 features.

In the meantime, we can adapt the Tidy config by adding "new-inline-tags: bdi" to it.

https://gerrit.wikimedia.org/r/17174

Added the bdi element to the Tidy configuration that ships with MediaWiki. No idea if that is actually the one we use on WMF sites however. There seems to be no tidy config in our puppet repo at least.

Patch moved to Gerrit. Removing the patch-need-review keyword.

I confirm that Derk-Jan's patch does the right thing.

Note that even though iOS/macOS does not render the "bdi" element properly, it has supported the "isolate" mode since mid-2012.
This support is in the native iOS/macOS text renderer and is accessible with the "-webkit-isolate" value in CSS.
Unfortunately, the "bdi" element (even if it's parsed correctly into the HTML DOM by Chrome and Safari) still has no style associated with it in the user agent's default stylesheet.
So we need to set this for iOS/macOS (works since Safari 6):

bdi, output { unicode-bidi: -webkit-isolate; }

iOS and macOS, however, have always defined this style:

bdo { unicode-bidi: isolate-override; }

But "bdo" should no longer be used: it is only suitable for static text where we know the direction of the text not only inside the content but also before and after it, and it does not work properly for generated items.

For example, take the translatable message "%@ won the game with the highest score!", where "%@" is a placeholder for a player name that could be in Latin or Arabic script, and where the message itself may be translated to Arabic with the same placeholder again replaced by a user name in Latin or Arabic script: you CANNOT get a correct layout order (correct placement of the user name and of the final "!") without using isolates.

Apple even made a public presentation video in 2016 (still viewable by anyone today) explaining that isolates had become the default in the devkit for iOS/macOS for building UIs with seamless bidi support. Isolates are supported in the native text renderer of the OS (through Unicode bidi controls, with the help of the UBA version supporting isolates, which became part of the Unicode standard in version 6.3, published in September 2013) and in Safari (via CSS styles). For example, the UI API has added support for localizing messages containing "%@" placeholders (instead of "%s"). These are convenient because they surround the text replacing the placeholder with the isolate bidi controls <FSI> (U+2068 FIRST STRONG ISOLATE) and <PDI> (U+2069 POP DIRECTIONAL ISOLATE), making seamless use of the isolate support integrated in iOS/macOS/watchOS since 2012.
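
The wrapping described here can be sketched in a few lines of Python. This is not Apple's API, just an illustration of the idea: the isolate and format_message helpers are hypothetical names, and the Arabic sample name is made up.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate(value: str) -> str:
    """Wrap a substituted value in FSI ... PDI so its direction cannot leak out."""
    return f"{FSI}{value}{PDI}"

def format_message(template: str, *values: str) -> str:
    """Replace each "%@" placeholder, in order, with an isolated value."""
    parts = template.split("%@")
    if len(parts) - 1 != len(values):
        raise ValueError("placeholder count does not match number of values")
    out = [parts[0]]
    for value, tail in zip(values, parts[1:]):
        out.append(isolate(value))
        out.append(tail)
    return "".join(out)

# A made-up Arabic player name substituted into the English message.
player = "\u0623\u062d\u0645\u062f"
message = format_message("%@ won the game with the highest score!", player)
```

In HTML the same effect is what `<bdi>` around the player name is meant to give, without putting invisible control characters into the translated string.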

So the only minor problem with iOS is that Apple forgot to include a default style for "bdi" elements (maybe only to allow old HTML documents to use another default bidi mode, when these documents also included bidi overrides inside "bdi" elements, including LRM and RLM controls, whose effect should still extend outside the "bdi" elements as if they were "bdo" or "span" elements).

So: forget RLM and LRM in all MediaWiki translations, and forget the "bdo" element: use "bdi" instead and support it with a small single-line stylesheet for iOS/macOS (and we'll no longer hear people using iPhones complaining about the use of "bdi" elements and replacing them with non-working "bdo"!)

Likewise, LRM and RLM controls should no longer be used in MediaWiki (they are also Bidi overrides, but they are really not needed in MediaWiki if we use "bdi" for isolates).

Older Safari versions (5.1 or before), only for iPhones 6 or before sold up to mid-2012, represented less than 0.01% in 2018 (almost all of them for iPhone 6). Older iPhones (5 or before) have almost completely disappeared (most of them are out of service, no longer support most apps, or don't have enough memory). So we can safely ignore them.

Please add support for "bdi" on all iPhones: just add the missing CSS property that Apple forgot to include in the default user-agent stylesheet (Apple insists that ALL browsers for iOS MUST use the native Safari engine; otherwise Apple refuses to accept them in the App Store and blocks them).

And so, stop using "bdo" in MediaWiki (even if it's still supported, as it was part of the HTML4 standard and used only the UBA version that was part of Unicode before Unicode 6.3, published in September 2013).

The only remaining browser still without support for isolates is Internet Explorer (up to IE 11), whose last version was released in 2013 before being replaced by Edge (and Edge has now been updated to use the Chromium engine, which has long supported isolates). Here also, the market share for IE is now very low (only users of Windows XP or Windows 7, which no longer have any support from Microsoft except for a few critical security updates). These versions of Windows are only used in specific enterprise environments with costly paid support, and those browsers are used only for specific applications that still depend on them; they should no longer be used to navigate any wiki. These users should use another PC with a supported version of Windows or, if they don't have one, should install another browser (Firefox, Chrome...), the best reason being their own security, if they don't want to break their corporate apps that still depend on the old IE11 renderer.

IE11 was still about 2% of the global market in 2018 (now it's even lower), including all corporate apps depending on it for their old documents (but certainly not with MediaWiki). IE11 should no longer be supported in MediaWiki: its support on Windows XP and Windows 7 has ended for all users, except in corporate environments for their proprietary apps and documents still depending on IE11, and IE11 is no longer installed by default in Windows 10, only by administrative request, and can no longer be installed without a specific support licence paid to Microsoft.

Likewise, Edge in Windows 10 (which had no support for isolates with its previous engine) now uses the Chromium engine (Edge was updated in all supported public versions of Windows 10), and so it now fully supports "bdi" and isolates.