
The <bdi> tag is sanitized
Closed, Resolved, Public

Description

HTML5 has several new features designed to help web designers create pages that show RTL text properly. Among them is the new <bdi> element. It stands for "bi-directional isolation", and it basically means that the text inside it doesn't mix up with adjacent punctuation marks and numbers, which have ambiguous directionality properties.

Once it is widely supported in browsers, this element will become very useful in Wikimedia projects. Probably the most common use case for it is showing a left-to-right name in an article about a person in a right-to-left Wikipedia: the name is usually written in the opening paragraph, adjacent to the birth date, and the date's numbers get mixed up with it because of the bidi algorithm. Currently this is solved with Unicode control characters (RLM); <bdi> is a more elegant solution.
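To make the "ambiguous directionality" point concrete, here is a minimal Python sketch that prints the Unicode bidi class of a few characters. It also wraps a string in the FSI/PDI isolate characters, which are the plain-text counterpart of <bdi>. The helper names and the sample name are only illustrations and are not taken from any MediaWiki code.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK, the workaround mentioned above
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def bidi_classes(text: str) -> list[str]:
    """Return the Unicode bidi class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


def isolate(text: str) -> str:
    """Wrap text in FSI ... PDI so its direction cannot leak into neighbours."""
    return f"{FSI}{text}{PDI}"


# Letters are strongly directional; digits and punctuation are weak or neutral,
# which is why they get reordered next to text of the opposite direction.
print(bidi_classes("A\u05d71("))  # ['L', 'R', 'EN', 'ON']
print(bidi_classes(RLM))          # ['R'] (an invisible strong RTL character)
print(ascii(isolate("John Smith")))  # hypothetical sample name
```

The RLM workaround adds an invisible strong character next to the ambiguous ones, whereas isolation separates the whole run from its surroundings.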

This tag is currently sanitized, so it is impossible to use it in MediaWiki pages. It shouldn't be sanitized. To allow it, I suppose it should be added to htmlpairsStatic in includes/Sanitizer.php, but I'm not very familiar with that class, so maybe something else is needed.
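As a rough illustration of what whitelisting would change, here is a toy tag sanitizer in Python. It is not MediaWiki's Sanitizer code: `sanitize`, `TAG_RE`, and the tag sets are hypothetical names, and attribute checking is left out entirely.

```python
import html
import re

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def sanitize(text: str, allowed_tags: frozenset[str]) -> str:
    """Pass whitelisted tags through and HTML-escape every other tag."""
    def replace(match: re.Match) -> str:
        if match.group(1).lower() in allowed_tags:
            return match.group(0)          # whitelisted: keep as markup
        return html.escape(match.group(0))  # not whitelisted: show literally

    return TAG_RE.sub(replace, text)


# Hypothetical whitelists before and after adding bdi.
allowed_before = frozenset({"b", "i", "sup", "span"})
allowed_after = allowed_before | {"bdi"}

sample = "<bdi>\u05d7\u05d9\u05e4\u05d4</bdi> <b>1909</b>"
print(sanitize(sample, allowed_before))  # bdi tags come out escaped
print(sanitize(sample, allowed_after))   # bdi tags pass through unchanged
```

With the first whitelist the reader sees the literal tag text in the page; with the second the element reaches the browser.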

When will it be widely supported in browsers? It was just added to Mozilla, and note that even the source file name is similar:
https://hg.mozilla.org/mozilla-central/rev/6f03f6a821c0

AFAIK, it is also about to be added to Chrome.


Version: unspecified
Severity: enhancement

Details

Reference
bz31817

Event Timeline

bzimport raised the priority of this task to Normal. Nov 21 2014, 11:56 PM
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz31817.
Amire80 created this task. Oct 19 2011, 10:28 AM
brion added a comment. Oct 19 2011, 6:14 PM

Damn, that sounds like it'll actually be useful. Wish we had it in HTML 4! ;)

http://dev.w3.org/html5/spec/text-level-semantics.html#the-bdi-element

Once we pass it through, is there any CSS we could do to simulate it on existing browsers that wouldn't interfere (like unicode-bidi: embed? or would that also require explicitly knowing what direction to add), or would it just have to be a progressive enhancement?

I'm not sure that there's a proper way to simulate it, except maybe by experimenting with RLMs. I'm going to try to build a template that does something like it, but I can't promise anything.

Created attachment 9262
A demo of possible use cases for the <bdi> tag.

The attachment gives a couple of use cases for the <bdi> tag. Both can use the RLM character instead, but <bdi> is more elegant.

You can test it in the Mozilla Nightly version that was published today. Fresh!

Untagging milestone 1.19. This has not been resolved, and trunk is closed to new features for 1.19.

Firefox 10 has this feature released. https://developer.mozilla.org/en/Firefox_10_for_developers

"The new HTML5 <bdi> element, bi-directional isolation, allowing isolation of parts of text with a different directionality has been implemented."

(In reply to comment #1)
> Damn, that sounds like it'll actually be useful. Wish we had it in HTML 4! ;)
> http://dev.w3.org/html5/spec/text-level-semantics.html#the-bdi-element
> Once we pass it through, is there any CSS we could do to simulate it on
> existing browsers that wouldn't interfere (like unicode-bidi: embed? or would
> that also require explicitly knowing what direction to add), or would it just
> have to be a progressive enhancement?

There's no perfect way to emulate it, but that shouldn't be a reason not to enable <bdi> now, at least so that we can start testing it in less essential environments where failures on older browsers won't be too harmful.

brion added a comment. Apr 4 2012, 8:22 PM

Reviewed, added parser test case, added to attribute whitelist to ensure that 'lang' attribute works, and merged.

https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/core.git;a=commit;h=fe0b0270fdb242c1aed6b39c1be4782792f323a5

The fix appears to be merged and deployed. It works as I expect on my local test wiki. But if I try it on the English Wikipedia, the <bdi> tag disappears completely.

See an example here:
https://en.wikipedia.org/wiki/User:Amire80/bdi

Before the fix the tag would appear escaped, so you would just see the <bdi>text</bdi> in the rendered page. Now it seems to disappear. When I look at the source of the rendered page, I see:

The Hebrew name of Haifa is חיפה.<sup id="cite_ref-0" class="reference"><a href="#cite_note-0"><span>[</span>1<span>]</span></a></sup>

On my local wiki I see: The Hebrew name of Haifa is חיפה.<bdi><sup id="cite_ref-0" class="reference"><a href="#cite_note-0">[1]</a></sup></bdi>

I thought maybe it wasn't working on en.wikipedia.org because wgHtml5=false there. But I tested on mediawiki.org (where wgHtml5=true) and it doesn't work there either:

https://www.mediawiki.org/w/index.php?title=Project:Sandbox&oldid=537524

<bdi> is stripped.

Maybe this is caused by W3C Tidy?

TheDJ added a comment. Aug 1 2012, 10:30 AM

It works on the Foundation wiki (I used http://wikimediafoundation.org/wiki/Special:ExpandTemplates, since I'm not an editor there), which has Tidy disabled, so it could indeed be Tidy.

TheDJ added a comment. Aug 1 2012, 10:36 AM

We need to upgrade Tidy to a version that knows about the latest HTML5 elements.

In the meantime, we can adapt the Tidy config by adding "new-inline-tags: bdi" to it.
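A small Python sketch of that config change, assuming a Tidy config made of plain "option: value" lines. The `add_inline_tag` helper and the sample config contents are hypothetical; only the `new-inline-tags: bdi` line comes from this comment.

```python
def add_inline_tag(config_text: str, tag: str,
                   option: str = "new-inline-tags") -> str:
    """Add tag to the given list option of a Tidy config, creating it if absent."""
    lines = config_text.splitlines()
    for index, line in enumerate(lines):
        key, sep, value = line.partition(":")
        if sep and key.strip() == option:
            # Tidy accepts space- or comma-separated tag lists.
            tags = value.replace(",", " ").split()
            if tag not in tags:
                tags.append(tag)
            lines[index] = f"{option}: {', '.join(tags)}"
            break
    else:
        # No existing line for this option: append a new one.
        lines.append(f"{option}: {tag}")
    return "\n".join(lines) + "\n"


# Hypothetical starting config, not the one shipped with MediaWiki.
print(add_inline_tag("show-body-only: yes\n", "bdi"))
```

Running it twice on the same config leaves the option unchanged, so it is safe to apply repeatedly.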

TheDJ added a comment. Aug 1 2012, 10:58 AM

https://gerrit.wikimedia.org/r/17174

Added the bdi element to the Tidy configuration that ships with MediaWiki. No idea if that is actually the one we use on WMF sites however. There seems to be no tidy config in our puppet repo at least.

Patch moved to Gerrit. Removing the patch-need-review keyword.

TheDJ added a comment. Aug 23 2012, 9:03 PM

patch merged.

I confirm that Derk-Jan's patch does the right thing.

Works on Wikipedia.

Restricted Application added a project: I18n. Jun 2 2015, 2:20 PM