Page MenuHomePhabricator

Reduce relying on hidden direction marks in MediaWiki
Open, LowPublic

Assigned To
None
Authored By
Ebrahim
Sep 29 2024, 12:45 PM
Referenced Files
F57616318: image.png
Oct 15 2024, 1:26 PM
F57616314: image.png
Oct 15 2024, 1:26 PM
F57613943: image.png
Oct 14 2024, 9:19 PM
F57613941: image.png
Oct 14 2024, 9:19 PM
F57569152: image.png
Sep 29 2024, 12:45 PM
F57569141: image.png
Sep 29 2024, 12:45 PM

Description

From https://www.w3.org/International/questions/qa-bidi-controls.en.html

Unicode control codes are not useful for bidi formatting when working with structural or paragraph-level markup.

For inline content we recommend that, wherever possible, you use markup in HTML and XML, rather than the Unicode control characters.

One practical reason for it is those control characters will end up in user text copy paste of typical users who are unaware and clueless and it breaks string matching logic. e.g. T27277

And bidi control includes many different characters (LRI, RLI, FSI, LRE, RLE, LRO, RLO, PDI, PDF) which all should be avoided but one main place generates them in MediaWiki is

We should reduce uses of them and eventually if there aren't any plain text uses of the methods, make them deprecated and remove them from MediaWiki or at least adding to their documentation that most of the time they should be avoided.

To help developers not familiar with bidi, what is the propose of such things in MediaWiki at the first place anyway? Consider the following, that $title can be a username or a page title with

abc $title (123)

If $title = 'title'; this will be turned into,

abc title (123)

but if $title = '1شسی'; the very same code results in

abc 1شسی (123)‍

Note where 123 has went, between parts of the title string! It's still logically after the $title but it's displayed before it because bidi algorithm is mislead and to solve this on plain text one can put a LRM which results in,

abc ۱شسی‎ (123)

Which has the hidden character but if one copies the title it will contain the character which is undesirable so a better solution can be use of <bdi> which results in,

abc <bdi>1شسی</bdi> (123)

image.png (47×124 px, 2 KB)

which is good but the placement of that 1 is changed so the better solution would be to add dir="ltr" (the LTR or RTL here should match site or user language direction depending on the context, for page title of mono language wikis, better to wiki's content direction, for usernames however not user of enforcing any direction and letting default direction to be used can be better):

abc <bdi dir="ltr">1شسی</bdi> (123)

image.png (47×124 px, 2 KB)

For example in the use https://gerrit.wikimedia.org/g/mediawiki/extensions/ProofreadPage/+/c30b1e384ab694161fa10665636eb3f2f4c4349a/includes/Special/SpecialProofreadPages.php#357 just wrapping $plink with <bdi> should be enough.

Generally to understand and review these changes keep this rule of thumb in mind that whenever a user generated content such as username, page title, summary and content is used in one line with text and messages that are part of software UI of MediaWiki, those user generated content should be wrapped with <bdi>. It's like XSS and SQL Injection but for bidi and the antidote is an appropiate wrapping with HTML <bdi> tag. (but this is just a simplification with leaving some details out)

Also related suggestions from W3C to use HTML tag and attribute over CSS styles from https://drafts.csswg.org/css-writing-modes-3/

Because HTML UAs can turn off CSS styling, we recommend HTML authors to use the HTML dir attribute and <bdo> element to ensure correct bidirectional layout in the absence of a style sheet. Authors should not use direction in HTML documents.

Because HTML UAs can turn off CSS styling, we recommend HTML authors to use the HTML dir attribute, <bdo> element, and appropriate distinction of text-level vs. grouping-level HTML element types to ensure correct bidirectional layout in the absence of a style sheet. Authors should not use unicode-bidi in HTML documents.

Details

SubjectRepoBranchLines +/-
mediawiki/coremaster+27 -29
mediawiki/coremaster+15 -16
mediawiki/corewmf/1.43.0-wmf.26+25 -29
mediawiki/coremaster+29 -25
mediawiki/coremaster+15 -8
mediawiki/coremaster+5 -5
mediawiki/coremaster+8 -0
mediawiki/extensions/GrowthExperimentsmaster+3 -4
mediawiki/coremaster+5 -6
mediawiki/coremaster+5 -4
mediawiki/extensions/CategoryTreemaster+4 -4
mediawiki/coremaster+2 -2
mediawiki/coremaster+32 -38
mediawiki/coremaster+3 -0
mediawiki/coremaster+4 -1
mediawiki/coremaster+2 -2
mediawiki/coremaster+5 -5
mediawiki/coremaster+7 -2
mediawiki/coremaster+3 -6
mediawiki/coremaster+3 -4
mediawiki/coremaster+1 -5
mediawiki/coremaster+4 -2
mediawiki/coremaster+13 -9
mediawiki/coremaster+2 -2
mediawiki/coremaster+10 -0
mediawiki/extensions/Translatemaster+3 -3
mediawiki/coremaster+4 -8
mediawiki/extensions/ProofreadPagemaster+1 -2
Show related patches Customize query in gerrit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change #1076836 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Add a note on use of getDirMark and getDirMarkEntity

https://gerrit.wikimedia.org/r/1076836

Change #1076836 merged by jenkins-bot:

[mediawiki/core@master] Add a note on use of getDirMark and getDirMarkEntity

https://gerrit.wikimedia.org/r/1076836

Change #1076443 merged by jenkins-bot:

[mediawiki/core@master] Replace a use of bidi control code with HTML markup

https://gerrit.wikimedia.org/r/1076443

Change #1077067 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Mark now unused getDirMarkEntity as deprecated

https://gerrit.wikimedia.org/r/1077067

Change #1077072 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Replace use of getDirMark with <bdi> tag

https://gerrit.wikimedia.org/r/1077072

Change #1077078 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Replace use of getDirMark with <bdi> in Special:DoubleRedirects

https://gerrit.wikimedia.org/r/1077078

Change #1077078 merged by jenkins-bot:

[mediawiki/core@master] Replace use of getDirMark with <bdi> in Special:DoubleRedirects

https://gerrit.wikimedia.org/r/1077078

Change #1077072 merged by jenkins-bot:

[mediawiki/core@master] Replace use of getDirMark with <bdi> tag in ProtectedPagesPager

https://gerrit.wikimedia.org/r/1077072

Change #1077445 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Use a better bidi aware in CommentParser

https://gerrit.wikimedia.org/r/1077445

Change #1077705 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Use HTML markup instead of bidi control chars in Special:NewPages

https://gerrit.wikimedia.org/r/1077705

I wrote the following at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1077705 and am writing again here,

Generally to understand and review these changes keep this rule of thumb in mind that whenever a user generated content such as username, page title, summary and content is used in one line with text and messages that are part of software UI of MediaWiki, those user generated content should be wrapped with <bdi>. It's like XSS and SQL Injection but for bidi and the antidote is an appropiate wrapping with HTML <bdi> tag. (but this is just a simplification with leaving some details out)

Change #1077718 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Use HTML markup instead of bidi control chars in Special:ShortPages

https://gerrit.wikimedia.org/r/1077718

Change #1077739 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Use HTML markup instead of bidi control chars in action=history

https://gerrit.wikimedia.org/r/1077739

Change #1077739 merged by jenkins-bot:

[mediawiki/core@master] Use HTML markup instead of bidi control chars in action=history

https://gerrit.wikimedia.org/r/1077739

Change #1077718 merged by jenkins-bot:

[mediawiki/core@master] Use HTML markup instead of bidi control chars in Special:ShortPages

https://gerrit.wikimedia.org/r/1077718

Change #1077705 merged by jenkins-bot:

[mediawiki/core@master] Use HTML markup instead of bidi control chars in Special:NewPages

https://gerrit.wikimedia.org/r/1077705

Change #1077067 merged by jenkins-bot:

[mediawiki/core@master] Mark now unused getDirMarkEntity as deprecated

https://gerrit.wikimedia.org/r/1077067

Change #1077771 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Use HTML markup instead of bidi control chars in Special:WhatLinksHere

https://gerrit.wikimedia.org/r/1077771

Change #1077775 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Use HTML markup instead of bidi control chars in Special:ListRedirects

https://gerrit.wikimedia.org/r/1077775

Change #1077778 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Use HTML markup instead of bidi control chars in revision delete

https://gerrit.wikimedia.org/r/1077778

Change #1077790 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Use HTML markup instead of bidi control chars in wiki changes

https://gerrit.wikimedia.org/r/1077790

Change #1077796 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Mark getDirMark as deprecated

https://gerrit.wikimedia.org/r/1077796

Change #1077771 merged by jenkins-bot:

[mediawiki/core@master] Use HTML markup instead of bidi control chars in Special:WhatLinksHere

https://gerrit.wikimedia.org/r/1077771

Change #1077775 merged by jenkins-bot:

[mediawiki/core@master] Use HTML markup instead of bidi control chars in Special:ListRedirects

https://gerrit.wikimedia.org/r/1077775

Change #1077778 merged by jenkins-bot:

[mediawiki/core@master] Use HTML markup instead of bidi control chars in revision delete

https://gerrit.wikimedia.org/r/1077778

Change #1077790 merged by jenkins-bot:

[mediawiki/core@master] Use HTML markup instead of bidi control chars in wiki changes

https://gerrit.wikimedia.org/r/1077790

Change #1077796 merged by jenkins-bot:

[mediawiki/core@master] Mark getDirMark as deprecated

https://gerrit.wikimedia.org/r/1077796

Change #1077445 merged by jenkins-bot:

[mediawiki/core@master] Use a better bidi aware markup in CommentParser

https://gerrit.wikimedia.org/r/1077445

Change #1077945 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Remove getDirMark the same way as RevDelLogItem

https://gerrit.wikimedia.org/r/1077945

Change #1077945 merged by jenkins-bot:

[mediawiki/core@master] Remove getDirMark the same way as RevDelLogItem

https://gerrit.wikimedia.org/r/1077945

Change #1077995 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Wrap category link around <bdi>

https://gerrit.wikimedia.org/r/1077995

Change #1077997 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/extensions/CategoryTree@master] Replace use of getDirMark with HTML markup

https://gerrit.wikimedia.org/r/1077997

Change #1077995 merged by jenkins-bot:

[mediawiki/core@master] Wrap category links around <bdi>

https://gerrit.wikimedia.org/r/1077995

Change #1077997 merged by jenkins-bot:

[mediawiki/extensions/CategoryTree@master] Replace use of getDirMark with HTML markup

https://gerrit.wikimedia.org/r/1077997

Change #1078045 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Use <bdi> in Language::specialList

https://gerrit.wikimedia.org/r/1078045

Change #1078099 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Deprecate embedBidi the same way as getDirMark

https://gerrit.wikimedia.org/r/1078099

Change #1078100 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/extensions/GrowthExperiments@master] Replace embedBidi with <bdi> HTML tag

https://gerrit.wikimedia.org/r/1078100

Change #1078101 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Aggregate unicode directional formatting characters

https://gerrit.wikimedia.org/r/1078101

Change #1078045 merged by jenkins-bot:

[mediawiki/core@master] Use <bdi> in Language::specialList

https://gerrit.wikimedia.org/r/1078045

Change #1078099 merged by jenkins-bot:

[mediawiki/core@master] Deprecate embedBidi the same way as getDirMark

https://gerrit.wikimedia.org/r/1078099

Change #1078100 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Replace embedBidi with <bdi> HTML tag

https://gerrit.wikimedia.org/r/1078100

Change #1078348 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Use content dir in Special:DoubleRedirects

https://gerrit.wikimedia.org/r/1078348

Change #1078101 merged by jenkins-bot:

[mediawiki/core@master] Move definition of all bidi control characters to one place

https://gerrit.wikimedia.org/r/1078101

Change #1078348 merged by jenkins-bot:

[mediawiki/core@master] Use content dir in DoubleRedirects and ShortPages special pages

https://gerrit.wikimedia.org/r/1078348

Hey all this looks like it caused T375975

Could this be a wrong number?

@IKhitron - I think it's T376814: Regression: mobile watchlist's text overlaps which is a significant visual regression on the mobile watchlist page

Change #1079218 had a related patch set uploaded (by Hashar; author: Hashar):

[mediawiki/core@wmf/1.43.0-wmf.26] Revert "Use HTML markup instead of bidi control chars in wiki changes"

https://gerrit.wikimedia.org/r/1079218

Change #1079218 merged by Aklapper:

[mediawiki/core@wmf/1.43.0-wmf.26] Revert "Use HTML markup instead of bidi control chars in wiki changes"

https://gerrit.wikimedia.org/r/1079218

Mentioned in SAL (#wikimedia-operations) [2024-10-10T08:55:39Z] <aklapper@deploy2002> Started scap sync-world: Backport for [[gerrit:1079218|Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814)]]

Mentioned in SAL (#wikimedia-operations) [2024-10-10T08:57:52Z] <aklapper@deploy2002> hashar, aklapper: Backport for [[gerrit:1079218|Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-10-10T09:07:49Z] <aklapper@deploy2002> Finished scap sync-world: Backport for [[gerrit:1079218|Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814)]] (duration: 12m 09s)

Change #1079146 had a related patch set uploaded (by Jdlrobson; author: Ebrahim):

[mediawiki/core@master] Bring ChangesList user link out of <bdi>

https://gerrit.wikimedia.org/r/1079146

Can we reconsider making a whole bunch of very potentially user-facing changes (high risk) in the late days of 1.43 (LTS!) development that have clearly already realized the risk of causing those user-facing changes and needed reversion, while attempting to fix an issue which has existed in the software for a very long time (low reward)?

Adding <bdi> tags to edit section summaries seems to have affected how focus outlines are now rendered (before/after) in Firefox:

image.png (68×305 px, 6 KB)

<span class="autocomment"><a href="/wiki/Википедия:Кандидаты_в_избранные_списки_и_порталы#Доска_почёта" title="Википедия:Кандидаты в избранные списки и порталы">→<bdi dir="ltr">Доска почёта</bdi></a>: </span>
image.png (68×189 px, 3 KB)

Adding <bdi> tags to edit section summaries seems to have affected how focus outlines are now rendered (before/after) in Firefox:

image.png (68×305 px, 6 KB)

<span class="autocomment"><a href="/wiki/Википедия:Кандидаты_в_избранные_списки_и_порталы#Доска_почёта" title="Википедия:Кандидаты в избранные списки и порталы">→<bdi dir="ltr">Доска почёта</bdi></a>: </span>
image.png (68×189 px, 3 KB)

This comes from the same root as https://bugzilla.mozilla.org/show_bug.cgi?id=337274 known bug of the specific browser, visible for years for other use cases and an inevitable issue given bidi promises this part of the code had before these changes but at least now there isn't a hidden character in other browsers without this issue that end up in user clipboard and making linking to specific section break on some users workflows. e.g T27277 among other bidi improvements this change had on that part of the code.

Can we reconsider making a whole bunch of very potentially user-facing changes (high risk) in the late days of 1.43 (LTS!) development that have clearly already realized the risk of causing those user-facing changes and needed reversion, while attempting to fix an issue which has existed in the software for a very long time (low reward)?

Review or reverting them is ok. It can even be done as a part of stabilization process. We didn't know beforehand there are implicit dependencies mobile frontend has made on the details some specifically were made for power users according to the comment there, those issues are fixed now and those dependencies now have made a bit more explicit code wise at least.

What I don’t understand in that instance with .autocomment is why isn’t part of the tag contents, not why the browser bug happens.

What I don’t understand in that instance with .autocomment is why isn’t part of the tag contents, not why the browser bug happens.

is a part of UI not the content that we try to "isolate" from the UI, the practical reason is doing that turns

image.png (134×292 px, 24 KB)

to

image.png (156×458 px, 32 KB)

if the arrow is moved inside the tag in a wiki with different language for UI and content (but not right now because that <bdi> uses user lang for <bdi> and not content language which is going to be changed) so the place <bdi> should be put should be specifically around the content not the UI. This is the idea why we had what is called pollution of directional characters in T27277 and interestingly specifically in CommentParser there was a comment requesting someone with bidi knowledge improve the situation https://github.com/wikimedia/mediawiki/blob/c7cd5399e73945d418559d54faac3a0a3cacf1b4/includes/CommentFormatter/CommentParser.php#L209-L213 and that was what I did also, replacing pollution with dir mark with the correct markup in the right place, to my understanding. It's still not perfect, having different assumptions and requirements we can end up with a different markup, this is just what I assume is just better than what was before, hidden characters here and there.

Fix uploaded as soon as I saw your message here @Izno, my mistake.

Ebrahim renamed this task from Remove uses of $lang->getDirMark and $lang->getDirMarkEntity in non plain text output to Reduce relying on hidden direction mark in MediaWiki.Nov 2 2024, 7:40 PM
Ebrahim renamed this task from Reduce relying on hidden direction mark in MediaWiki to Reduce relying on hidden direction marks in MediaWiki.

Change #1086069 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Remove invisible direction marks from Persian time formats

https://gerrit.wikimedia.org/r/1086069

Change #1086069 merged by jenkins-bot:

[mediawiki/core@master] Remove invisible direction marks from Persian time formats

https://gerrit.wikimedia.org/r/1086069

Change #1091809 had a related patch set uploaded (by Ebrahim; author: Ebrahim):

[mediawiki/core@master] Remove RLM from ckb and kk-arab date formats

https://gerrit.wikimedia.org/r/1091809

Change #1091809 merged by jenkins-bot:

[mediawiki/core@master] Remove RLM from ckb and kk-arab date formats

https://gerrit.wikimedia.org/r/1091809