
Ordered lists per default receive `decimal` list style type, which causes issues in non-arabic numeral scripts
Closed, ResolvedPublic

Description

Ordered lists in MinervaNeue have had decimal applied as their list style type since a patch addressing T42751 six years ago.
This seems centric to scripts that use Arabic numerals (ahem, Western scripts).

I first noticed this in T150377#5000005, where a Farsi example is given:

image.png (566×642 px, 85 KB)

vs inline reference numbers:

image.png (178×1 px, 50 KB)

Current code of shared.css

Used by Vector and Monobook

/* Localised ordered list numbering for some languages */
ol:lang( azb ) li,
ol:lang( bcc ) li,
ol:lang( bgn ) li,
ol:lang( bqi ) li,
ol:lang( fa ) li,
ol:lang( glk ) li,
ol:lang( kk-arab ) li,
ol:lang( lrc ) li,
ol:lang( luz ) li,
ol:lang( mzn ) li {
	list-style-type: -moz-persian;
	list-style-type: persian;
}

ol:lang( ckb ) li,
ol:lang( sdh ) li {
	list-style-type: -moz-arabic-indic;
	list-style-type: arabic-indic;
}

ol:lang( hi ) li,
ol:lang( mai ) li,
ol:lang( mr ) li,
ol:lang( ne ) li {
	list-style-type: -moz-devanagari;
	list-style-type: devanagari;
}

ol:lang( as ) li,
ol:lang( bn ) li {
	list-style-type: -moz-bengali;
	list-style-type: bengali;
}

ol:lang( or ) li {
	list-style-type: -moz-oriya;
	list-style-type: oriya;
}

QA steps

Beta cluster:

Production

Developer notes

  • Create a mediawiki.language.styles module in core
  • Move i18n styles from mediawiki.legacy.shared to the new mediawiki.language.styles module

Event Timeline


@Volker_E I received your message off-task to review your observation about in-line citations and the list of citations. Your observation is indeed correct that the mismatch you observed should be addressed.

We should consider using MW core's shared.css here as well, as it carries a lot of the internationalization logic.

Change 494400 had a related patch set uploaded (by VolkerE; owner: VolkerE):
[mediawiki/skins/MinervaNeue@master] Remove ol overrides to ensure list styles in non-Arabic number scripts

https://gerrit.wikimedia.org/r/494400

Volker_E renamed this task from Ordered lists per default receive `decimal` list style type, which causes issues in non-arabic number scripts to Ordered lists per default receive `decimal` list style type, which causes issues in non-arabic numeral scripts.Mar 6 2019, 12:17 AM

Can somebody clarify the open design questions here?

Do we agree that we should treat non-Arabic numeral ordered lists equal to Arabic numeral (all Latin scripts to my knowledge) ones?
If we don't set decimal as default, ordered lists fall back to decimal. Which we should override with better treatment for serving our communities (we do this in other skins already).

Example when our list styling is removed, leaving only the HTML and browser defaults:

image.png (850×1 px, 246 KB)

Do we agree that we should treat non-Arabic numeral ordered lists equal to Arabic numeral (all Latin scripts to my knowledge) ones?
If we don't set decimal as default, ordered lists fall back to decimal. Which we should override with better treatment for serving our communities (we do this in other skins already).

I apologize for my lack of general understanding here. Can you further clarify what it means to "treat non-Arabic numeral ordered lists equal to Arabic numeral (all Latin scripts to my knowledge) ones"?

Jdlrobson subscribed.

Volker, it looks like we still need to understand this problem. Maybe the three of us can sync?

OK, trying it async: the way MinervaNeue treats ordered list markers (in this case, numbers), and has done so incorrectly for 6+ years, is wrong for every script except those that expect “1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11…” as numerals.
“1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11…” are called Arabic numerals and belong to the Hindu-Arabic numeral system. They are used in English, German and other languages written in Latin script.

The example I've given in the task description shows the current treatment going wrong in the Perso-Arabic ordered list as part of the Eastern-Arabic numeral system (hehe, hope to confuse you further here), written as “۱, ۲, ۳, ۴, ۵, ۶, ۷, ۸, ۹…”

It's wrong and hurts our users there, and there's a simple cure, in different flavors:

  • shared.css takes care of that. MinervaNeue could use it.
  • MinervaNeue takes only the ordered list parts from there, resulting in less code, but more maintenance and a code split.

@Volker_E thanks for the clarification. So we'd like to use numerals native to the respective language whenever possible. That sounds like a great idea 🙂

Change 494400 had a related patch set uploaded (by Bartosz Dziewoński; owner: VolkerE):
[mediawiki/skins/MinervaNeue@master] Remove ol overrides to ensure list styles in non-Arabic number scripts

https://gerrit.wikimedia.org/r/494400

Change 494400 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] Remove ol overrides to ensure list styles in non-Arabic number scripts

https://gerrit.wikimedia.org/r/494400

matmarex subscribed.

(This patch removes the incorrect overrides from MinervaNeue, but something still needs to be done to load the correct overrides from shared.css – currently they are not loaded when using MinervaNeue.)

This comment was removed by Jdlrobson.

Change 539167 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/core@master] WIP: Introduce mediawiki.language.styles for all skins

https://gerrit.wikimedia.org/r/539167

Change 541659 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/core@master] WIP: Split language styles out mediawiki.legacy.shared into mediawiki.languages.styles

https://gerrit.wikimedia.org/r/541659

Jdlrobson added a subscriber: matmarex.

Would be good to talk about this at frontends standards group.

The mediawiki.legacy.shared module is 2.9kb after gzip. Given Minerva only ships 7.7kb of CSS, that's a significant chunk.

Options include

  1. Rename mediawiki.legacy and include it in Minerva. Move anything inside it that doesn't make sense for Minerva into a skinStyle so it can be removed.

Downside: there's a lot of cruft here and CSS that would be unused.

  2. Load mediawiki.legacy.shared via JS.

Downside: no non-JS experience and a potential flash of unstyled content.

  3. Extract mediawiki.language.styles as is

The language-specific CSS we need to solve this issue currently weighs in at 1kb, but most of it isn't needed on most page views.
Demonstrated in https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/541659
If we took this approach, could we maybe use better selectors to achieve the same goal?

  4. Add a lightweight mediawiki.language.styles module that uses i18n CSS pseudo messages

This clocks in at only 584 bytes after gzip (rather than 1kb)
Demonstrated in https://gerrit.wikimedia.org/r/539167
As @matmarex points out this however doesn't work on pages which mix content.
For those pages could we use JS (given they are the exception rather than the rule)? Any other neat ideas?

I am wondering if such pages could load the same ResourceLoader module but with a different language. Does RL even support that? Could it?
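
To make the trade-off in option 4 concrete, here is a rough sketch of the kind of output a per-language module could emit. It is an assumption for illustration only, not the contents of the patch linked above: the idea is that a stylesheet requested for Persian would carry a single unscoped rule instead of the full :lang() list.

```css
/* Hypothetical output for a Persian-language request: one rule, no :lang() selectors. */
ol li {
	list-style-type: -moz-persian;
	list-style-type: persian;
}

/*
 * The limitation on mixed-content pages: the rule above is not scoped by
 * language, so an English list embedded in the same page, for example
 *   <div lang="en"><ol><li>…</li></ol></div>
 * would also be numbered with Persian numerals.
 */
```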

  1. Is not an option performance-wise as long as we're not massively cleaning up mediawiki.legacy, and even then it wouldn't address your questions in 3. and 4.
  2. That makes non-English, non-Latin speakers second-class citizens; I don't think that's acceptable.
  3. The selectors are a pain; on the other hand, it's doubtful how much fewer selectors would really save us here. Gzip is pretty good at compressing similar selectors.

One shortcoming of current CSS language selection is that there is no grouping of certain scripts. It's not fully clear whether linguists would even be fine with such grouping, but it would address the technical concern much better than the painful listing of each and every script.

  4. You've left out my concern that we leave this very technical terminology (CSS values) to the translator community and put the burden for a technical shortcoming on them. I don't think that's a worthwhile approach, given how far behind less popular languages are and that fewer translators often means fewer technical experts as well. That would put the resulting output at risk of mostly not being translated, or of being wrongly translated.

Talked about this in frontend standards group today.
We agreed that we should create a mediawiki.language.styles module and pull out the i18n styles from mediawiki.legacy.shared to there.

We want to understand the edge cases of mixed content, but it seems like the most essential thing to fix right now is to fix the current content language.
One possible solution we could explore to fix the immediate problem of Minerva is to do something along the lines of https://gerrit.wikimedia.org/r/539167 but add a skinStyle containing all the i18n rules (which Minerva would blank).

Once we've worked out how to make this CSS leaner, Minerva will stop blanking it.

I will sync with Volker to suggest a new approach.

While https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/539167 Introduce minimial mediawiki.language.styles for all skins is not perfect, I think it's better than the current situation and at least provides us with the i18n-specific style module that we agree is needed. I need to understand the situations where content language and interface language don't match, but my understanding is that this affects logged-in users only, so it should not penalise anonymous users.

@Krinkle have we considered passing content language in ResourceLoader requests? I feel if we did this, this solution could be improved further - rendering two CSS rules.

@Jdlrobson Can you clarify what problem this is meant to solve? The example link works as expected in Vector, by having 5 lines of CSS instead of 1. Not a huge cost, but I guess the intent is to reduce this somehow. The proposed core patch involves 1 line of CSS instead of 5; however, it doesn't solve the same problem.

If memory serves, it was done this way for Vector not because we couldn't optimise it further, but because it was required to solve the problem. In particular, this should not be tied to the user interface but to the content. If we only need to support the site-wide content language, that can already be achieved by providing the site language as a LESS variable and then using LESS code (e.g. if/else) to decide which line to output. It can also be done in PHP if we prefer, but I'm assuming you'd prefer it not be in PHP if possible. The issue is that this assumes a site has one language for all content, which is at odds with what MW has supported as a core feature since about 2014, namely that the content language may vary by page. It is no longer a site-wide concept. There are three main ways in which this surfaces. 1) Special pages can be localised to the user language but wiki pages to the site language. These two can be in different languages, and both can contain e.g. numbered lists, which means we can't have one style served by the module apply to both. 2) Translated pages within a wiki, such as on Commons and mediawiki.org. 3) Nested content within a page using <div dir=… lang=…>. All these are currently supported in Vector through the generic :lang() syntax native to CSS.
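
A small sketch of the third case, assuming a page whose content language is English with a nested Persian block (the markup in the comment is made up for illustration). Because :lang() matches on the language an element inherits from its nearest lang attribute, one stylesheet can number both lists correctly:

```css
/*
 * Illustrative markup (not taken from a real page):
 *   <div lang="en">
 *     <ol><li>…</li></ol>            numbered 1, 2, 3 (browser default, decimal)
 *     <div lang="fa" dir="rtl">
 *       <ol><li>…</li></ol>          numbered ۱, ۲, ۳
 *     </div>
 *   </div>
 */
ol:lang( fa ) li {
	list-style-type: persian;
}
```

A single rule selected on the server for the whole page cannot make this distinction, since it has no way to tell the two lists apart.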

@Jdlrobson Can you clarify what problem this is meant to solve? The example link works as expected in Vector, by having 5 lines of CSS instead of 1. Not a huge cost, but I guess the intent is to reduce this somehow. The proposed core patch involves 1 line of CSS instead of 5; however, it doesn't solve the same problem.

Not sure what you mean by 5 lines of CSS rather than 1. I explained this a little in T217616#5557895. Where the content language matches the interface language (the issue we are most interested in solving for Minerva), the patch ships 1 line of CSS (584 bytes) rather than 2.9kb of CSS (mediawiki.legacy.shared) or 33 lines of CSS (extracting the relevant rules verbatim inside mediawiki.legacy.shared). Once the line-height rules are understood better, the plan would be to follow a similar approach for those (https://github.com/wikimedia/mediawiki/blob/master/resources/src/mediawiki.legacy/shared.css#L528-L578). In total these rules clock in at 1kb currently.

If memory serves, it was done this way for Vector not because we couldn't optimise it further, but because it was required to solve the problem. In particular, that this should not be tied to user interface but to content. If we only need to support the site-wide content language, that can already be achieved by providing the site-language as LESS variable and then using LESS code (e.g. if/else) to decide which line to output.

I believe that's essentially what I'm doing here? https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/539167/

It can also be done in PHP if we prefer, but I'm assuming you'd prefer it not be in PHP if possible. The issue is that this assumes a site has one language for all content, which is at odds with what MW has supported as a core feature since about 2014, namely that the content language may vary by page. It is no longer a site-wide concept. There are three main ways in which this surfaces. 1) Special pages can be localised to the user language but wiki pages to the site language. These two can be in different languages, and both can contain e.g. numbered lists, which means we can't have one style served by the module apply to both. 2) Translated pages within a wiki, such as on Commons and mediawiki.org. 3) Nested content within a page using <div dir=… lang=…>. All these are currently supported in Vector through the generic :lang() syntax native to CSS.

I'm not following this bit. My question is as follows:

Right now we pass the interface language to a ResourceLoaderModule like so:
https://en.wikipedia.org/w/load.php?lang=en-gb&modules=site.styles&only=styles&skin=vector

However if I am viewing he.wikipedia.org, the content language is (usually) Hebrew.
What I'm asking is if it's possible to send this information to ResourceLoader too

e.g. https://en.wikipedia.org/w/load.php?lang=en-gb&contentlang=he&modules=site.styles&only=styles&skin=vector

@Jdlrobson The site's main content language is already known to ResourceLoader server-side. It doesn't need to be passed by url. For cases where we truly only need one export per wiki/site, regardless of the current user or page content, for the site's main content language, this can be used. And is used. For example, to export site overrides like "MediaWiki:License-dropdown", "MediaWiki:Common.js" or "MediaWiki:Citoid-template-type-map.json".

But for styling of content that is language-dependent, we need to support :lang() where the content defines the language. This is because we have multi-lingual content pages and special pages that present content in the user's interface language.

Any proposed optimisation to this would need to account for that as well. What strategy do you propose for that?

Alternatively, if these requirements are something we don't want to have, perhaps we can figure out what they are for and whether core could still satisfy our users without these features.

And if we account for it by having both and using only one, we need to determine how far that will spread (how much code would need awareness of this optimisation), how much complexity and maintenance cost that in turn will add, and what the performance cost is of doing that (e.g. more modules to conditionally load, means less shared caching between page views and more startup cost).

The aforementioned example compresses down to 0.3 K (source). This could perhaps be optimised as well with better selectors.

But for styling of content that is language-dependent, we need to support :lang() where the content defines the language. This is because we have multi-lingual content pages and special pages that present content in the user's interface language.

Sure. My guess however is this is rare and I would hope under 5% (maybe 1%) of pages. For these, I suggest loading the additional 1kb rules in the same way we load jquery.collapsible.styles. Right now I'm most focused on fixing the larger group of pages where it's not a problem (right now communities are working around this with a FOUC provided by MediaWiki:Mobile.css which is not great).

The aforementioned example compresses down to 0.3 K (source). This could perhaps be optimised as well with better selectors.

This doesn't seem to include the list-style type selectors in resources/src/mediawiki.legacy/shared.css?
My 1kb was including all the other potential i18n rules, e.g. those that mention @noflip, which I've not become acquainted with yet. If I focus just on the list styles I see a bump of 0.5kb after gzip when I add all these rules to Minerva, which is considerable IMO to style one list.

Any proposed optimisation to this would need to account for that as well. What strategy do you propose for that?

An alternative approach would be to reduce our selector rules to one selector by grouping languages and pairing them with a class. e.g.

ol:lang( azb ) li,
ol:lang( bcc ) li,
ol:lang( bgn ) li,
ol:lang( bqi ) li,
ol:lang( fa ) li,
ol:lang( glk ) li,
ol:lang( kk-arab ) li,
ol:lang( lrc ) li,
ol:lang( luz ) li,
ol:lang( mzn ) li {
	list-style-type: persian;
}

becomes

.list-group-1 li {
	list-style-type: persian;
}

I don't know how well this will work in practice, but it would lead to 4 gzip-friendly rules.
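
Extending the same idea to the other groups quoted from shared.css in the description might look like the following. The class names are placeholders in the style of .list-group-1 above, and the sketch assumes something on the server side adds the matching class wherever a language is set, which this task has not established.

```css
/* Placeholder class names; each class stands in for one group of :lang() selectors. */
.list-group-2 li { /* ckb, sdh */
	list-style-type: arabic-indic;
}

.list-group-3 li { /* hi, mai, mr, ne */
	list-style-type: devanagari;
}

.list-group-4 li { /* as, bn */
	list-style-type: bengali;
}

.list-group-5 li { /* or */
	list-style-type: oriya;
}
```

The cost of this design is that the language-to-group mapping moves out of CSS and has to be maintained wherever the class is generated.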

Change 549220 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/core@master] Add i18n feature to ResourceLoaderSkinModule

https://gerrit.wikimedia.org/r/549220

Change 541659 abandoned by Jdlrobson:
WIP: Split language styles out mediawiki.legacy.shared into mediawiki.languages.styles

Reason:
Continued in https://gerrit.wikimedia.org/r/549220

https://gerrit.wikimedia.org/r/541659

Change 549220 merged by jenkins-bot:
[mediawiki/core@master] Add 'legacy' and 'i18n' features to ResourceLoaderSkinModule

https://gerrit.wikimedia.org/r/549220

Minerva can now make use of i18n-ordered-lists to fix this bug with a 0.4kb increase in CSS payload.

Change 564760 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] Ordered lists per default receive correct numerals (finally)

https://gerrit.wikimedia.org/r/564760

Change 564760 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] Ordered lists per default receive correct numerals (finally)

https://gerrit.wikimedia.org/r/564760

@Ladsgroup any chance you could help me get rid of the rule in https://fa.m.wikipedia.org/wiki/%D9%85%D8%AF%DB%8C%D8%A7%D9%88%DB%8C%DA%A9%DB%8C:Mobile.css

the following is no longer needed:

#content ol {
  list-style-type:-moz-persian !important; /* To override some codes that avoids this one to be used on Mozilla Firefox */
  list-style-type:persian;
}

Change 539167 abandoned by Jdlrobson:
Add reduced size i18n styling features to ResourceLoaderSkinModule

Reason:
I think there are some good ideas in here worth pursuing but enough time has passed that I don't have the energy to continue to pursue them.

https://gerrit.wikimedia.org/r/539167