
Do not capitalise first letter of Wikidata descriptions on languages that do not support capitalisation e.g. Arabic
Closed, ResolvedPublic

Description

Hello,

For the past few months I have noticed a separation between the first letter and the rest of the word in Arabic Wikidata descriptions, on the mobile version only (where the descriptions appear automatically).

Hope the pictures will describe the problem better.

Three examples:
Should be صفحة (one word without any separation) (https://ar.wikipedia.org/wiki/%D9%85%D8%B1%D9%83%D8%A8_(%D8%AA%D9%88%D8%B6%D9%8A%D8%AD))

Screenshot_2018-12-05-14-20-36.png (998×720 px, 72 KB)

Should be منطقة (https://ar.wikipedia.org/wiki/%D8%A7%D9%84%D9%85%D8%BA%D8%B1%D8%A8_%D8%A7%D9%84%D8%B9%D8%B1%D8%A8%D9%8A)
Screenshot_2018-12-05-14-20-46.png (1×720 px, 168 KB)

Should be ثمرة (https://ar.m.wikipedia.org/wiki/%D8%AA%D9%81%D8%A7%D8%AD)
Screenshot_2018-12-05-14-21-30.png (1×720 px, 554 KB)

This problem appears in all browsers on the Wikipedia mobile version.

Developer notes

A CSS rule .tagline:first-letter exists in Minerva that capitalises the first letter
It's likely that this is the rule that is causing the problem

Personally I would like us to drop this rule and rely on Wikidata descriptions verbatim (requires product buy-in).

Note: We do not show Wikidata descriptions on English Wikipedia (including beta).

Event Timeline

ovasileva moved this task from Incoming to Needs Prioritization on the Web-Team-Backlog board.

Hey @alanajjar as a non-Arabic speaker I'm struggling to understand the problem here.
When I look at the ar description on https://www.wikidata.org/w/index.php?title=Q89&diff=prev&oldid=794153191

Screen Shot 2018-12-12 at 5.00.17 PM.png (572×973 px, 103 KB)

to my eye it matches what I see on
Screen Shot 2018-12-12 at 5.02.38 PM.png (533×693 px, 133 KB)

Is there any more information you can give me to understand this problem better? I can't see anything in the CSS that might be impacting this. Could this possibly be an issue with the font rendering at that size?

If you add the following to your User:<name>/minerva.css does the problem go away?

.heading-holder .tagline {
    font-size: 1em;
}

cc @Volker_E who has a better eye for this kind of problem.

Jdlrobson lowered the priority of this task from High to Medium. Dec 13 2018, 1:05 AM

@Jdlrobson In Arabic, words are formed of connected letters. The problem is that the first letter of the ar description appears separated. In the case of the Apple article description, the first word (on the right) should appear as ثمرة, where the letter ث is connected to the rest of the word. Instead it is rendered as a single isolated letter followed by the rest of the word.

The same problem exists on fawiki, but there, when an article is opened, the word first appears separated (Photo 1) and after about a second it becomes one word (Photo 2). On arwiki it stays separated.

fawiki article https://fa.wikipedia.org/wiki/%D8%B3%DB%8C%D8%A8

Screenshot_2018-12-17-04-22-22.png (935×720 px, 344 KB)

Screenshot_2018-12-17-04-22-14.png (909×717 px, 327 KB)

I hope @Huji and @Ladsgroup can explain better than me :D

Let me explain in more depth. In the Arabic script, letters can be joined or separated, so one letter, e.g. "ی" (Yaa), can take several different forms depending on whether there is a letter before or after it (see this for four different forms of Yaa).
The problem here is that the first letter should take the form it has when another letter follows it, but it is shown as if it were alone. An extra ZWNJ character might be at fault here, or a tag is opened and closed around the first letter (or a different CSS rule applies to it), making it look like it stands alone.
@Jdlrobson HTH

I can confirm that this occurs with fawiki mobile version on an iPhone. It also happens when I go to https://fa.m.wikipedia.org/wiki/%D8%B3%DB%8C%D8%A8 using Firefox on Mac or Chrome on Windows (so it is not an OS issue, or a mobile versus non-mobile device issue).

Even better, I managed to create a short video of this. Here, we are at the page linked above, and I hit F5 to refresh the page. You will see on the top-right of the page that the string "میوه سردسیری" momentarily changes to look like "م‌یوه سردسیری" with a different font, then to "م‌یوه سردسیری" with the correct font, and then back to the correct string.

Looking at the source of the page, the first letter of the string is not wrapped in any HTML elements or anything like that. This makes me think that the issue has to do with the font somehow not rendering the right glyphs for a short time.

@Jdlrobson

Jdlrobson edited projects, added MinervaNeue; removed MobileFrontend, Reading Epics (Wikidata Description Editing).

What is this MinervaNeue tag? I get a 404 when I go to it? Also, I think it should be kept in MobileFrontend as it actually only occurs on the mobile frontend.

What is this MinervaNeue tag?

The skin that was previously packaged up inside MobileFrontend.

I get a 404 when I go to it?

Please file a separate bug report in Phabricator against Phabricator.

Thanks for explaining the bug!

So I can't replicate this but I do see a rule relating to .tagline::first-letter which doesn't seem to work on my browser but is in the stylesheet.

https://ar.m.wikipedia.org/wiki/%D8%AE%D8%A7%D8%B5:%D8%AA%D8%A7%D8%B1%D9%8A%D8%AE/%D9%85%D9%8A%D8%AF%D9%8A%D8%A7%D9%88%D9%8A%D9%83%D9%8A:Mobile.css might fix this.
Can somebody check if the above change has fixed the problem?

May I derail this slightly? We're discussing changing the body font stack on mobile to OS defaults. This should enhance readability on mobile devices and also improve internationalization, as operating system font defaults are generally optimized for a wider range of localizations. This is being beta-tested for now and is planned to be rolled out at the beginning of next year. The corresponding task is T175877.
This current task seems to be one that might be positively affected by this change. You can try it out by adding

.skin-minerva {
	font-family: -apple-system, ".SFNSText-Regular", "San Francisco", "Roboto", "Segoe UI", "Helvetica Neue", "Lucida Grande", sans-serif;
}

to your common.css

Update
This does seem to improve font-size balance, but does not touch the issue. Example with changed font stack:

Arabic arwiki MinervaNeue Chrome OS X font-stack (730×1 px, 151 KB)

Please still consider testing the stack for font-rendering and readability improvements and feel free to provide feedback at T175877!

Jdlrobson renamed this task from Wrong separation between the first letter and the word in Arabic Wikidata descriptions on Mobile version to Do not capitalise first letter of Wikidata descriptions on languages that do not support capitalisation e.g. Arabic. Dec 18 2018, 12:14 AM
Jdlrobson removed a project: Wikidata.
Jdlrobson updated the task description.

@alexhollender @ovasileva while Volker is right, it seems we should just drop the rule that capitalises the first letters of Wikidata descriptions. From what I understand the capitalisation of Wikidata descriptions was targeted as a workaround for English Wikipedia and since we don't show descriptions there any more I'd love to remove this. I should add a disclaimer that I've always felt capitalisation of wikidata descriptions via CSS was the wrong solution here as it hides a real problem in the data and the ability for users to fix it.

+1 to Jon's comment above. General CSS uppercasing without limiting scope to languages often results in unintended i18n issues.
@Jdlrobson how would such change affect other Latin script languages?

Thanks for explaining the bug!

So I can't replicate this but I do see a rule relating to .tagline::first-letter which doesn't seem to work on my browser but is in the stylesheet.

https://ar.m.wikipedia.org/wiki/%D8%AE%D8%A7%D8%B5:%D8%AA%D8%A7%D8%B1%D9%8A%D8%AE/%D9%85%D9%8A%D8%AF%D9%8A%D8%A7%D9%88%D9%8A%D9%83%D9%8A:Mobile.css might fix this.
Can somebody check if the above change has fixed the problem?

No it did not fix the problem. Now the pages permanently show the incorrect form. For instance on https://ar.m.wikipedia.org/wiki/%D9%88%D8%A7%D8%AD%D8%AF you see a string that looks like "ع‌دد طبيعي" instead of "عدد طبيعي".

In contrast, the following CSS code makes the problem go away:

.tagline:first-letter {
    text-transform: none !important;
}

Which basically means that what Jon said and Volker endorsed above is the right approach.

  • do we believe that the first letter of the Wikidata descriptions should always be capitalized?
  • is the current best practice on Wikidata to capitalize the first letter of descriptions? If not (i.e. we can't count on them slowly being updated), what other method do you think might work to achieve the capitalization @Jdlrobson? From Wikidata's perspective I could imagine that storing the description without any capitalization is preferred

No it did not fix the problem. Now the pages permanently show the incorrect form.

@Huji I updated the rule. Is it fixed now?

In T211198#4831983, @alexhollender wrote:
  • do we believe that the first letter of the Wikidata descriptions should always be capitalized?
  • is the current best practice on Wikidata to capitalize the first letter of descriptions?

Generally in English they are capitalized and it's the exception where they are not. The CSS is there for the edge case but clearly with a technical cost.

If not (i.e. we can't count on them slowly being updated), what other method do you think might work to achieve the capitalization @Jdlrobson? From Wikidata's perspective I could imagine that storing the description without any capitalization is preferred

Many languages don't have any concept of capitalization. Regardless this is a data problem - seeing a non-capitalised wikidata description and being able to fix it is a great micro-contribution.

In T211198#4831983, @alexhollender wrote:
  • do we believe that the first letter of the Wikidata descriptions should always be capitalized?

I do not think this is about 'beliefs'. :) "this looks like a case where user input should not be messed with" plus many other opinions in T131013 already... Duplicate?

Edit: T208139: Georgian words are automatically (incorrectly) capitalized when entered - another duplicate?

No it did not fix the problem. Now the pages permanently show the incorrect form.

@Huji I updated the rule. Is it fixed now?

No. Now the same issue that was originally reported occurs again (the first letter is detached for a fraction of a second, similar to the video I sent).

I think this is because it takes a fraction of a second for Mobile.css to load; so first this file from Minerva skin causes the issue, and then Mobile.css is loaded and corrects the issue.

So if we really want to fix the originally reported task, the only way to do it is to either completely remove the text-transform: capitalize; rule, or find a way to restrict it to a few languages like English and German and the like
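The second option could be sketched roughly as follows. This is only an illustration, not the change that was merged: the `.tagline` selector and the `text-transform: capitalize` declaration come from this task, while the choice of `:lang()` and the list of languages are assumptions. `:lang()` matches on the content language of the element, which normally comes from a `lang` attribute on the element or one of its ancestors.

```css
/* Sketch only: the language list below is an assumption, not an agreed set. */

/* Capitalise the first letter only where the content language has letter case. */
.tagline:lang(en)::first-letter,
.tagline:lang(de)::first-letter {
	text-transform: capitalize;
}

/* No ::first-letter rule is declared for other languages (ar, fa, ...),
   so their descriptions are rendered verbatim. */
```

Because the rule simply never applies to Arabic or Persian pages, there would be nothing for a later-loading Mobile.css to override, which is what appears to cause the brief flash described above.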

So if we really want to fix the originally reported task, the only way to do it is to either completely remove the text-transform: capitalize; rule, or find a way to restrict it to a few languages like English and German and the like

That is exactly what I had in mind with my question above… :)

+1 to Jon's comment above. General CSS uppercasing without limiting scope to languages often results in unintended i18n issues.

This.

Per @ASammour's request, I made this replacement, and now this problem is fixed on arwiki, in that it behaves like the fawiki issue:

Even better, I managed to create a short video of this. Here, we are at the page linked above, and I hit F5 to refresh the page. You will see on the top-right of the page that the string "میوه سردسیری" momentarily changes to look like "م‌یوه سردسیری" with a different font, then to "م‌یوه سردسیری" with the correct font, and then back to the correct string.