Interwiki links for some languages incorrectly capitalize the first letter of the endonym
Open, Needs Triage, Public, BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?: The link is rendered as "Toki pona"

What should have happened instead?: The link should render as "toki pona" (fully uncapitalized).

This is because https://gerrit.wikimedia.org/g/mediawiki/core/+/67d057590125e69ac4fdb3f76f4e64768604d5d7/includes/Skin/Skin.php#1248 unconditionally applies ucfirst to interlanguage link titles. There is an old discussion in T39705 (Bugzilla ticket 37705).

It's ultimately because of languages like French, which spell their language name in lowercase but expect it to be capitalized in particular contexts like a list of languages. That would suggest the real problem is that the capitalization is done with the current user locale's capitalization rules, and we need to change that to use the native capitalization rules. But is it really correct for toki pona to redefine ucfirst as a no-op? Would that break the world?

Event Timeline

Aklapper renamed this task from "Interwiki links for Toki Pona capitalize the first letter of the endonym rendering it as "Toki pona"" to "Interwiki links for some languages incorrectly capitalize the first letter of the endonym". Nov 28 2025, 2:29 PM

Also affects e.g. isiZulu.

I'm not quite sure about this case. zuwiki main page contains the capitalized sentence "IsiZulu singulimi lwamaZulu." We should ask the community.

Pppery added a subscriber: BscottAPL33.

Just commenting to say I agree this is not the correct behavior. There's a semantic question here: Is this a list of endonyms rendered in their title-case form, or is this a list of loanwords rendered in the interface language's title-case form? Most of the way the language list works suggests the former. For instance, otherwise we would not be using non-Latin-script names in English, where most style guides would call for a romanization. Further, if the goal were to just follow the norms of the interface language, it would make more sense to use exonyms entirely. To me it seems like the goal is that each entry should represent how that language's name would appear in title case in that language. In the case of Toki Pona, that is "toki pona".

Ideally this would be built into getLanguage(), but given that it only applies to a few languages (exact number unclear per above) and seems to only come up in this one context, special-casing might be easier?
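For illustration only, the special-casing idea could look roughly like the sketch below. It is written in Python rather than MediaWiki's PHP, and every name in it (KEEP_NATIVE_CASE, ucfirst, interlanguage_label) is hypothetical, not an existing MediaWiki API. The only exempted code is "tok" for Toki Pona, since which other languages belong on the list is still an open question in this task.

```python
# Hypothetical exemption list: language codes whose endonym keeps its own casing.
# Only "tok" (Toki Pona) is taken from this task; other entries are undecided.
KEEP_NATIVE_CASE = frozenset({"tok"})


def ucfirst(text: str) -> str:
    """Uppercase only the first character and leave the rest untouched."""
    return text[:1].upper() + text[1:]


def interlanguage_label(lang_code: str, endonym: str) -> str:
    """Return the label for an interlanguage link.

    Exempted languages are returned as-is; all others get the
    first-letter capitalization that is applied today.
    """
    if lang_code in KEEP_NATIVE_CASE:
        return endonym
    return ucfirst(endonym)


if __name__ == "__main__":
    print(interlanguage_label("tok", "toki pona"))
    print(interlanguage_label("fr", "français"))
```

Note that Python's str.upper() is not locale-aware, so this sketch sidesteps the question of whose capitalization rules apply; it only shows where a per-language exemption would sit.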

Should French be displayed as "français" instead? As a speaker of a language whose script lacks a capitalization system, I'd say that seeing text in lists in sentence case feels weird to me.

While the French word for the French language is "français", that word is still capitalized in title case. So we would only want to change to a lowercase "français" if the idea were to shift away from title-casing in general. Personally I'd actually prefer the look of things with all sentence-case endonyms, but that's a bigger discussion, and at least last time this was discussed (see description) there was consensus for the title-case approach.