
Improve Latin-Cyrillic converter in Serbo-Croatian Wikipedia
Open, Needs Triage, Public, Feature

Description

Feature summary (what you would like to be able to do and where):
Currently, the Latin-Cyrillic converter in Serbo-Croatian Wikipedia cannot correctly handle uncommon spellings (such as George Washington). Perhaps this can be solved properly.

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):
The present version of the Latin-Cyrillic converter in Serbo-Croatian Wikipedia cannot handle uncommon spellings (mainly those used in Croatia). If something like the Chinese Wikipedia's converter for region-specific words were used, we could add the Serbian Latin spelling as the 'localized word for Serbia' and transliterate only the Serbian spelling into Cyrillic. If possible, we could implement such a converter in Serbo-Croatian Wikipedia and merge the Serbian, Croatian and Bosnian Wikipedias into a single Serbo-Croatian one (which would also stop users of Montenegrin from repeatedly requesting a Montenegrin Wikipedia).

Benefits (why should this be implemented?):
As above.

Event Timeline

Aklapper renamed this task from "Merge Serbian, Croatian, Bosnian Wikipedia or improve Latin-Cyrillic converter in Serbo-Croatian Wikipedia" to "Improve Latin-Cyrillic converter in Serbo-Croatian Wikipedia". Jan 10 2023, 9:28 PM
Aklapper removed a project: MW-1.40-notes.
Aklapper removed a subscriber: MediaWiki-Language-converter.

As far as I know, Serbia, Bosnia & Herzegovina and Montenegro need both the Cyrillic and Latin scripts, while Croatia needs only the Latin script. Hence, we need the following variants:

  • Serbia-Latin
  • Serbia-Cyrillic (default for Serbia)
  • B&H-Latin (default for B&H, except Republika Srpska)
  • B&H-Cyrillic (default for Republika Srpska)
  • Montenegro-Latin (default for Montenegro, although the proper default there is not certain)
  • Montenegro-Cyrillic
  • Croatia-Latin (default for Croatia)

Then we could use something like the localization of region-specific words in Chinese Wikipedia to handle the differences better.
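The variant list above can be read as a small lookup table: each region offers a set of scripts and has one default. A minimal Python sketch, assuming the illustrative labels used in this comment (they are not real MediaWiki variant codes) and treating Republika Srpska as a sub-region of B&H; the Montenegro default is only a placeholder, since the comment marks it as uncertain:

```python
from typing import Optional

# Hypothetical variant table built from the list above. The labels are
# illustrative names, not real MediaWiki language/variant codes.
VARIANTS = {
    "Serbia": ("Serbia-Latin", "Serbia-Cyrillic"),
    "Bosnia and Herzegovina": ("B&H-Latin", "B&H-Cyrillic"),
    "Montenegro": ("Montenegro-Latin", "Montenegro-Cyrillic"),
    "Croatia": ("Croatia-Latin",),
}

DEFAULTS = {
    "Serbia": "Serbia-Cyrillic",
    "Bosnia and Herzegovina": "B&H-Latin",
    "Republika Srpska": "B&H-Cyrillic",
    "Montenegro": "Montenegro-Latin",  # placeholder: default is uncertain
    "Croatia": "Croatia-Latin",
}

# Republika Srpska is part of B&H, so it shares B&H's available variants.
PARENT = {"Republika Srpska": "Bosnia and Herzegovina"}


def pick_variant(region: str, requested: Optional[str] = None) -> str:
    """Return the requested variant if the region offers it, else its default."""
    available = VARIANTS[PARENT.get(region, region)]
    if requested in available:
        return requested
    return DEFAULTS[region]
```

For example, a reader in Croatia who asks for a Cyrillic variant would still get Croatia-Latin, because that region offers only the Latin script in this table.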

John_Smith_Ri triaged this task as Unbreak Now! priority. Feb 17 2023, 4:34 AM
Func lowered the priority of this task from Unbreak Now! to Needs Triage. Feb 17 2023, 12:32 PM
Func subscribed.

That's not how task triaging works.

Winston_Sung raised the priority of this task from High to Needs Triage. Feb 26 2023, 4:48 AM
Winston_Sung subscribed.

Please don't change the priority to High or Unbreak Now! again.

That's not how task triaging works.

Maybe this can be done with templates, e.g. '''Džordž Vašington''' ({{lang-en|George Washington}})

which renders in Cyrillic as: Џорџ Вашингтон (енглески: George Washington).

Perhaps a template with two parameters would be helpful, e.g. {{foreign|George Washington|Džordž Vašington}} to aid the conversion process.
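A rough Python sketch of how the proposed two-parameter template could be expanded for a Latin variant. The template name, its parameter order (original first, adapted second) and the variant labels sh-Latn-HR / sh-Latn-RS are assumptions taken from this discussion, not an existing MediaWiki feature:

```python
import re

# Matches the proposed template {{foreign|original|adapted}}.
# Nested templates and named parameters are not handled in this sketch.
FOREIGN = re.compile(r"\{\{foreign\|([^|{}]*)\|([^|{}]*)\}\}")


def render_foreign(wikitext: str, variant: str) -> str:
    """Replace each {{foreign|...}} call with the spelling for a Latin variant.

    Assumption: sh-Latn-HR keeps the original spelling; every other variant
    gets the adapted (Serbian) spelling, which a Cyrillic variant would then
    transliterate letter by letter.
    """
    def pick(match: re.Match) -> str:
        original, adapted = match.group(1), match.group(2)
        return original if variant == "sh-Latn-HR" else adapted

    return FOREIGN.sub(pick, wikitext)
```

With this sketch, "{{foreign|George Washington|Džordž Vašington}}" renders as "George Washington" for sh-Latn-HR and as "Džordž Vašington" for sh-Latn-RS.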

Some of the comments above may have misunderstood the issue. Here, for example, "George Washington" is simply the Croatian spelling (the Croatian standard does not require loanwords to be adapted to Croatian orthography), while "Džordž Vašington" is the Serbian spelling.

@John_Smith_Ri :

So you would like to have

  • sh-Latn-HR : George Washington
  • sh-Latn-RS : Džordž Vašington
  • sh-Cyrl-? : Џорџ Вашингтон

@Winston_Sung That is exactly what I found in the Serbian and Croatian Wikipedias. By the way, although the Bosnian Wikipedia uses "George Washington", I have heard that Bosnian speakers also use the Cyrillic alphabet, so I am not sure about the situation for Bosnian (and Montenegrin). With an improved converter, Cyrillic could be used directly in Serbo-Croatian Wikipedia, and so on. In addition, Serbo-Croatian Wikipedia is currently called "Wikipedija", but the spelling matching the Cyrillic form is actually "Vikipedija" (there is no Cyrillic letter corresponding to "W"). Hopefully this problem can be solved soon.

Hi @John_Smith_Ri,

Thank you for your interest in contributing to this language space.

I do want to let you know that, although the language converter still has some small kinks to iron out, the consensus seems to be that nothing substantial currently prevents a merge. The communities are not yet ready for (or fond of) the idea, and this is where we currently need the most support towards this goal.

Do let me know if you have any questions and feel free to reach out directly if there is something that I can help with.

Kind regards,
Denis

The current converter works letter by letter (plus three two-letter combinations: lj, nj and dž).
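For reference, a letter-by-letter converter of this kind can be sketched in Python as follows. This is an illustrative reimplementation, not MediaWiki's actual converter code, and it assumes NFC-normalized input (precomposed č, ć, đ, š, ž):

```python
# Serbian Latin letters and their Cyrillic counterparts, position by position.
LATIN = "abcčćdđefghijklmnoprsštuvzž"
CYRILLIC = "абцчћдђефгхијклмнопрсштувзж"
LETTERS = dict(zip(LATIN, CYRILLIC))

# The three two-letter combinations, each a single Cyrillic letter.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}


def to_cyrillic(text: str) -> str:
    """Letter-by-letter Latin -> Cyrillic, trying digraphs before single letters.

    Characters without a mapping (digits, punctuation, q, w, x, y) are
    copied unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2].lower()
        if pair in DIGRAPHS:
            cyr, step = DIGRAPHS[pair], 2
        else:
            cyr, step = LETTERS.get(text[i].lower()), 1
        if cyr is None:
            out.append(text[i])
        else:
            # Case follows the first Latin letter: "Lj" and "LJ" both give "Љ".
            out.append(cyr.upper() if text[i].isupper() else cyr)
        i += step
    return "".join(out)
```

With this sketch, to_cyrillic("Džordž Vašington") gives "Џорџ Вашингтон", but "George Washington" gives "Георге Wасхингтон" (W has no counterpart), and "nadživeti" gives "наџивети" although d and ž belong to different morphemes and the correct form is "надживети". Both failures need word-level knowledge.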

An improved version needs to work on a word or phrase level.

It requires implementing the rules described in Orthographic transcription.

This will probably include maintaining a long list of phrases, some of which could be combined with redirects.
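A minimal sketch of phrase-level adaptation, assuming a hypothetical phrase table that maps the spelling found in the text to the Serbian Latin spelling (which a letter-by-letter Cyrillic step could then take over). Longer phrases are tried before shorter ones they contain:

```python
import re

# Hypothetical phrase list: spelling as found in the text -> Serbian Latin.
PHRASES = {
    "George Washington": "Džordž Vašington",
    "Washington": "Vašington",
    "New York": "Njujork",
}


def build_pattern(phrases: dict) -> re.Pattern:
    # Sort by length so the regex alternation tries the longest phrase first.
    # \b stops a phrase from matching inside a longer word, which also means
    # inflected forms such as "Washingtona" are not matched here.
    keys = sorted(phrases, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")


def adapt(text: str, phrases: dict = PHRASES) -> str:
    """Replace listed phrases with their Serbian Latin spelling; keep the rest."""
    if not phrases:
        return text
    pattern = build_pattern(phrases)
    return pattern.sub(lambda m: phrases[m.group(0)], text)
```

Because of the word boundaries, adapt("Washingtona") returns the input unchanged; inflection needs a separate step.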

This needs to cover all grammatical cases (e.g. Johna F. Kennedy[j]a).
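One possible way to cover inflected forms is to store stems rather than full words and carry the case ending over. The sketch below assumes a hypothetical stem table and a deliberately small set of endings; the insert_j flag models the Serbian j inserted after a stem ending in -i (Kenedi → Kenedija), so both Croatian forms Kennedyja and Kennedya map to the same result:

```python
import re

# Hypothetical stem table: original (Croatian) stem -> (Serbian Latin stem,
# whether Serbian inserts "j" before a vowel-initial ending).
STEMS = {
    "John": ("Džon", False),
    "Kennedy": ("Kenedi", True),
}

# Small illustrative set of case endings; a real table would be per paradigm.
ENDINGS = {"", "a", "u", "om", "ja", "ju", "jem"}
VOWELS = ("a", "e", "i", "o", "u")


def adapt_word(word: str) -> str:
    for stem, (target, insert_j) in STEMS.items():
        if not word.startswith(stem):
            continue
        ending = word[len(stem):]
        if ending not in ENDINGS:
            continue  # e.g. "Johnson" is a different word, not John + ending
        if insert_j and ending[:1] in VOWELS:
            ending = "j" + ending  # Kennedya -> Kenedi + ja
        return target + ending
    return word


def adapt_phrase(text: str) -> str:
    # \w also matches letters with diacritics in Python 3 str patterns.
    return re.sub(r"\w+", lambda m: adapt_word(m.group(0)), text)
```

With this sketch, adapt_phrase("Johna F. Kennedyja") returns "Džona F. Kenedija"; a word whose remainder is not a listed ending, such as "Johnson", is left unchanged.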

Templates that allow the editor to aid the transcription are required where the source language is not clear (homographs) or where there is an exception to the rule.
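To illustrate why homographs need editor input: the same Latin spelling can be transcribed differently depending on the source language, e.g. French Jean (Žan) versus English Jean (Džin). A sketch assuming a hypothetical exception table keyed by (source language, spelling); when no language is given and the spelling is ambiguous, it refuses to guess:

```python
from typing import Optional

# Hypothetical exception table keyed by (source language, spelling).
TRANSCRIPTIONS = {
    ("fr", "Jean"): "Žan",
    ("en", "Jean"): "Džin",
}


def transcribe(word: str, source_lang: Optional[str] = None) -> str:
    if source_lang is not None:
        return TRANSCRIPTIONS.get((source_lang, word), word)
    candidates = {out for (_, w), out in TRANSCRIPTIONS.items() if w == word}
    if len(candidates) > 1:
        # Homograph: an editor must mark the source language (e.g. via a template).
        raise ValueError(f"{word!r} is ambiguous; specify its source language")
    return candidates.pop() if candidates else word
```

Words without an entry pass through unchanged, so a template would only be needed for the ambiguous or exceptional cases.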

We are currently working on this on-wiki by creating the infrastructure for word conversion. This will include using appropriate templates, such as {{translit}} (for Serbo-Croatian words), {{translitN}} (for titles), and {{foreign}} (for foreign words).