LanguageConverter for Javanese (from jv-Latn to jv-Java)
Open, Medium, Public

Description

Currently, Javanese Wikipedia uses only Latin script (jv-Latn). We would also like to be able to use Javanese script (jv-Java, per http://en.wikipedia.org/wiki/ISO_15924).

Work has started on a webfont and on Narayam support (Narayam now seems to have been merged into UniversalLanguageSelector) for jv-Java.

Do the variants need to be enabled first? If so, where?


Version: 1.21.x
Severity: enhancement
See Also:
T41381: Add Javanese font to WebFonts

Details

Reference
bz45779

Event Timeline

bzimport raised the priority of this task from to Medium. Nov 22 2014, 1:22 AM
bzimport set Reference to bz45779.
bzimport added a subscriber: Unknown Object (MLST).
Bennylin created this task. Mar 6 2013, 3:05 PM
Liuxinyu970226 set Security to None. Sep 7 2015, 7:30 AM
Restricted Application added a subscriber: Aklapper. Sep 7 2015, 7:30 AM
Liuxinyu970226 removed a subscriber: wikibugs-l-list.
Amire80 moved this task from Untriaged to Script conversion on the I18n board. Feb 4 2018, 10:47 AM
TJones added subscribers: cscott, TJones. Edited Jun 14 2018, 9:57 PM

@Bennylin contacted me on Meta about getting this ticket going after reading my blog post from earlier this year. It makes sense to have the discussion here, though. He wrote:

I'd like to get the request phab:T47779 going. Can you give me pointers on how to start moving this forward?

I think @Amire80 and @cscott are the best people to talk to, but here's my opinion.

  • If you want to encourage someone else to work on this, it would help to point them to your transliterator (which I found when looking for a model to see how complex Javanese transliteration would be), which includes MIT-licensed JavaScript transliteration code hosted on jvwiki. Having existing code that addresses the transliteration is a huge help.
  • If you want to be more involved, you could port your JavaScript code to PHP and host it on GitHub or similar, with a generally friendly license (MIT or Apache, probably) or compatible license (GPL 2.0, probably), which would make it easier for someone else to work with it in LanguageConverter.
  • If you want to be even more involved, you could look at the existing LanguageConverter code for various languages and see if any of them seem similar enough to Javanese that you could architect your new PHP code similarly.
  • If you want to do it all yourself, which I did over the course of a year with crh/Crimean Tatar (see T23582, etc.), you probably need to set up your own Vagrant instance, get that configured, then figure out how to create your own LanguageConverter. Frankly, it's daunting. My first patch for crh covers a lot of what you have to get working. Depending on the language settings, you may also need to update wmf-config/InitialiseSettings.php. There's probably more to it than that, too, but that's what jumps out at me right now. (I had to get help several times when I got stuck on what to do next.)

For any of the first three options, I might be interested in working on this, especially if you had working PHP code with a friendly license (but I can't make any promises right now).

Before you start on the last option (or try to talk me into working on it), I'd suggest asking @cscott for advice on how to proceed. He's been working on using a finite state transducer (FST) formalism (see T191925) for LanguageConverters, and I don't really understand how existing LanguageConverters are converted to FSTs, whether new ones should be developed as FSTs, or how any of that works. It's also not clear whether someone developing a new LanguageConverter should take the FST work into account at all, or just work in PHP and leave the FSTs to @cscott. Hopefully @cscott or maybe @Amire80 can give some guidance here.

Thanks, Trey. I'm interested in the last option, but I'd like to hear others chime in first.

Hi @kamholz, a member of the Sundanese Wikipedia community asked me to help with this ticket for Javanese script (and Sundanese script as well). I'm asking for your help because of your experience adding Balinese script, which was deployed last year. Let me know if you need help from the Wikipedia community. At the least, you could advise here on what they need to prepare for Transliteration/LanguageConverter. Thanks in advance.

kamholz added a comment. Edited Mon, Feb 1, 8:36 AM

Cool, I had no idea there was interest in this. My recent Balinese LanguageConverter work will hopefully be helpful, and not just because Balinese script is similar to Javanese and Sundanese script. One major improvement is that you can write the transliteration rules using an ICU rule-based transliterator. You still need to write some PHP but not as much. The Balinese LanguageConverter code is here.

In order to make the development easier I've just made a live rule-based transliterator tester. You can paste in a rule set, test it on some text, and see what happens. I would suggest looking at one of the Balinese-to-Latin rule sets as a starting point, for example this one. It's not currently written to handle Latin-to-Balinese. It may be possible to adapt it to work in both directions for Javanese/Sundanese, or you can write a separate rule set for that direction.

Adding someone from the Sundanese community here: @Ilham.nurwansah.

Thank you @Joseagush for adding me to this thread. The Balinese-Latin converter is indeed very useful! Thanks also to @kamholz for sharing the source code here. I will adapt it for Latin-Sundanese transliteration.

kamholz added a comment. Edited Wed, Feb 3, 4:06 AM

Great, let me know if you need any help! The Balinese rules I linked to are from lines 835 to 1008. You can paste the rules into the tester I made and experiment with getting them to work for Sundanese. If the rules work in two directions, one will be "forward" (the default order) and the other will be "reverse". For example, a rule like this:

a <> b;

means to change "a" to "b" in the forward direction and "b" to "a" in the reverse direction. You can test the reverse direction by checking the Reverse box on the tester. It doesn't really matter which direction is forward or reverse, so you can choose whatever is convenient for you. You can also write separate code for each direction instead of doing both in the same rule set.
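
To make the forward/reverse idea concrete, here is a minimal Python sketch. It is not ICU and not the tester: it is a toy interpreter that understands only the plain "x <> y;" rule form, and the names parse_rules and transliterate are made up for illustration. Real ICU rule sets support much more (contexts, variables, one-directional rules).

```python
# Toy illustration only: this is NOT ICU. It mimics just the "x <> y;" rule form.

def parse_rules(text):
    """Parse rules of the form 'x <> y;' into (left, right) pairs."""
    pairs = []
    for raw in text.split(";"):
        rule = raw.strip()
        if not rule:
            continue
        left, right = (part.strip() for part in rule.split("<>"))
        pairs.append((left, right))
    return pairs


def transliterate(text, pairs, reverse=False):
    """Scan left to right, applying the first rule that matches at each position."""
    if reverse:
        # Reverse direction: swap the two sides of every rule.
        pairs = [(right, left) for left, right in pairs]
    out = []
    i = 0
    while i < len(text):
        for src, dst in pairs:
            if src and text.startswith(src, i):
                out.append(dst)
                i += len(src)
                break
        else:
            out.append(text[i])  # no rule matched: copy the character through
            i += 1
    return "".join(out)


pairs = parse_rules("a <> b;")
forward = transliterate("cat", pairs)                 # "a" becomes "b"
backward = transliterate("cbt", pairs, reverse=True)  # "b" becomes "a"
print(forward, backward)
```

In this sketch the reverse direction is obtained just by swapping the two sides of each rule, which is why it does not matter which direction you call "forward".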

Also, you might like to know that I made another recent patch adding a <langconvert> tag (T263082). This lets you write Wikitext like this:

<langconvert from="ban-Bali" to="ban">ᬯᬬᬦ᭄</langconvert>

and the output will be:

wayan
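
As a rough illustration of why that input comes out as "wayan", here is a small Python sketch. It is not how MediaWiki or the Balinese LanguageConverter implements the tag. The three-letter table, the function names, and the regex-based tag handling are all assumptions made for this example. It shows only the basic idea: each consonant carries an inherent "a" unless it is followed by adeg-adeg.

```python
import re

# Hypothetical, deliberately tiny table: only the three consonants in the sample.
CONSONANTS = {
    "\u1b2f": "w",  # BALINESE LETTER WA
    "\u1b2c": "y",  # BALINESE LETTER YA
    "\u1b26": "n",  # BALINESE LETTER NA
}
ADEG_ADEG = "\u1b44"  # vowel killer: suppresses the inherent "a"


def bali_to_latin(text):
    """Convert the toy subset of Balinese script to Latin."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if i + 1 < len(text) and text[i + 1] == ADEG_ADEG:
                i += 1  # consume the adeg-adeg; no vowel is emitted
            else:
                out.append("a")  # inherent vowel
        else:
            out.append(ch)  # anything outside the table is copied through
        i += 1
    return "".join(out)


# Assumed tag shape, taken from the example above; real parsing is more involved.
TAG = re.compile(
    r'<langconvert from="([^"]+)" to="([^"]+)">(.*?)</langconvert>', re.S
)


def expand_tags(wikitext):
    """Replace each <langconvert> tag with its converted body."""
    def repl(match):
        source, target, body = match.groups()
        if (source, target) == ("ban-Bali", "ban"):
            return bali_to_latin(body)
        return body  # unknown pair: leave the text unconverted
    return TAG.sub(repl, wikitext)


sample = '<langconvert from="ban-Bali" to="ban">\u1b2f\u1b2c\u1b26\u1b44</langconvert>'
result = expand_tags(sample)
print(result)
```

A Javanese or Sundanese table would follow the same pattern, with dependent vowel signs and conjuncts handled by additional rules.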