Request amendment of Bangla character set in WikiEditor
Closed, Resolved · Public · 1 Story Points

Description

The Bangla character set is incomplete. In addition, the alternate character forms (for R and W) used in the Assamese and Manipuri languages should be included. I suggest incorporating the English Wikisource CharInsert set for Bengali characters (available at https://en.wikisource.org/wiki/MediaWiki:Gadget-charinsert-core.js).

Related: T91608

Event Timeline

Hrishikes updated the task description.
Hrishikes raised the priority of this task from to Normal.
Hrishikes added a subscriber: Hrishikes.
Restricted Application added a subscriber: Aklapper. · Apr 19 2015, 2:17 AM
GOIII renamed this task from "Reopening T91608 for amendment of Bangla character set." to "Request amendment of Bangla character set". · Apr 19 2015, 2:42 AM
GOIII updated the task description.
GOIII set Security to None.
GOIII added subscribers: Krenair, GOIII.
AuFCL added a subscriber: AuFCL. · Apr 19 2015, 3:09 AM
Nnemo added a subscriber: Nnemo. · Apr 19 2015, 6:29 AM

I checked the Bengali Wikipedia WikiEditor character set, and not all of the characters are usable: I can click on some of them but not all. Even the English Wikisource CharInsert set is not complete. Can you please tell me where I can update the set?

Just added the Bengali ganda mark ৻ to the proposed list.

I'm confused. Is this about adding characters into a gadget, or into the hard-coded list which WikiEditor pulls from?

It is about the WikiEditor. The proposed list is on the gadget talk page. The gadget in enWS is already updated (except for later additions to the proposed list). This proposal is about the global set, across wiki projects.

Aklapper renamed this task from "Request amendment of Bangla character set" to "Request amendment of Bangla character set in WikiEditor". · Apr 20 2015, 9:33 AM
GOIII added a comment. · Edited · Apr 20 2015, 9:40 PM

Not 100% sure, but I believe the current list found at the bottom of the talk page is the desired set to replace the current one. I wish somebody knowledgeable in Bangla would just add it here already (or even attach a file with the desired list).

FWIW... nobody used an 'edit-protected' notification template, so I wasn't aware of the latest revisions on said talk page until today. I will amend the Gadget's list to reflect the talk page at some point later today. Done.

This comment was removed by GOIII.
GOIII added a comment. · Edited · Apr 21 2015, 12:56 AM

Uploading the desired amended Bangla character set in .txt file format...

Could you submit a patch against MW core's resources/src/mediawiki.language/specialcharacters.json please?

GOIII added a comment. · Apr 21 2015, 2:04 AM

I don't believe anyone currently following this here on Phab or back on Wikisource knows exactly how to submit a patch, but let's let that request linger a bit longer in case somebody actually does.

P.S. I've been advised by those familiar with the language that adding the zero-width non-joiner character [currently omitted in the above uploaded .txt file] is also needed to properly fill out the Bangla language set.

AuFCL added a comment. · Edited · Apr 21 2015, 2:09 AM

Could you submit a patch against MW core's resources/src/mediawiki.language/specialcharacters.json please?

Just for clarification: you are asking for a patch to a single-line JSON specification. Is that not exactly the same as specifying a complete replacement for the existing file (https://git.wikimedia.org/blob/mediawiki%2Fcore/6a9428babd1cd78b80fc6cca92e495cd444fc7d3/resources%2Fsrc%2Fmediawiki.language%2Fspecialcharacters.json)? If so, it should be pretty easy to deliver just as soon as the language experts approve a final version (there is currently a bit of last-minute hesitation regarding &zwnj; browser support).

Thanks to GOIII for adding anji (ঀ). Now the addition of zwnj will make the list final.

I have discussed the zwnj compatibility issue with GOIII. Awaiting his opinion. If he finds it compatible as per my suggestion, then the issue can be considered as settled and we can go forward from there.

GOIII added a comment. · Edited · Apr 21 2015, 2:22 AM

. . . If so should be pretty easy to deliver just as soon as the language experts approve a final version (there is currently a bit of last-minute hesitation regarding &zwnj; browser support.)

I don't think that is an issue - I only raised 'compatibility' elsewhere out of curiosity more so than caution, etc.

If &nbsp; is OK to host, so should &zwnj; be.

And @Hrishikes ... maybe adding the reasoning and examples for zwnj inclusion that you put on my talk page would go a long way in justifying its addition to the core resource as well.

See the examples মিস্মি (without zwnj) and মিস্‌মি (with zwnj). If you can visually appreciate the difference, that should be the justification.
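To make the invisible difference between the two samples checkable, here is a minimal Python sketch. It assumes the two words are spelled with the code points shown in the comments (the second has U+200C inserted after the hasanta, U+09CD); the variable and function names are illustrative only.

```python
import unicodedata

# Both words are written with explicit escapes so the invisible character
# is visible in the source. Assumed spellings:
without_zwnj = "\u09ae\u09bf\u09b8\u09cd\u09ae\u09bf"        # mi-s-hasanta-mi, no joiner control
with_zwnj = "\u09ae\u09bf\u09b8\u09cd\u200c\u09ae\u09bf"     # same, with ZWNJ after the hasanta


def code_points(text):
    """Return the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Characters present only in the second spelling.
extra = [ch for ch in with_zwnj if ch not in without_zwnj]

print(code_points(without_zwnj))
print(code_points(with_zwnj))
print([unicodedata.name(ch) for ch in extra])
```

The two strings differ by exactly one code point, which is why they look different when rendered even though the extra character cannot be seen on its own.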

AuFCL added a comment. · Apr 21 2015, 2:36 AM

Sample works for me. Just remember that @Krenair apparently wants the patch submitted in JSON form, so JavaScript tricks and HTML entities cannot be used; everything needs to be expanded into final-form characters.
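As a rough illustration of what "expanded into final-form characters" could mean in practice, the sketch below converts HTML entities into literal characters and serializes them as JSON. The `entries` list is a hypothetical example, not the gadget's actual content.

```python
import html
import json

# Hypothetical gadget-style entries that still use HTML entities.
entries = ["&zwnj;", "&zwj;", "&#2433;"]

# html.unescape resolves named and numeric entities to real characters.
literal = [html.unescape(entry) for entry in entries]

# ensure_ascii=True writes every non-ASCII character as a \uXXXX escape,
# which keeps invisible characters readable in the JSON text.
as_json = json.dumps(literal, ensure_ascii=True)
print(as_json)
```

Either the literal characters or the `\uXXXX` escapes are valid JSON; the escaped form is simply easier to review when a character has no visible glyph.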

The character zwnj is invisible in isolation. However, it is present near the end of the Vrinda character set in the MS Windows 7 character map.

@AuFCL I don't have the technical expertise for this JSON business. Someone else needs to do it.

@Nasirkhan can you help with the JSON format? List now available at the enWS CharInsert gadget and not on its talk page.

AuFCL added a comment. · Apr 21 2015, 4:42 AM

@Hrishikes, @Nasirkhan: What follows is my best attempt at mashing everything together in what (I hope) is the correct format. I have no way of even validating whether the result is legal JSON, so please clean up any errors you spot! (prototype specialcharacters.json)

Bengali characters are OK. So this should be taken as the final list and action taken thereon. I am grateful for all this help from everyone.

Added 2 characters and changed the order at https://en.wikisource.org/wiki/User:AuFCL/specialcharacters.json, and I think no further changes are needed.

AuFCL added a comment. · Apr 21 2015, 7:58 AM

"ৗ" and zwj, right? If that is the case, unfortunately the syntax is wrong for zwj (I have fixed this last).

General comment: Cutting and pasting non-printing characters where the results are invisible is driving me crazy, so I've filled in the Unicode for zwj (U+200D) and zwnj (U+200C) numerically.
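One way to cope with invisible entries when reviewing a pasted list is to report every format-category character by code point and name. This is only a sketch: `describe_invisible` and `sample` are hypothetical names, and the check relies on the Unicode general category `Cf`, which covers the zero-width space, zwnj, and zwj.

```python
import unicodedata


def describe_invisible(entries):
    """Return (index, code point, name) for each format-category (Cf) character."""
    found = []
    for index, entry in enumerate(entries):
        for ch in entry:
            if unicodedata.category(ch) == "Cf":
                found.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return found


# Hypothetical sample: one visible letter followed by zwj and zwnj.
sample = ["\u0995", "\u200d", "\u200c"]
for row in describe_invisible(sample):
    print(row)
```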

(Somebody fix this last if it causes future problems, please.)

Ah, I see that you have already noticed the syntax problem. I had already pinged you not knowing that you have noticed. I am no expert, but I noticed that the syntax differs for Arabic. Thanks for the fix.

@AuFCL If the font does not matter, the characters zwj and zwnj can be copy-pasted from the Arabic portion, can they not?

Fixed the characters. Not sure what AuFCL meant by the legality or validity of the JSON format. The language part is finished, however.

This comment was removed by Hrishikes.

Submitting json file. @Krenair @GOIII @AuFCL please check legality/validity of the format.

GOIII added a comment. · Apr 22 2015, 8:09 AM

@Hrishikes

There is an end-comma missing for the character right before the opening [ for ZWS.

Whenever in doubt, use the online JSON verification tool - http://jsonlint.com/
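For an offline check of the same kind, Python's standard `json` module reports where parsing fails. The sketch below uses a shortened, hypothetical list with the same sort of mistake (a missing comma before a nested `[`); `first_error` is an illustrative helper, not part of any tool mentioned in this task.

```python
import json


def first_error(text):
    """Return a description of the first JSON syntax error, or None if the text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Hypothetical shortened list: the comma before the nested ["zws", ...] pair is missing.
broken = '{"bangla": ["\\u0995", "\\u0996" ["zws", "\\u200b"]]}'
fixed = '{"bangla": ["\\u0995", "\\u0996", ["zws", "\\u200b"]]}'

print(first_error(broken))
print(first_error(fixed))
```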

@GOIII As per your link, the JSON is valid except for that comma. I have corrected @AuFCL's JSON page and shall update the file here (it may take some time as my computer connection is down; I am editing here on Android).

Updated json file:

At jsonlint, the content is shown as valid when pasted there, but when the URL of the file is pasted instead, it shows:

Parse error on line 1:
^
Expecting '{', '['

Suggestions?
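One possible explanation, not confirmed anywhere in this task, is that the URL returns an HTML page wrapping the file rather than the raw JSON text, in which case a parser would fail on the very first character. The sketch below only demonstrates that failure mode; `html_page` is a made-up response body and `looks_like_json` is an illustrative helper.

```python
import json


def looks_like_json(body):
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return False
    return True


pasted = '{"bangla": ["\\u0995"]}'
# Hypothetical response body: an HTML viewer page instead of the raw file.
html_page = "<!DOCTYPE html><html><body>file viewer</body></html>"

print(looks_like_json(pasted))
print(looks_like_json(html_page))
```

If that is what happened, pasting the file contents directly (as was already done) is the reliable check.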

Patch for Bangla character set as under:

"bangla":["ঀ","অ","আ","ই","ঈ","উ","ঊ","ঋ","ঌ","এ","ঐ","ও","ঔ","া","ি","ী","ু","ূ","ৃ","ে","ৈ","ো","ৌ","্য","্র","ক","খ","গ","ঘ","ঙ","চ","ছ","জ","ঝ","ঞ","ট","ঠ","ড","ঢ","ণ","ত","থ","দ","ধ","ন","প","ফ","ব","ভ","ম","য","র","ল","শ","ষ","স","হ","ড়","ঢ়","য়","ৎ","ং","ঃ","ঁ","্","৷","॥","১","২","৩","৪","৫","৬","৭","৮","৯","০","ঽ","ৗ","়","ৰ","ৱ","৲","৻","৳","৴","৵","৶","৷","৸","৹","৺","ৠ","ৡ","ৄ","ৢ","ৣ","‘","’","“","”",["zws","​"],["zwnj","‌"],["zwj","‍"]],

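A fragment in this shape (a key, a list, and a trailing comma) is not a complete JSON document, so checking it in isolation takes one extra step. The following sketch wraps a shortened, hypothetical fragment in braces, separates plain characters from labelled pairs such as `["zws", ...]`, and looks for characters listed more than once. The sample data is invented for illustration and is not the list above.

```python
import json
from collections import Counter

# Hypothetical shortened fragment in the same shape as the patch line.
fragment = '"bangla":["\\u0995","\\u09f7","\\u0996","\\u09f7",["zws","\\u200b"]],'

# Drop the trailing comma and wrap in braces to get a complete JSON object.
document = json.loads("{" + fragment.rstrip().rstrip(",") + "}")
entries = document["bangla"]

plain = [e for e in entries if isinstance(e, str)]       # single characters
labelled = [e for e in entries if isinstance(e, list)]   # [label, character] pairs
duplicates = [ch for ch, count in Counter(plain).items() if count > 1]

print(len(plain), len(labelled))
print([f"U+{ord(ch):04X}" for ch in duplicates])
```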
Change 205880 had a related patch set uploaded (by Jforrester):
mediawiki.language: Update bangla characters per request

https://gerrit.wikimedia.org/r/205880

@Nasirkhan, @Hrishikes: I've updated the special characters list in the above patch, but I can't read Bangla so I'm not absolutely sure it's right. Can you confirm? (Sorry it's a bit of a mess of a file.)

Bengali characters are confirmed. Present at http://unicode.org/charts/PDF/U0980.pdf

Jdforrester-WMF closed this task as Resolved. · Apr 22 2015, 7:14 PM
Jdforrester-WMF edited a custom field.
Jdforrester-WMF moved this task from Nominated to Done on the VisualEditor 2014/15 Q4 blockers board.

Change 205880 merged by jenkins-bot:
mediawiki.language: Update bangla characters per request

https://gerrit.wikimedia.org/r/205880

GOIII moved this task from Backlog to Closed on the WikiEditor board. · Apr 3 2016, 9:08 AM
AuFCL removed a subscriber: AuFCL. · Nov 21 2016, 7:37 PM