
Further extensions to ccnorm
Open, Needs Triage, Public

Description

Continuing T27619 (some of the following characters have already been mentioned there), and (partly) driven by recent/previous spoofs, I'd like to request the following additions to ccnorm:

A: Ꭿꭿẚ
B: ḂḃḄḅḆḇ
C: ᏣꮳḈḉㄷ
D: ᎠꭰḊḋḎḏḐḑḒḓ
E: ㅌᏋꮛ𐐁𐐩Ḕḕ
F: Ḟḟ
G: Ᏽᏽ
H: ㅐẖ𐞖ɧ
I: ㅣᏆꮖḬḭḮḯỈỉỊị
J:
K: ḰḱḲḳḴḵ
L: ㄴḺḻḼḽ𐐛𐑃
M: ㅆ
N: ṄṅṈṉṊṋ
O: ㅇṌṍṎṏṐṑṒṓỎỏỐốỒồỔổỖỗỘộỚớỜờỞởỠỡỢợ𐐄𐐬
P: 尸
Q:
R: 𐞪
S: ᎦꭶṠṡṤṥṦṧṨṩꚂꚃㄹ
T: ᎢꭲㅜㄒṪṫṮṯṰṱẗꚌꚍ
U: ṲṳṴṵṶṷṸṹṺṻỤụỦủỨứỪừỬửỮữỰự
V: ṼṽṾṿ
W: ẘ
X: ẊẋẌẍ
Y: Ỿỿ
Z: ʑẐẑẒẓẔẕ
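
For readers unfamiliar with what such a mapping does, here is a minimal sketch of a confusable-character normalisation. The `LOOKALIKES` table and the `normalize` helper are hypothetical names used only for illustration; this is not the actual Equivset data or the ccnorm implementation, and folding to uppercase is an assumption made here.

```python
# Hypothetical look-alike table for illustration only.
# This is NOT the real Equivset data; it holds a few characters from the list above.
LOOKALIKES = {
    "\u1E02": "B",      # Ḃ LATIN CAPITAL LETTER B WITH DOT ABOVE
    "\u1E03": "B",      # ḃ LATIN SMALL LETTER B WITH DOT ABOVE
    "\u1E60": "S",      # Ṡ LATIN CAPITAL LETTER S WITH DOT ABOVE
    "\u1E61": "S",      # ṡ LATIN SMALL LETTER S WITH DOT ABOVE
    "\U00010404": "O",  # 𐐄 DESERET CAPITAL LETTER LONG O
}


def normalize(text: str) -> str:
    """Replace each known look-alike with its ASCII letter, then uppercase.

    Characters without an entry in the table are passed through unchanged
    (apart from the final uppercasing).
    """
    return "".join(LOOKALIKES.get(ch, ch) for ch in text).upper()


if __name__ == "__main__":
    spoofed = "\u1E02\U00010404\u1E61"  # looks like "BOs"
    print(normalize(spoofed) == normalize("bos"))
```

A filter would then compare the normalised forms, so a spoofed string and its plain ASCII counterpart match.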

Event Timeline

Change 1004321 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/libs/Equivset@master] Add remaining letters from "Latin Extended Additional" unicode block

https://gerrit.wikimedia.org/r/1004321

Antispoof detects different scripts, so I am not sure whether it needs a mapping for A in that case. It seems AbuseFilter does not have this feature.
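
To illustrate what "detecting different scripts" could look like, here is a rough sketch. It is a heuristic built on the first word of each Unicode character name and is not how Antispoof actually works; `scripts_used` and `is_mixed_script` are hypothetical helper names.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Return a rough set of scripts found in the alphabetic characters of text.

    Heuristic assumption: the first word of the Unicode character name
    (e.g. "LATIN", "CYRILLIC", "CHEROKEE") is used as the script label.
    """
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")  # empty string if the character has no name
        if name:
            found.add(name.split(" ", 1)[0])
    return found


def is_mixed_script(text: str) -> bool:
    """True if the text contains letters from more than one script label."""
    return len(scripts_used(text)) > 1


if __name__ == "__main__":
    # "p" + CYRILLIC SMALL LETTER A + "ypal"
    print(scripts_used("p\u0430ypal"))
```

A check like this flags mixed-script names without any look-alike table, but it cannot say which Latin letter a character imitates, which is what the mapping provides.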

The patch set above added some characters from the list (and a few more), so the following characters still need to be investigated to decide whether they are needed.

A: Ꭿꭿ
C: Ꮳꮳㄷ
D: Ꭰꭰ
E: ㅌᏋꮛ𐐁𐐩
G: Ᏽᏽ
H: ㅐ𐞖ɧ
I: ㅣᏆꮖ
L: ㄴ𐐛𐑃
M: ㅆ
O: ㅇ𐐄𐐬
P: 尸
R: 𐞪
S: ᎦꭶꚂꚃㄹ
T: ᎢꭲㅜㄒꚌꚍ
Z: ʑ

Change 1004321 merged by jenkins-bot:

[mediawiki/libs/Equivset@master] Add remaining letters from "Latin Extended Additional" unicode block

https://gerrit.wikimedia.org/r/1004321

Change 1005587 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/libs/Equivset@master] Add remaining letters from "IPA Extensions" unicode block

https://gerrit.wikimedia.org/r/1005587

Change 1005587 merged by jenkins-bot:

[mediawiki/libs/Equivset@master] Add remaining letters from "IPA Extensions" unicode block

https://gerrit.wikimedia.org/r/1005587

@Umherirrender I'm not familiar with antispoof, so I requested this mainly for AbuseFilter, see cases like https://en.wikipedia.org/wiki/Special:Contributions/ClearHarmony. AbuseFilter already normalises a lot of Cyrillic letters to Latin ones, so I don't really see the problem here.

Change #1015271 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/Equivset@master] Map some very obvious Hangul Jamo look-alikes to ASCII

https://gerrit.wikimedia.org/r/1015271

Change #1017039 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/Equivset@master] Add a few more look-alikes from Cherokee & Canadian ranges

https://gerrit.wikimedia.org/r/1017039

Change #1015271 merged by jenkins-bot:

[mediawiki/libs/Equivset@master] Map some very obvious Hangul Jamo look-alikes to ASCII

https://gerrit.wikimedia.org/r/1015271

Change #1017039 merged by jenkins-bot:

[mediawiki/libs/Equivset@master] Add a few more look-alikes from Cherokee & Canadian ranges

https://gerrit.wikimedia.org/r/1017039

Remains:

E: 𐐁𐐩
L: 𐐛𐑃
M: ㅆ
O: 𐐄𐐬
P: 尸
S: ㄹ
T: ㄒ

ㅆ and ㄹ are from the Hangul Compatibility Jamo Unicode block (U+3146 and U+3139).
ㄒ is from the Bopomofo Unicode block (U+3112).
尸 is from the CJK Unified Ideographs Unicode block (U+5C38).

𐐁𐐩𐐛𐑃𐐄𐐬 are from the Deseret Unicode block; these seem to be the best remaining candidates to map to ASCII letters.
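
The code points above can be double-checked with a short script. This is only a sketch using Python's standard `unicodedata` module; `describe` is a hypothetical helper name, and the characters are written as escapes so the listing does not depend on font or clipboard handling.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Return the code point in U+XXXX notation and the Unicode character name."""
    code_point = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch, "<unnamed>")
    return code_point, name


if __name__ == "__main__":
    # Characters discussed above, written as escapes.
    remaining = [
        "\u3146",      # Hangul Compatibility Jamo
        "\u3139",      # Hangul Compatibility Jamo
        "\u3112",      # Bopomofo
        "\u5C38",      # CJK Unified Ideographs
        "\U00010404",  # Deseret
    ]
    for ch in remaining:
        code_point, name = describe(ch)
        print(ch, code_point, name)
```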

Change #1017099 had a related patch set uploaded (by Umherirrender; author: Umherirrender):

[mediawiki/libs/Equivset@master] Add a few more look-alikes from Deseret range

https://gerrit.wikimedia.org/r/1017099

Change #1017099 merged by jenkins-bot:

[mediawiki/libs/Equivset@master] Add a few more look-alikes from Deseret range

https://gerrit.wikimedia.org/r/1017099

Change #1017101 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/libs/Equivset@master] Add a few look-alikes from the Bopomofo ranges

https://gerrit.wikimedia.org/r/1017101

Change #1017101 merged by jenkins-bot:

[mediawiki/libs/Equivset@master] Add a few look-alikes from the Bopomofo ranges

https://gerrit.wikimedia.org/r/1017101