Add more character mappings to AntiSpoof
Closed, Resolved · Public · 1 Estimated Story Point

Description

Currently, AntiSpoof maps ك to ک.

It should also map the following:

ڪ
ك

This will allow ccnorm() to be used to capture all of these using AbuseFilter, instead of using hard-to-read regex patterns like [کكڪﻙﻚ].
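To illustrate why a normalization step is easier to maintain than a hand-written character class, here is a minimal Python sketch of the idea. It is not the actual AntiSpoof or AbuseFilter implementation: `KAF_EQUIVALENTS` and `normalize_confusables` are hypothetical names, and using ک (U+06A9) as the canonical form is an assumption based on the mapping described above.

```python
# Hypothetical equivalence table: each look-alike maps to one canonical letter.
# The canonical form chosen here (U+06A9) is an assumption for illustration.
KAF_EQUIVALENTS = {
    "\u0643": "\u06A9",  # ARABIC LETTER KAF
    "\u06AA": "\u06A9",  # ARABIC LETTER SWASH KAF
    "\uFED9": "\u06A9",  # ARABIC LETTER KAF ISOLATED FORM
    "\uFEDA": "\u06A9",  # ARABIC LETTER KAF FINAL FORM
}

# Build the translation table once; str.translate applies it per character.
_TRANSLATION = str.maketrans(KAF_EQUIVALENTS)


def normalize_confusables(text: str) -> str:
    """Replace every known look-alike with its canonical character."""
    return text.translate(_TRANSLATION)


def looks_the_same(a: str, b: str) -> bool:
    """Two strings are treated as equivalent if their normalized forms match."""
    return normalize_confusables(a) == normalize_confusables(b)


# The same word spelled with Arabic kaf and with Persian keheh.
arabic_spelling = "\u0643\u062A\u0627\u0628"
persian_spelling = "\u06A9\u062A\u0627\u0628"
print(looks_the_same(arabic_spelling, persian_spelling))
```

With a table like this, a filter compares normalized strings and never has to enumerate the variants inline; adding a new look-alike means adding one entry to the table.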

Also: the Cyrillic letter Д д should be added to the AntiSpoof equivset for A.

Details

Related Gerrit Patches:
mediawiki/extensions/AntiSpoof : master · Add more Persian characeter mappings to AntiSpoof

Event Timeline

Huji created this task. Aug 21 2017, 10:30 AM
Restricted Application added a subscriber: Aklapper. Aug 21 2017, 10:30 AM
Huji claimed this task. Aug 21 2017, 10:30 AM
Huji triaged this task as Low priority.
Huji updated the task description.
Huji added a subscriber: Yamaha5.
Yamaha5 added a comment. Edited Aug 21 2017, 11:11 AM

For Persian and Arabic:
All Arabic-family characters are listed here.
I checked the table, along with the numbers, and found some other similar-looking characters that have different Unicode code points:

ۀ = \u06C0
ۂ = \u06C2
هٔ = \u0647 + \u0654

إ = \u0625
ٳ = \u0673

ٲ = \u0672
أ = \u0623
ٵ = \u0675

، = \u060C
٬ = \u066C
٫ = \u066B

064E
0659

ڼ = \u06BC
ڹ = \u06B9

06EC
06E0
06F0
0660
06DF
06EB
06EA
. = (dot)

0674
0655
0654
065F
0621

٭ = \u066D
* = (asterisk)
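Several groups above give only bare hexadecimal code points. As a convenience, the following Python sketch resolves such values to their official Unicode names using the standard `unicodedata` module. The helper `describe` is a hypothetical name introduced for illustration, and the names returned depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def describe(code_points):
    """Return (label, official name) pairs for hexadecimal code point strings."""
    result = []
    for hex_value in code_points:
        char = chr(int(hex_value, 16))
        # Fall back to a placeholder if this Python build does not know the name.
        name = unicodedata.name(char, "<unnamed>")
        result.append((f"U+{hex_value.upper()}", name))
    return result


# One of the groups listed above: hamza-like marks and letters.
for label, name in describe(["0674", "0655", "0654", "065F", "0621"]):
    print(label, name)
```

Printing the names next to each group makes it easier to review whether the characters really belong in the same equivalence set.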

Persian numbers mostly have the same shapes as Arabic numbers, but their Unicode code points are different!
Persian numbers= ۹ ۸ ۷ ۶ ۵ ۴ ۳ ۲ ۱ ۰
Arabic numbers = ٠ ١ ٢ ٣ ٤ ٥ ٦ ٧ ٨ ٩
Persian numbers' Unicode= \u06F9 \u06F8 \u06F7 \u06F6 \u06F5 \u06F4 \u06F3 \u06F2 \u06F1 \u06F0
Arabic numbers' Unicode =\u0660 \u0661 \u0662 \u0663 \u0664 \u0665 \u0666 \u0667 \u0668 \u0669
You can check them here.
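The two digit ranges listed above (U+06F0 to U+06F9 and U+0660 to U+0669) are contiguous, so folding them to a single form can be sketched in a few lines of Python. This is only an illustration: `fold_digits` is a hypothetical helper, and folding to ASCII digits is an assumption, since the canonical form AntiSpoof actually uses may differ.

```python
def build_digit_table():
    """Map Arabic-Indic and Extended Arabic-Indic digits to ASCII digits."""
    table = {}
    for value in range(10):
        table[0x0660 + value] = str(value)  # Arabic-Indic digits
        table[0x06F0 + value] = str(value)  # Extended Arabic-Indic (Persian) digits
    return table


_DIGIT_TABLE = build_digit_table()


def fold_digits(text: str) -> str:
    """Replace Arabic-family digits with their ASCII equivalents."""
    return text.translate(_DIGIT_TABLE)


persian = "\u06F1\u06F2\u06F3"  # Persian digits one, two, three
arabic = "\u0661\u0662\u0663"   # Arabic digits one, two, three
print(fold_digits(persian) == fold_digits(arabic))
```

After folding, two usernames that differ only in which digit set they use compare as equal.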

Change 373596 had a related patch set uploaded (by Huji; owner: Huji):
[mediawiki/extensions/AntiSpoof@master] Add more Persian characeter mappings to AntiSpoof

https://gerrit.wikimedia.org/r/373596

Huji added a comment. Aug 24 2017, 4:22 PM

Thanks, @Yamaha5. I will create one or more separate patches for those groups as well.

TBolliger set the point value for this task to 1. Sep 8 2017, 7:40 PM
TBolliger renamed this task from Add more Persian character mappings to AntiSpoof to Add more character mappings to AntiSpoof. Sep 8 2017, 7:59 PM
TBolliger updated the task description.

Merging in a near-identical ticket. We'll tackle both at the same time.

Change 373596 merged by jenkins-bot:
[mediawiki/extensions/AntiSpoof@master] Add more Persian characeter mappings to AntiSpoof

https://gerrit.wikimedia.org/r/373596

dmaza closed this task as Resolved. Sep 28 2017, 9:57 PM
dmaza moved this task from Code Review to Done on the Anti-Harassment (AHT Sprint 6) board.