Required new attribute to selectively exclude unicode text
Open, Needs Triage · Public · Feature

Description

Currently we have an LTA (long-term abuser) who likes to create user accounts or pages with insulting terms, and who does so in several of the styled Unicode character variants. Using the <antispoof> attribute is too harsh, because many of these terms should remain allowed in normal characters, though there is definitely no need for them in the Unicode variants.

What would be useful is an attribute that lets the simpler (regular?) versions of the text through but can leverage AntiSpoof to block the look-alike Unicode equivalents. For the purposes of this ticket I am calling it <unicodeonly>. I was thinking of something that restricts in a similar manner to the other <...only> attributes.

This approach can then map look-alike characters that sit outside the normal character range. As new Unicode sets are developed, it could be extended to them when we upgrade code sets, without new regex filter lines having to be created.
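
To make the intended behaviour concrete, here is a minimal Python sketch of how a <unicodeonly> check might behave. It is not how TitleBlacklist or AntiSpoof actually work: Unicode NFKC normalization is swapped in as a rough stand-in for AntiSpoof's own equivalence tables, the function names are hypothetical, and case-insensitive whole-title matching is an assumption.

```python
import re
import unicodedata


def fold_lookalikes(text):
    """Fold styled characters to their plain forms.

    Assumption: NFKC plus casefold stands in for AntiSpoof's equivalence
    tables. It handles styled math letters and fullwidth forms, but not
    cross-script look-alikes such as Cyrillic letters resembling Latin ones.
    """
    return unicodedata.normalize("NFKC", text).casefold()


def unicodeonly_blocks(pattern, title):
    """Hypothetical <unicodeonly> rule: block only disguised spellings."""
    regex = re.compile(pattern, re.IGNORECASE)
    if regex.fullmatch(title):
        # The plain spelling matches directly, so this attribute lets it through.
        return False
    # Block when the title matches only after folding look-alike characters.
    return regex.fullmatch(fold_lookalikes(title)) is not None


if __name__ == "__main__":
    rule = r".*(?:huge|must|die|cock).*"
    for candidate in ("𝓱𝓾𝓰𝓮 fan", "huge fan", "nice name"):
        print(repr(candidate), unicodeonly_blocks(rule, candidate))
```

One design choice in this sketch: a title that contains the plain spelling is always allowed, even if a styled copy of the term appears elsewhere in it. A stricter rule could block any title whose folded form differs from the original and matches the pattern.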

Examples

current style

.*(?:𝓱𝓾𝓰𝓮|𝓶𝓾𝓼𝓽|𝓭𝓲𝓮|𝓬𝓸𝓬𝓴).*
.*(?:𝕒𝕤𝕝𝕖𝕪|𝓪𝓼𝓵𝓮𝔂)                      <newaccountonly>

expected style

.*(?:huge|must|die|cock).*     <unicodeonly>
.*asley.*                      <newaccountonly|unicodeonly>
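
As an illustration of how lines in the expected style could be read, the following Python sketch splits a line into its regex and its attribute set. The parser and its names are hypothetical; the only syntax it assumes is what the examples show (an optional trailing `<a|b>` group), and it ignores comments and any other syntax the real extension may support.

```python
import re

# Assumed line shape, taken from the examples: a regex, optional whitespace,
# then an optional "<attr|attr>" group at the end of the line.
LINE_RE = re.compile(r"^(?P<pattern>.*?)\s*(?:<(?P<attrs>[^<>]*)>)?\s*$")


def parse_blacklist_line(line):
    """Return (regex, set of lowercase attribute names) for one line."""
    match = LINE_RE.match(line)
    pattern = match.group("pattern")
    raw_attrs = match.group("attrs")
    attributes = set()
    if raw_attrs:
        attributes = {part.strip().lower() for part in raw_attrs.split("|")}
        attributes.discard("")  # tolerate stray separators such as "<a||b>"
    return pattern, attributes


if __name__ == "__main__":
    line = ".*asley.*                      <newaccountonly|unicodeonly>"
    print(parse_blacklist_line(line))
```

With the attributes available as a set, a matcher could apply the look-alike folding only to lines that carry `unicodeonly`, and combine it with `newaccountonly` as in the second example line.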

I would still expect that use of the local mediawiki:titlewhitelist would override the restriction of this new attribute.
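
The expected precedence can be sketched in a few lines of Python. This is only an illustration of the ordering (whitelist first, then blacklist); the function name, the callable used for the blacklist check, and the case-insensitive whole-title matching are assumptions, not the extension's actual code.

```python
import re


def title_is_permitted(title, whitelist_patterns, blacklist_blocks):
    """Return True if the title may be used.

    whitelist_patterns: regexes standing in for the local whitelist entries.
    blacklist_blocks:   hypothetical callable returning True when any
                        blacklist line (including <unicodeonly> ones)
                        blocks the title.
    """
    for pattern in whitelist_patterns:
        if re.fullmatch(pattern, title, re.IGNORECASE):
            # A whitelist match overrides every blacklist restriction.
            return True
    return not blacklist_blocks(title)


if __name__ == "__main__":
    def blocks_everything(title):
        # Stand-in for a blacklist that would block any title.
        return True

    print(title_is_permitted("𝓱𝓾𝓰𝓮 fan", [r"𝓱𝓾𝓰𝓮 fan"], blocks_everything))
    print(title_is_permitted("someone else", [r"𝓱𝓾𝓰𝓮 fan"], blocks_everything))
```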

I am guessing that work is needed in both the TitleBlacklist and AntiSpoof extensions to get the feature and the character groupings in place.

Event Timeline

Restricted Application added a subscriber: Aklapper. May 22 2020, 2:07 AM
DannyS712 changed the subtype of this task from "Task" to "Feature Request".
DannyS712 awarded a token.
DannyS712 removed a subscriber: Stewards-and-global-tools.
DannyS712 added a subscriber: DannyS712.

I wonder why we allow these kinds of "weird" fonts to register accounts in the first place. They are being used to actively circumvent anti-abuse features such as the title blacklists and abuse filters, and they give all sorts of trouble to wiki administrators. If blacklisting Unicode usernames is too harsh, having a <unicode> tag sounds like a good intermediate step between allowing all and denying all.

Especially problematic characters could also be added to https://www.mediawiki.org/wiki/Manual:$wgInvalidUsernameCharacters in our CommonSettings.php, I guess.

I feel like the argument above leans more towards defining a list of valid characters, rather than having to continuously maintain a list of invalid characters (which would change over time as new Unicode characters are added).
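
A rough Python sketch of the "list of valid characters" idea follows. It is one possible reading, not an existing MediaWiki mechanism: characters are accepted by Unicode property rather than listed one by one, and anything that NFKC normalization would rewrite (styled math letters, fullwidth forms and similar compatibility variants) is rejected. The names and the small set of extra punctuation are hypothetical, and the rule would also reject some legitimate compatibility characters.

```python
import unicodedata

# Letters, combining marks and numbers are accepted by general category.
ALLOWED_CATEGORY_PREFIXES = ("L", "M", "N")
# Hypothetical extra punctuation permitted in names.
ALLOWED_EXTRA = set(" -_.'")


def is_allowed_char(ch):
    """Accept a character by property instead of by explicit listing."""
    if ch in ALLOWED_EXTRA:
        return True
    if not unicodedata.category(ch).startswith(ALLOWED_CATEGORY_PREFIXES):
        return False
    # Reject compatibility variants: characters that NFKC would rewrite.
    return unicodedata.normalize("NFKC", ch) == ch


def is_valid_username(name):
    """A name is valid when it is non-empty and every character is allowed."""
    return bool(name) and all(is_allowed_char(ch) for ch in name)


if __name__ == "__main__":
    for candidate in ("Alice", "Алиса", "𝓪𝓼𝓵𝓮𝔂"):
        print(repr(candidate), is_valid_username(candidate))
```

Because the check is driven by Unicode properties, newly added styled alphabets would be covered once the underlying Unicode data is updated, rather than needing a new entry per character.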

Another example of the requirement: people spoofing IP addresses.

https://meta.wikimedia.org/wiki/special:permalink/20408702#Fix_pseudo_IPs_rule