Required new attribute to selectively exclude unicode text
Open, Needs Triage · Public · Feature

Assigned To

None

Authored By

	Billinghurst
	May 22 2020, 2:07 AM

Description

Currently we have an LTA (long-term abuser) who creates user accounts or pages with insulting terms, often written in one of the many styled varieties of Unicode text. Using the <antispoof> attribute is too harsh: many of these terms are ones we want to allow in normal characters, even though we definitely don't need them in styled Unicode.

What would be useful is an attribute that lets the plain (regular) versions of the text through but can leverage AntiSpoof to block their look-alike equivalents. For the purposes of this ticket I am calling it <unicodeonly>. I was thinking of something that works in a similar way to the other <...only> attributes.

This approach maps look-alike characters from outside the normal character set back to plain characters. As new Unicode sets are developed, they could be covered by upgrading the character mappings rather than by writing new regex filter lines.
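A minimal sketch of the proposed <unicodeonly> behaviour, under stated assumptions: the function names `fold` and `unicodeonly_matches` are hypothetical, and Python's NFKC normalization stands in for AntiSpoof's Equivset mapping. NFKC folds styled letters such as 𝓱 or 𝕒 to plain ASCII, but unlike a confusables table it does not fold cross-script look-alikes (e.g. Cyrillic "а"). The idea is that a title is caught only when folding changed it and the folded form matches the entry's regex.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Map styled look-alike characters to plain ones.

    NFKC is only a stand-in for AntiSpoof's mapping (an assumption): it folds
    U+1D4F1 (bold script small h) to "h", but not Cyrillic "а" to Latin "a".
    """
    return unicodedata.normalize("NFKC", text)


def unicodeonly_matches(pattern: str, title: str) -> bool:
    """Hypothetical <unicodeonly> check: catch only spoofed spellings."""
    folded = fold(title)
    if folded == title:
        return False  # nothing was spoofed, so the plain spelling passes
    # Assumes whole-title, case-insensitive matching, as the ".*" in the
    # blacklist examples suggests.
    return re.fullmatch(pattern, folded, re.IGNORECASE) is not None


rule = r".*(?:huge|must|die|cock).*"
print(unicodeonly_matches(rule, "Huge"))  # False: plain text is allowed
print(unicodeonly_matches(rule, "𝓱𝓾𝓰𝓮"))  # True: styled letters fold to "huge"
```

Because the check compares the title with its own folded form, accented titles already in normal form (such as "Café") are never affected, which is narrower than blocking every non-ASCII title.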

Examples

current style

.*(?:𝓱𝓾𝓰𝓮|𝓶𝓾𝓼𝓽|𝓭𝓲𝓮|𝓬𝓸𝓬𝓴).*
.*(?:𝕒𝕤𝕝𝕖𝕪|𝓪𝓼𝓵𝓮𝔂)                      <newaccountonly>

expected style

.*(?:huge|must|die|cock).*     <unicodeonly>
.*asley.*                      <newaccountonly|unicodeonly>
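The expected-style lines combine attributes with `|`. A rough sketch of how such lines might be parsed and evaluated; `Entry`, `parse_line` and `blocks` are hypothetical names, NFKC again stands in for AntiSpoof, and `newaccountonly` is modelled as applying only during account creation, as we understand the existing attribute.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    pattern: str
    attributes: frozenset


def parse_line(line: str) -> Entry:
    """Split 'regex   <attr1|attr2>' into the regex and its attribute set."""
    # Every part of this pattern is optional, so it always matches a line.
    m = re.fullmatch(r"\s*(.*?)\s*(?:<([^<>]*)>)?\s*", line)
    attrs = m.group(2) or ""
    names = frozenset(a.strip().lower() for a in attrs.split("|") if a.strip())
    return Entry(m.group(1), names)


def blocks(entry: Entry, title: str, creating_account: bool) -> bool:
    """Decide whether one entry blocks a title (hypothetical semantics)."""
    if "newaccountonly" in entry.attributes and not creating_account:
        return False  # entry only applies when an account is being created
    if "unicodeonly" in entry.attributes:
        folded = unicodedata.normalize("NFKC", title)  # stand-in for AntiSpoof
        if folded == title:
            return False  # plain spelling: let it through
        title = folded
    # Whole-title, case-insensitive match (assumed, matching the ".*" style).
    return re.fullmatch(entry.pattern, title, re.IGNORECASE) is not None


entries = [
    parse_line(".*(?:huge|must|die|cock).*     <unicodeonly>"),
    parse_line(".*asley.*                      <newaccountonly|unicodeonly>"),
]
```

With these two entries, "𝓪𝓼𝓵𝓮𝔂" is blocked only at account creation, while the plain "Asley" is never blocked.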

I would still expect the local MediaWiki:Titlewhitelist to override the restriction of this new attribute.
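One possible check order that gives the whitelist precedence, as a sketch only: it assumes every blacklist entry carries <unicodeonly>, that whitelist entries are regexes matched against the title exactly as typed, and that NFKC stands in for AntiSpoof. `is_blocked` is a hypothetical name.

```python
import re
import unicodedata


def is_blocked(title: str, blacklist: list, whitelist: list) -> bool:
    """Hypothetical check order in which the local whitelist always wins."""
    if any(re.fullmatch(p, title, re.IGNORECASE) for p in whitelist):
        return False  # whitelisted titles skip the blacklist entirely
    folded = unicodedata.normalize("NFKC", title)  # stand-in for AntiSpoof
    if folded == title:
        return False  # plain spelling: <unicodeonly> does not apply
    return any(re.fullmatch(p, folded, re.IGNORECASE) for p in blacklist)


blacklist = [r".*(?:huge|must|die|cock).*"]
print(is_blocked("𝓓𝓲𝓮𝓰𝓸", blacklist, []))  # True: folds to "Diego"
print(is_blocked("𝓓𝓲𝓮𝓰𝓸", blacklist, [r"𝓓𝓲𝓮𝓰𝓸"]))  # False: whitelisted
```

The example shows why the override matters: a substring rule such as "die" also catches an innocent styled "Diego", and a local whitelist entry is the escape hatch.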

I am guessing that work is needed in both the TitleBlacklist and AntiSpoof extensions to put the feature, and the character groupings, in place.

Related Objects

Mentioned In: T246353: Investigate and mitigate trivial bypass to AntiSpoof

Event Timeline

Billinghurst created this task. May 22 2020, 2:07 AM

Restricted Application added a subscriber: Aklapper. May 22 2020, 2:07 AM

DannyS712 added a project: Stewards-and-global-tools. May 22 2020, 2:15 AM

DannyS712 changed the subtype of this task from "Task" to "Feature Request".

DannyS712 awarded a token.

DannyS712 removed a subscriber: Stewards-and-global-tools.

DannyS712 subscribed.

CptViraj subscribed. May 24 2020, 1:58 PM

I wonder why we allow these kinds of "weird" fonts in account registrations in the first place. They are being used to actively circumvent anti-abuse features such as the title blacklists and abuse filters, and they cause all sorts of trouble for wiki administrators. If blacklisting Unicode usernames is too harsh, a <unicode> tag sounds like a good intermediate step between allowing all and denying all.

Especially problematic characters could be added to https://www.mediawiki.org/wiki/Manual:$wgInvalidUsernameCharacters in our CommonSettings.php, I guess, too.

Billinghurst mentioned this in T246353: Investigate and mitigate trivial bypass to AntiSpoof. Jun 11 2020, 2:10 PM

Should this be tagged with Equivset?

In T253367#6162022, @MarcoAurelio wrote:

Especially problematic characters could be added to https://www.mediawiki.org/wiki/Manual:$wgInvalidUsernameCharacters in our CommonSettings.php, I guess, too.

I feel the argument above points more towards defining a list of valid characters than towards continuously maintaining a list of invalid characters (which would change over time as new Unicode characters are added).
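To make the denylist-versus-allowlist contrast concrete, here is a small sketch. Both lists are hypothetical: the denylist holds only three bold script letters to mimic a hand-maintained list that is always incomplete, and the allowlist is an illustrative subset of code point ranges that a real multilingual wiki would need to extend considerably. Note that an allowlist alone does not stop cross-script look-alikes (Greek and Cyrillic are allowed here); that remains AntiSpoof's job.

```python
# Hypothetical denylist: each unwanted character is listed by hand, so the
# list is always incomplete (here only three of the bold script letters).
DENYLIST = set("𝓪𝓫𝓬")

# Hypothetical allowlist of code point ranges; this subset is only for
# illustration and a real wiki would need many more scripts.
ALLOWED_RANGES = [
    (0x0020, 0x007E),  # Basic Latin, printable
    (0x00A0, 0x024F),  # Latin-1 Supplement, Latin Extended-A and -B
    (0x0370, 0x03FF),  # Greek and Coptic
    (0x0400, 0x04FF),  # Cyrillic
]


def denylist_ok(name: str) -> bool:
    """Accept unless some character is explicitly listed as invalid."""
    return not any(ch in DENYLIST for ch in name)


def allowlist_ok(name: str) -> bool:
    """Accept only if every character falls inside an allowed range."""
    return all(
        any(lo <= ord(ch) <= hi for lo, hi in ALLOWED_RANGES) for ch in name
    )


for name in ["Café", "𝓪𝓼𝓵𝓮𝔂", "𝓱𝓾𝓰𝓮"]:
    print(name, denylist_ok(name), allowlist_ok(name))
# Café True True
# 𝓪𝓼𝓵𝓮𝔂 False False
# 𝓱𝓾𝓰𝓮 True False
```

The last line is the point of the argument: the denylist lets "𝓱𝓾𝓰𝓮" through because nobody listed 𝓱, while the allowlist rejects it by default.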

Another example of this requirement: people spoofing IP addresses.

https://meta.wikimedia.org/wiki/special:permalink/20408702#Fix pseudo_IPs_rule
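The linked rule is not reproduced here. As a minimal sketch of how folding could catch usernames that render like an IP address, assuming NFKC as a stand-in for AntiSpoof (`looks_like_ip` is a hypothetical name): fullwidth or mathematical digits fold to ASCII and can then be checked with Python's standard `ipaddress` module. Letter-for-digit swaps such as "l" for "1" or "O" for "0" are not folded by NFKC and would need a broader confusables mapping.

```python
import ipaddress
import unicodedata


def looks_like_ip(username: str) -> bool:
    """Flag names that are not literally IPs but fold to one (hypothetical)."""
    folded = unicodedata.normalize("NFKC", username)  # stand-in for AntiSpoof
    if folded == username:
        return False  # a plain name; real IP addresses are handled elsewhere
    try:
        ipaddress.ip_address(folded)
    except ValueError:
        return False
    return True


print(looks_like_ip("１９２．０．２．１"))  # True: fullwidth digits and stops
print(looks_like_ip("192.0.2.1"))  # False: unchanged by folding
print(looks_like_ip("Ｊｏｈｎ"))  # False: folds to "John"
```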
