Page MenuHomePhabricator

[[MediaWiki:Antispoof-name-illegal]] should provide information about the illegal and / or blocked characters in / at the wiki
Closed, ResolvedPublic

Description

Author: gangleri

Description:
Dear friends;

[[MediaWiki:Antispoof-name-illegal]] contains:

The name "$1" is not allowed to prevent confusing or spoofed usernames: $2. Please choose another name.

request:

The [[MediaWiki:Antispoof-name-illegal]] message should provide information about illegal and / or blocked characters.

Beside the characters from the Basic Latin block as controll characters in the range of � to  as well as the « % » % , « & » & , « + » &#2B; ¿etc.? characters this should include also all other UTF-8 characters as those from the General Punctuation block, the various whitespace characters etc.

Thanks for implementing this in advanced.

Best regards Reinhardt [[user:Gangleri]]


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/wiki/MediaWiki:Antispoof-name-illegal

Details

Reference
bz12500

Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 10:00 PM
bzimport added a project: AntiSpoof.
bzimport set Reference to bz12500.
bzimport added a subscriber: Unknown Object (MLST).

gangleri wrote:

(In reply to comment #0)
...

Beside the characters from ...
... this should include also all other UTF-8 characters
as those from the General Punctuation block, the various whitespace characters
etc.

which are disallowed in all namespaces

an additional example is « � » which is
Unicode Character 'REPLACEMENT CHARACTER' - U+FFFD
http://www.fileformat.info/info/unicode/char/fffd/index.htm

van.de.bugger wrote:

Patch for better messages.

If username contains a bad (blacklisted, deprecated, unassigned) character, the error message includes the abusing character and its Unicode notation.

attachment AntiSpoof-better-messages.patch ignored as obsolete

sumanah wrote:

Added the "patch" and "need-review" keywords; Mark hopes to get someone to review the patch soon.

Looks good. Although rendering some codepoints could give layout problems (eg. a RTL mark).

van.de.bugger wrote:

Non-printable characters can be handled in badCharErr:

private static function badCharErr( $msgId, $point ) {

$char = codepointToUtf8( $point );
$code = sprintf( 'U+%04X', $point );
if ( preg_match( '/^[^:print:]/', $char ) ) {
    $char = '';
}
return array( "ERROR", wfMsg( $msgId, $char, $code ) );

}

Or:

private static function badCharErr( $msgId, $point ) {

$char = codepointToUtf8( $point );
$code = sprintf( 'U+%04X', $point );
if ( preg_match( '/^[^:print:]/', $char ) ) {
    return array( "ERROR", wfMsg( $msgId . '-np', $code ) );
}
return array( "ERROR", wfMsg( $msgId, $char, $code ) );

}

(The latter variant will require some more error messages.)

Which variant do you prefer? I will prepare a new patch.

Sorry, I hadn't seen it.
Instead of outputting empty backets (option 1) or having to duplicate all messages (option 2), I would use a single $1 in the messages:
'antispoof-unassigned' => 'Contains unassigned character $1',
and then have that $1 replaced by a couple of messages
one expanding to "$1" ($2) if it's printable or another which is just the U+XX form,

van.de.bugger wrote:

Next attempt.

'antispoof-unassigned' => 'Contains unassigned character $1',

That's is not very good decision, because in different languages it may be written in different ways. For example,

"$1" ($2)

(where $1 denotes character itself, $2 -- it's Unicode code) looks suitable for English language.

«$1» ($2)

would be preferred in Russian. Have no idea about languages with right-to-left direction… May be

($2) "$1"

?

So I implemented another approach. First, I format character representation from character (if it is printable) and character code:

if ( mb_ereg( '^[[:print:]]', $symbol ) ) {

$char = wfMsg( 'antispoof-bad-char', $symbol, $code );

} else {

$char = wfMsg( 'antispoof-bad-char-np', $code );

}

So it can be localized for different quotation marks, parenthesis, order, etc. Then, character representation is used to get final message:

return array( "ERROR", wfMsg( $msgId, $char ) );

It seems it works.

attachment AntiSpoof-better-error-reporting.patch ignored as obsolete

van.de.bugger wrote:

Hi, why do you think it is low priority?? It is usability issue. A user creating an account may be very frustrated when you just say "your login name contains unacceptable characters" but do not specify which exactly characters are unacceptable. Please consider it seriously.

I prepared the patch, the patch was not rejected — you said "looks good", but the patch was not applied. Now, 3 months later, the patch is outdated and cannot be applied cleanly. It require efforts (again!) to revive the patch. Why do you waste your and my time?

Is mb_ereg the only way to detect printability? Isn't that deprecated? (I don't see it explicitely marked as such, but I think it would be, as member of the ereg family). I think you can probably detect non-printable characters in PCRE with '/\p{C}/u'

Also, I'd call the message «antispoof-bad-char-non-printable» instead of the more cryptic «antispoof-bad-char-np»

I'm sorry you feel like that, you are obviously right in that this isn't a proper way to treat contributions. Note that I'm not payed for reviewing bugs (or doing any MW development), I just got more or less asssigned to it because I was the only one that took a look at it (and funnily, always after a ping comment). What can I say? I probably missed that it was replying me. Feel free to drop me a line reminder in a direct mail in the future. It's much less likely to get unnoticed as bugzilla mail.
I'd like to put up to date your patch myself, but I'm quite busy in real life now, and I don't think I could get time to dedicate for it.

van.de.bugger wrote:

Next attempt.

Hi, I am not payed for working on MW too, and also have some real life. I do not ask for any reward, but just want to be sure my (and your) time is not wasted. I do not ask for immediate response (I am completely ok to wait for a week or so), but longer periods are not so good — patches become outdated, I am loosing context and getting back to details requires more to time so on. So let us finish this tiny work reasonable quickly, ok?

Problems you mentioned last time:

  1. mb_ereg — replaced with preg_match( '/\A\p{C}\z/u', $symbol ).
  2. Cryptic message key — fixed.

Additionally:

  1. Added message documentation ("qqq" pseudo-language).
  2. Russian translation dropped, as requested here: http://www.mediawiki.org/wiki/Localization#Changing_existing_messages

Attached:

van.de.bugger wrote:

Ok, you do not have time to review the patch. Let me see if you have time to roll it back...

Committed r114156.

Let me see if you have time to roll it back...

That's not too kind of you.

The patch doesn't look bad, but I'll let someone else review it.
I'm happy to strike that mail from my inbox, though.

van.de.bugger wrote:

That's not too kind of you.

I am really sorry.

However, recently I have committed couple of my own extensions, and found there is a plenty of people reviewing my code and criticized:

  1. License (AGPLv3),
  2. Semicolon after {} block (it is not required),
  3. My comment at the end of file (it is useless),
  4. "Useless" declarations of global variables (from their point of view),

and other similar "defects" in my code. Some guys even fixed the code, and one fix really breaks it.

At the same time nobody has time to review this patch!

I am disappointed. I am thinking (1) there are other sites for hosting free projects, where the code will be under my control without annoying semicolon style maintainers, (2) forking AntiSpoof is not a big deal.

Sorry again.

sumanah wrote:

Van de Bugger:

I'm sorry for the wait. However, you know that in the long run it benefits everyone to work together in the same branch even though it can be frustrating in the short term; please don't fork.

Also, I encourage you to get a Git/Gerrit account https://labsconsole.wikimedia.org/wiki/Help:Access#Access_FAQ so that your code will be reviewed faster in the future. There are about 200 patches awaiting review right now in Bugzilla and moving more of those to the unified code review dashboard will help get reviews faster.