
Username of all whitespaces in German Wikipedia dump file
Closed, Resolved (Public)


Author: triddle

A username consisting of all spaces made its way into the German Wikipedia dump file. The article it
happened on is at

Since the username field is not marked as space-preserving, Parse::MediaWikiDump completely ignored
its contents in this case. I have a feeling that a username consisting entirely of spaces is not supposed to be allowed to exist.
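
As a rough illustration of the behaviour described above (not the Parse::MediaWikiDump code itself), the following Python sketch parses a made-up fragment. The `<contributor>`/`<username>` layout and the helper `read_username` are assumptions for the example; the helper mimics a consumer that treats white-space-only text as insignificant.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a contributor whose username is three plain spaces.
FRAGMENT = "<contributor><username>   </username><id>1</id></contributor>"


def read_username(xml_text):
    """Return the username, or None when it is empty or only white space.

    This mimics a consumer that does not treat the field as
    space-preserving; it is an illustrative helper, not a real API.
    """
    elem = ET.fromstring(xml_text).find("username")
    raw = elem.text or ""
    return raw.strip() or None


# The XML parser itself still hands over the three spaces...
raw_text = ET.fromstring(FRAGMENT).find("username").text
print(repr(raw_text))

# ...but the white-space-insensitive consumer sees no username at all.
print(read_username(FRAGMENT))
```

The point of the sketch is only that the information is lost at the consumer, which is allowed to drop it when the field is not declared space-preserving.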


Version: unspecified
Severity: normal



Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:00 PM
bzimport set Reference to bz4312.
bzimport added a subscriber: Unknown Object (MLST).

gangleri wrote:


If you go
and click on the "space" link
you will come to
there to
no email specified or emails from other users disabled

The problem is known since August see

The user name contains
Unicode Character 'NO-BREAK SPACE' (U+00A0)
HTML Entity (decimal) &#160; (hex) &#xA0; (named) &nbsp;
UTF-8 (hex) 0xC2 0xA0 (c2a0) %c2%a0 %C2%A0
is known already from
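
The character data listed above can be checked with a short Python sketch. It uses only the standard library; the variable names are chosen for this example.

```python
import unicodedata
from urllib.parse import quote

nbsp = "\u00a0"  # the character reported in the user name

name = unicodedata.name(nbsp)          # official Unicode character name
code_point = ord(nbsp)                 # decimal value used in the HTML entity
utf8_hex = nbsp.encode("utf-8").hex()  # UTF-8 bytes as a hex string
percent = quote(nbsp)                  # percent-encoded form as used in URLs

print(name, code_point, utf8_hex, percent)

# Python's str.isspace() counts this character as white space, so a
# name made only of it looks "empty" to many string-handling routines.
print(nbsp.isspace())
```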

Changing the name would be an administrative task either at WP:DE or better at
all projects. I do not know the policy about this. Please clarify this at the
local wiki, via a mailing list as [Wikide-l], [Wikitech-l] etc. or via IRC at
irc:// .

Marking this bug as a duplicate of
bug 1524: usernames should use unicode whitelist is mentioned at
bug 2173 comment 3
bug 2173: Fatal error when removing an article with an whitespace title from the

best regards reinhardt [[user:gangleri]]

*** This bug has been marked as a duplicate of 1524 ***

avarab wrote:

This isn't a duplicate of bug 1524. That bug deals with having a whitelist for
registered usernames, but this particular username also happens to break the XML

gangleri wrote:

Thanks Ævar! I did not read the second paragraph with the attention it
required. Please look at what happens at

Please change the summary to reflect the new / major problem. Thanks in

I don't understand, does this really break dumps?

Also wondering. How exactly can one reproduce that it "breaks dumps"?

triddle wrote:

If the XML schema indicates that data is not white-space preserving, then white space is not significant and there is no difference between " ", "  ", "   ", "\t\n\n\n\t\t\t\t\t\t\t\t\t \n\n\n", etc.

If a user name exists in which white space is significant, it becomes impossible to transmit it using a data type that does not preserve white space. Thus it's not actually possible to get such user names correctly, and this is rather broken.
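
To make the argument concrete, here is a minimal Python sketch of white-space collapsing. The helper `normalize` is hypothetical and only approximates what a non-preserving data type permits; it treats space, tab, carriage return and line feed as white space, which are the characters XML 1.0 defines as white space.

```python
import re

# Runs of the four XML white-space characters: space, tab, CR, LF.
XML_WS = re.compile(r"[ \t\r\n]+")


def normalize(text):
    """Collapse runs of XML white space to one space and trim both ends.

    Hypothetical helper for illustration only.
    """
    return XML_WS.sub(" ", text).strip(" ")


# Four different white-space-only strings, as in the comment above.
samples = [" ", "  ", "   ", "\t\n\n\n\t\t\t\t\t\t\t\t\t \n\n\n"]

# After normalization they are indistinguishable from one another.
normalized = {normalize(s) for s in samples}
print(normalized)
```

Under these assumptions every sample collapses to the same value, so a receiver cannot tell which of the user names was sent.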

FriedhelmW claimed this task.