
User:Unknown_user editing in dewiki
Closed, Resolved (Public)

Description

Page: https://de.wikipedia.org/w/index.php?title=Benutzer:Delta456/Carcassonne_(Spiel)/Eigenst%C3%A4ndige_Spiele

Case:

  • Edit of new page Benutzer:Delta456/Carcassonne_(Spiel)/Eigenst%C3%A4ndige_Spiele by
    • 2017-02-18T15:44:04 TaxonBot . . (162.779 Bytes) (+162.779) . . (Bot: Vorbereitung zur Auslagerung)
  • User:Unknown_user emptied the page by
    • 2017-02-18T15:46:49 Unknown user . . (leer) (-162.779)

Further import edits have also been emptied by User:Unknown_user; please check the revisions.

Note: the first revision was not an import edit but a normal bot edit with bot flag

I have left the last revision untouched for now so that you can check the edits yourselves.
Please explain who User:Unknown_user is ...

Thank you ...

Event Timeline

These come from imports

https://github.com/wikimedia/mediawiki/blob/master/includes/import/WikiImporter.php#L704
https://github.com/wikimedia/mediawiki/blob/master/includes/import/WikiImporter.php#L901

[16:54:33] <MatmaRex> Reedy: heh. we do import edits with "Unknown user" as username in some cases. perhaps the real username was revdeleted.

Change 338538 had a related patch set uploaded (by Bartosz Dziewoński):
Add "Unknown user" to $wgReservedUsernames

https://gerrit.wikimedia.org/r/338538

This seems to be the correct and intentional behavior. It was implemented per T121338: Better handling of revision deletion on import (prior to those fixes, revisions where the author was revdeleted were apparently skipped completely when importing, which was worse). My patch above only ensures that it's impossible to actually register that username and make real edits using it.

You can document its purpose on its user page, like it was done e.g. for https://de.wikipedia.org/wiki/Benutzer:Maintenance_script.

Change 338539 had a related patch set uploaded (by Reedy):
Add "Unknown user" to $wgReservedUsernames

https://gerrit.wikimedia.org/r/338539

Change 338538 merged by jenkins-bot:
Add "Unknown user" to $wgReservedUsernames

https://gerrit.wikimedia.org/r/338538

Change 338539 merged by jenkins-bot:
Add "Unknown user" to $wgReservedUsernames

https://gerrit.wikimedia.org/r/338539

matmarex claimed this task.
matmarex removed a project: Patch-For-Review.

Please note: the first revision was NOT an import revision, and it was emptied by User:Unknown_user too.


How do you know? The first what?

What makes you think it was not an imported revision? They look imported to me.

Perhaps there is some bug that causes edits to be imported with the wrong timestamp. Can you explain how exactly the import was done?

I reopened this task because something is wrong: User:Unknown_user did not make a single import edit, although you said so above. This user only made emptying edits and nothing else. Please check the details of this case in more depth.

Additionally, about every 10th import fails, but the response says the import was successful. This has to be debugged, too.

Look at the two case lines in the task description. These were done before the import: one bot-flagged edit that created the page and one emptying edit. Only the revisions that follow them came from the import.

All of the "edits" by "Unknown user" are a result of importing a revision with no author, no timestamp and no content. I would like to know what you did to import them.

No, you're wrong. Sorry. I did the import myself; I know what I've done.

If you imported a hand-crafted XML file, it would be helpful to see it. Probably there should've been an error message, rather than allowing you to do this.

@matmarex: I imported it from an exported XML file.

Please check the logbook: https://de.wikipedia.org/w/index.php?title=Spezial:Logbuch&page=Benutzer%3ADelta456%2FCarcassonne+%28Spiel%29%2FEigenst%C3%A4ndige+Spiele
The first log entry is an import log entry with the timestamp 2017-02-18T15:46:50 <- correct

  1. But the very first edit of the page was before the import: 2017-02-18T15:44:04
  2. The second edit of the page was an emptying by User:Unknown_user, also before the import, at 2017-02-18T15:46:49
  3. The third entry is the import log entry at 2017-02-18T15:46:50, an import upload of an exported (Special:Export) XML file

That shows me that User:Unknown_user emptied the page one second before the import. The later edits by Unknown user were import reverts, but not the first one!

In the interest of verifying that there is no security issue, can you please provide the XML file you imported? I suspect importing 849 revisions simply took a second to complete.

Let me try the import once more to another target and we'll see.

I have already explained what is happening. You are importing a malformed file. If you uploaded it, I would have pointed out where the issue is. Until you upload that file, I have nothing else to say to you.

Thanks. Are you sure this is the right file? There are 1034 revisions in it, but you imported 849 revisions on dewiki.

If this is the right file, I suspect it might have gotten corrupted/truncated during the upload. Special:Import apparently doesn't complain about truncated XML files (I tested locally), and the latest imported revision has a timestamp of 10:52, 3. Jan. 2016‎ – the original page has a lot of later edits.
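A pre-flight check along the following lines could catch a truncated dump before it is uploaded. This is only a sketch in Python and not something Special:Import does: `count_revisions` is a hypothetical helper, and the tiny XML strings in the usage note are made up for illustration. It streams the file with the standard library parser, counts `<revision>` elements, and reports a parse error if the document ends early.

```python
import io
import xml.etree.ElementTree as ET


def count_revisions(xml_bytes):
    """Return (revision_count, error_message_or_None) for an export dump.

    Hypothetical helper: counts complete <revision> elements and reports
    whether the XML was well-formed up to the end of the input.
    """
    count = 0
    try:
        for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
            # Export dumps use an XML namespace, so compare the local name only.
            if elem.tag.rsplit("}", 1)[-1] == "revision":
                count += 1
                elem.clear()  # free memory for large histories
    except ET.ParseError as err:
        # Truncated or otherwise malformed input ends up here.
        return count, str(err)
    return count, None
```

For example, `count_revisions(b"<mediawiki><page><revision/></page></mediawiki>")` reports one revision and no error, while the same input cut off before the closing tags reports an error. Comparing the count against the expected number of revisions (1034 in this case) would also reveal a short import.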

All these deletions by "Unknown user" are caused by the import itself, for an unknown reason, maybe a corrupted import file. Each deletion resets the imported content and always appears before the import log entry.

I imported http://tools.wmflabs.org/taxonbot/xml1.xml to testwiki: https://test.wikipedia.org/w/index.php?title=Test_namespace_2:Carcassonne_(Spiel)&action=history

You can see that the top imported revision has the correct user name (Aka) and timestamp (2016-01-03T12:32:32Z) but then the comment and content are missing.

Looking at the relevant <revision> tag in the XML file, something caught my eye:

<comment>/* Carcassonne Demo-Spiel */ Tippfehler entfernt  🔧&amp;nbsp;</comment>

The spanner character is an emoji (U+1F527), but the UTF-8 encoding is suspect. It is encoded as ed a0 bd ed b4 a7, whereas it should be encoded as f0 9f 94 a7.
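The two byte sequences can be reproduced with a short Python sketch. The suspect bytes are what results when the character is first split into a UTF-16 surrogate pair (U+D83D, U+DD27) and each surrogate is then encoded as if it were a code point of its own, a scheme usually called CESU-8. That the file was produced this way is an inference from the bytes, not something confirmed in this task, and `cesu8_encode` and `is_valid_utf8` are hypothetical helper names.

```python
def cesu8_encode(text):
    """Encode text the 'wrong' way: surrogate pairs as two 3-byte sequences."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split the code point into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            # 'surrogatepass' lets Python emit bytes for lone surrogates.
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


def is_valid_utf8(data):
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


spanner = "\U0001F527"
print(spanner.encode("utf-8").hex())   # correct UTF-8 bytes
print(cesu8_encode(spanner).hex())     # bytes as found in the XML file
print(is_valid_utf8(cesu8_encode(spanner)))
```

A strict UTF-8 decoder rejects the second form, which is consistent with the parser giving up on the comment and content of that revision.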

https://de.wikipedia.org/wiki/Special:Export/Carcassonne_(Spiel)?history=1 looks like it contains the emoji with correct UTF-8 encoding, so there is an issue somewhere else along the line.

Interestingly, even though http://tools.wmflabs.org/taxonbot/xml1.xml is an export of a page on dewiki, the <siteinfo> tag at the top of the dump is clearly from enwiki. I wonder what is going on there? @doctaxon, could you please tell us how exactly you created this XML file?

I wonder if we can make the import process more resilient to dodgy UTF-8 sequences (replace them with U+FFFD perhaps). This might not be possible if the relevant code is baked into the internals of PHP/HHVM's XML DOM extension.
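As an illustration of that idea outside MediaWiki, here is a minimal Python sketch of two ways a dump could be cleaned before parsing. Both function names are hypothetical, and neither reflects how the PHP/HHVM XML extension behaves. The first replaces every invalid sequence with U+FFFD; the second assumes the damage consists only of UTF-16 surrogates encoded as UTF-8 and recombines them into the intended character.

```python
def replace_invalid_utf8(data):
    """Decode bytes, substituting U+FFFD for anything that is not valid UTF-8."""
    return data.decode("utf-8", errors="replace")


def repair_surrogate_pairs(data):
    """Decode bytes in which surrogate pairs were encoded as separate sequences.

    Assumes the only defect is surrogates encoded as UTF-8; other invalid
    bytes still raise UnicodeDecodeError.
    """
    # Step 1: accept the surrogate bytes, producing lone surrogates in the str.
    text = data.decode("utf-8", errors="surrogatepass")
    # Step 2: round-trip through UTF-16 so adjacent surrogates are paired up;
    # any surrogate left unpaired becomes U+FFFD.
    raw = text.encode("utf-16-le", errors="surrogatepass")
    return raw.decode("utf-16-le", errors="replace")


broken = b"Tippfehler entfernt \xed\xa0\xbd\xed\xb4\xa7"
print(replace_invalid_utf8(broken))
print(repair_surrogate_pairs(broken))
```

The first approach loses the emoji but never fails; the second recovers it but only handles this one kind of corruption.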

Hi! I found this emoji too yesterday, but it was very late, so I couldn't report it then.

I have deleted this one corrupted revision, and the import has been running without any problems since.
The XML export is assembled from 1034 single revisions. It was never necessary to change the siteinfo tag, even though the export can come from a wiki of any language.

It looks like the revision dump has to be encoded in another way. I had such a problem the very first time, too. I think the problem is solved. Thank you ...

My patch above only ensures that it's impossible to actually register that username and make real edits using it.

You can document its purpose on its user page, like it was done e.g. for https://de.wikipedia.org/wiki/Benutzer:Maintenance_script.

@matmarex, I added the suggested documentation to User:Unknown_user on EnWiki. However, in the process I noticed that the user talk page existed, and that this account already existed! It appears to have been created by an experienced user in 2004, used for a few days, and abandoned. It has 126 edits. The account is not used and does not exist on any other wiki.

The user name should never have been used for a system purpose without first checking whether it existed :/ Adding the name to $wgReservedUsernames should ensure that it can no longer be logged into, but you should verify whether this situation causes any other messes. Perhaps the EnWiki account should be renamed, to avoid any conflicts or confusion in the future?

The user name should never have been used for a system purpose without first checking whether it existed :/

To be pedantic, I don't think this issue would have blocked the core change, but arguably a check should have been carried out before the core change was deployed to WMF sites. It's a good point though.

It would be interesting to see whether the system allows renaming of that account. It might require the SUL to be disconnected by stewards and a local rename performed.