
Around 1% of usernames lack valid userid (is 0 instead)
Closed, Invalid (Public)

Description

users with most revisions with 0 userid per wiki (+ overall counts per wiki)

The issue surfaced when edits per wiki per month per namespace were merged into one file for all 800 wikis.

Lines in merged editor file: 55351997
Lines with user id 0: 603446 (1.1% of total).
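A minimal sketch of how such a tally could be reproduced, assuming the merged file uses the seven-field layout of the sample lines quoted later in this description (user name, userid, month, project code, wiki, namespace, count) and that fields are whitespace-separated. The real delimiter is not stated here, and the function names are hypothetical.

```python
def split_editor_line(line):
    """Split one merged-file line into its seven fields.

    Assumed layout: user name, userid, month, project code, wiki,
    namespace, count. Splitting from the right keeps user names that
    contain spaces (e.g. "Erik Zachte") in one piece.
    """
    parts = line.rstrip("\n").rsplit(None, 6)
    if len(parts) != 7:
        raise ValueError(f"unexpected line layout: {line!r}")
    name, userid, month, project, wiki, namespace, count = parts
    return name, int(userid), month, project, wiki, int(namespace), int(count)


def zero_userid_share(lines):
    """Return (total lines, lines with userid 0, percentage with userid 0)."""
    total = zero = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        total += 1
        if split_editor_line(line)[1] == 0:
            zero += 1
    pct = 100.0 * zero / total if total else 0.0
    return total, zero, pct


sample = [
    "Erik Zachte 0 2004-01 wx mediawiki 100 3",
    "Erik Zachte 20226 2004-03 wx mediawiki 102 10",
    "Erik Zachte 0 2004-05 wx mediawiki 0 3",
    "Erik Zachte 20226 2004-06 wx mediawiki 103 7",
]
print(zero_userid_share(sample))

# The percentage reported above, from the two line counts in this task.
print(round(100.0 * 603446 / 55351997, 1))
```

Splitting from the right is a design choice for this sketch only: it avoids guessing where a multi-word user name ends.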

Often one user has revisions with userid 0 and revisions with the proper userid in the same stub dump. In some wikis hundreds of users have revisions with userid 0.

E.g. check the historic stub dump for mediawikiwiki:
http://dumps.wikimedia.org/mediawikiwiki/20140605/mediawikiwiki-20140605-stub-meta-history.xml.gz

The first field after the user name is the userid; for Erik Zachte it is sometimes 20226, sometimes 0. I checked the actual dump content: the zeros are in the dumps themselves.

Erik Zachte 0 2004-01 wx mediawiki 100 3
Erik Zachte 0 2004-03 wx mediawiki 0 5
Erik Zachte 20226 2004-03 wx mediawiki 102 10
Erik Zachte 20226 2004-04 wx mediawiki 0 2
Erik Zachte 20226 2004-04 wx mediawiki 102 1
Erik Zachte 0 2004-05 wx mediawiki 0 3
Erik Zachte 20226 2004-05 wx mediawiki 102 1
Erik Zachte 20226 2004-06 wx mediawiki 102 25
Erik Zachte 20226 2004-06 wx mediawiki 103 7
Erik Zachte 0 2004-07 wx mediawiki 0 1
Erik Zachte 20226 2004-07 wx mediawiki 102 10
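The check against the dump content can be sketched as follows. This assumes the usual MediaWiki export layout, where each `<revision>` holds a `<contributor>` with `<username>` and `<id>` for registered users and `<ip>` for anonymous edits. The function names and the inline sample XML are made up for illustration; they are not taken from the actual dump.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip the XML namespace: '{http://...}revision' -> 'revision'."""
    return tag.rsplit("}", 1)[-1]


def count_userids(stream):
    """Tally revisions per (username, userid) for named contributors."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "revision":
            continue
        for child in elem:
            if local(child.tag) != "contributor":
                continue
            fields = {local(n.tag): (n.text or "") for n in child}
            if "username" in fields:  # anonymous edits carry <ip> instead
                # A missing <id> is treated as 0 in this sketch.
                counts[(fields["username"], int(fields.get("id") or 0))] += 1
        elem.clear()  # release the revision's children once counted
    return counts


def mixed_users(counts):
    """User names seen both with userid 0 and with a non-zero userid."""
    ids_by_name = {}
    for name, uid in counts:
        ids_by_name.setdefault(name, set()).add(uid)
    return sorted(n for n, ids in ids_by_name.items() if 0 in ids and len(ids) > 1)


def count_userids_in_dump(path):
    """Same tally for a gzipped stub dump file on disk."""
    with gzip.open(path, "rb") as stream:
        return count_userids(stream)


# Made-up sample in the assumed export layout.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Sample</title>
    <revision>
      <id>1</id>
      <contributor><username>Erik Zachte</username><id>0</id></contributor>
    </revision>
    <revision>
      <id>2</id>
      <contributor><username>Erik Zachte</username><id>20226</id></contributor>
    </revision>
    <revision>
      <id>3</id>
      <contributor><ip>127.0.0.1</ip></contributor>
    </revision>
  </page>
</mediawiki>"""

counts = count_userids(io.BytesIO(SAMPLE))
print(mixed_users(counts))
```

For the real file, `count_userids_in_dump("mediawikiwiki-20140605-stub-meta-history.xml.gz")` would run the same tally after downloading the dump from the URL above.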

See also https://trello.com/c/3ecjp9aM/237-master-monthly-editor-activity-data


Version: unspecified
Severity: normal

Details

Reference
bz66676

Event Timeline

bzimport raised the priority of this task from to Needs Triage. Nov 22 2014, 3:15 AM
bzimport set Reference to bz66676.

This can happen for edits imported by Special:Import before the user created an account on the wiki, although 1% seems a much higher rate than I would expect for that situation.

We dump whatever is in the database; the revision table would need to be updated to reflect the correct user id. Example:

mysql:root@localhost [mediawikiwiki]> select * from revision where rev_id = 52656;
+--------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
| rev_id | rev_page | rev_text_id | rev_comment | rev_user | rev_user_text | rev_timestamp  | rev_minor_edit | rev_deleted | rev_len | rev_parent_id | rev_sha1                        | rev_content_model | rev_content_format |
+--------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+
|  52656 |     5725 |       51761 | =Timelines= |        0 | Erik Zachte   | 20040516014336 |              0 |           0 |    8450 |         52655 | teebwxnn4bbatoritq55vv8zj4glqpe | NULL              | NULL               |
+--------+----------+-------------+-------------+----------+---------------+----------------+----------------+-------------+---------+---------------+---------------------------------+-------------------+--------------------+

So we (mediawiki dev + dba + me?) should figure out a plan for fixing these up. Adding Springle first off...

ArielGlenn set Security to None. Feb 16 2015, 12:07 PM
ArielGlenn added a subscriber: Springle.
Aklapper triaged this task as Low priority. Mar 23 2015, 5:42 PM
Aklapper added a subscriber: Aklapper.

So we (mediawiki dev + dba + me?) should figure out a plan for fixing these up. Adding @Springle first off...

How to start figuring out that plan? :)

Krenair closed this task as Invalid. Aug 17 2015, 5:09 AM
Krenair added a subscriber: Krenair.
MariaDB [mediawikiwiki_p]> select count(*) from revision, user where rev_user_text = user_name and rev_user = 0;
+----------+
| count(*) |
+----------+
|     4601 |
+----------+
1 row in set (2.76 sec)

MariaDB [mediawikiwiki_p]> select count(*) from revision;
+----------+
| count(*) |
+----------+
|  1606297 |
+----------+
1 row in set (0.52 sec)

Honestly these numbers seem reasonable to me. The only place I found rev_user=0 and rev_user_text="Erik Zachte" was from imported revisions, which is expected behaviour.
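For illustration, the join used in the first query can be tried on toy data. This is only a sketch with SQLite standing in for MariaDB: the two tables are cut down to the columns the query touches, and every row is made up. It shows that the query counts only userid-0 revisions whose user name matches an existing account, and it works out the share implied by the two counts above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_user INTEGER,
        rev_user_text TEXT
    );
    INSERT INTO user VALUES (20226, 'Erik Zachte');
    INSERT INTO revision VALUES
        (1, 0,     'Erik Zachte'),  -- imported: name matches an account, id is 0
        (2, 20226, 'Erik Zachte'),  -- ordinary edit by the registered account
        (3, 0,     '127.0.0.1'),    -- anonymous edit, no matching account
        (4, 20226, 'Erik Zachte');
""")

# Same join as the first query in this comment.
matched = conn.execute(
    "SELECT COUNT(*) FROM revision, user "
    "WHERE rev_user_text = user_name AND rev_user = 0"
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM revision").fetchone()[0]
print(matched, total)

# Share implied by the two counts reported for mediawikiwiki.
share = 100.0 * 4601 / 1606297
print(f"{share:.2f}%")
```

On the reported counts this works out to roughly 0.29% of mediawikiwiki revisions.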