User edit count (user.user_editcount field) is often wrong
Open, Public

Description

Example user: [[User:Joao]]. On the Toolserver's copy of enwiki_p:

mysql> SELECT user.user_editcount FROM user WHERE user_name="Joao"\G
*************************** 1. row ***************************
user_editcount: 266
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM revision WHERE rev_user_text = "Joao" GROUP BY rev_user_text\G
*************************** 1. row ***************************
COUNT(*): 265
1 row in set (0.03 sec)

mysql> SELECT COUNT(*) FROM archive WHERE ar_user_text = "Joao" GROUP BY ar_user_text\G
*************************** 1. row ***************************
COUNT(*): 35
1 row in set (0.01 sec)

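This isn't an anomaly. Many users, especially users with higher edit counts, have inaccurate values stored. The stored value matches neither the number of live contributions nor the live and deleted contributions combined (in the example above: 266 stored, 265 live, 35 deleted).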

Part of the problem seems to stem from the fact that the initEditCount.php maintenance script doesn't account for deleted contributions.

We're currently advertising an edit count (in Special:Preferences and elsewhere) that isn't accurate.


Version: unspecified
Severity: normal

bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz19311.
MZMcBride created this task. Via Legacy, Jun 20 2009, 5:00 PM

APPER added a comment. Via Conduit, Aug 22 2009, 12:45 AM

I think there are at least two problems that produce the difference between the "normal" edit counters on the Toolserver and the user_editcount field. The first is deleted edits. As the reporter of this bug writes, initEditCount.php doesn't account for deleted contributions. This is the correct behavior, since none of the edit counters count them. But after user_editcount is initialized, only incEditCount() in User.php seems to be called, which increases user_editcount. When a page is deleted, user_editcount is not decreased. So user_editcount is the number of all edits a user has made (deleted and not deleted) minus the edits that had already been deleted when initEditCount.php was run.
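To make the drift described above concrete, here is a minimal toy model in Python. It is only a sketch of the behavior as described in this comment, not MediaWiki code: the class `UserCounts` and its method names are hypothetical, and the starting numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserCounts:
    """Toy model of one user's edit bookkeeping (hypothetical, not MediaWiki code)."""
    live: int = 0      # rows that would be in the revision table
    deleted: int = 0   # rows that would be in the archive table
    stored: int = 0    # stand-in for user.user_editcount

    def init_edit_count(self) -> None:
        # As described for initEditCount.php: count live edits only.
        self.stored = self.live

    def edit(self, n: int = 1) -> None:
        # As described for incEditCount(): each new edit bumps the stored counter.
        self.live += n
        self.stored += n

    def delete_edits(self, n: int) -> None:
        # Deleting a page moves its revisions to the archive,
        # but the stored counter is never decreased.
        if n > self.live:
            raise ValueError("cannot delete more edits than are live")
        self.live -= n
        self.deleted += n


deleted_at_init = 20
u = UserCounts(live=100, deleted=deleted_at_init)
u.init_edit_count()   # stored = 100
u.edit(10)            # live = 110, stored = 110
u.delete_edits(5)     # live = 105, deleted = 25, stored still 110
print(u.live, u.deleted, u.stored)
```

In this model the stored value (110) equals all edits ever made (130) minus the edits already deleted at initialization (20), so it matches neither the live count (105) nor the live and deleted counts combined.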

The second is an older bug that left deleted revisions in the revision table when they should be in the archive table. For that reason, all edit counters check whether the rev_page ID exists in the page table (this example is from de.wikipedia):

SELECT count(*) FROM revision WHERE rev_user=10276;
-> 39702
SELECT count(*) FROM revision, page WHERE rev_user=10276 AND rev_page=page_id;
-> 39688

The 14 edits are from 2005/2006.
SELECT * FROM revision WHERE rev_user=10276 AND rev_page NOT IN(SELECT page_id FROM page);

I don't know whether this bug still exists, but it doesn't seem so, because the last affected revision for me is from March 2006. These were newly created redirects (mostly from page moves) that were deleted later, but the revision recording the move wasn't moved to the archive table. Since I think the bug has been fixed, a maintenance script might be useful that moves all revisions whose rev_page ID is not in the page table to the archive table.
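A rough sketch of what such a cleanup could look like, using Python's sqlite3 module with an in-memory database. Everything here is an assumption for illustration: the tables are cut down to three columns each (the real MediaWiki revision and archive tables have many more, and a real script would have to fill those in), the sample rows are invented, and `move_orphans` is a hypothetical helper, not an existing maintenance script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Heavily simplified stand-ins for the MediaWiki tables (assumption).
    CREATE TABLE page (page_id INTEGER PRIMARY KEY);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_user INTEGER);
    CREATE TABLE archive (ar_rev_id INTEGER PRIMARY KEY, ar_page_id INTEGER, ar_user INTEGER);

    INSERT INTO page VALUES (1), (2);
    -- Revision 12 points at page 99, which no longer exists: an orphan.
    INSERT INTO revision VALUES (10, 1, 10276), (11, 2, 10276), (12, 99, 10276);
""")

# Same condition as the query in this comment: rev_page has no row in page.
ORPHAN_CONDITION = "rev_page NOT IN (SELECT page_id FROM page)"


def move_orphans(conn: sqlite3.Connection) -> int:
    """Copy orphaned revisions into archive, then delete them from revision."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "INSERT INTO archive (ar_rev_id, ar_page_id, ar_user) "
            f"SELECT rev_id, rev_page, rev_user FROM revision WHERE {ORPHAN_CONDITION}"
        )
        cur = conn.execute(f"DELETE FROM revision WHERE {ORPHAN_CONDITION}")
        return cur.rowcount  # number of rows removed from revision


moved = move_orphans(conn)
live = conn.execute("SELECT COUNT(*) FROM revision").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM archive").fetchone()[0]
print(moved, live, archived)
```

Running both statements in one transaction means a failure between the copy and the delete cannot leave a revision in both tables or in neither.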

bzimport added a comment. Via Conduit, Apr 5 2010, 1:36 AM

soxred93 wrote:

It may be quite possible to...

a) create a maintenance script that replaces every user_editcount field with the result of SELECT COUNT(*) AS count FROM revision WHERE rev_user_text = 'Example';
b) set the function in the User class which gets the edit count to just do that SQL query.

However, for users with a large number of edits, this is very slow. This may be out of our reach. Might this be possible?

MZMcBride added a comment. Via Conduit, Apr 5 2010, 1:42 AM

(In reply to comment #2)

> a) create a maintenance script that replaces every user_editcount field with
> the result of SELECT COUNT(*) AS count FROM revision WHERE rev_user_text =
> 'Example';

This is essentially what initEditCount.php does: http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/initEditCount.php?view=markup

> b) set the function in the User class which gets the edit count to just do that
> SQL query.

Way too expensive. Even with the index on rev_user_text, you're talking about millions of rows with some of these users. The value must be stored so that it can be easily retrieved for things like creating the 'edit' links or not (autoconfirm checks this field). There might be other creative ways of updating it, though, like every time a user logs in.
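As a sketch of that last idea, the following Python/sqlite3 example keeps the cheap stored read for routine checks and only runs the expensive COUNT(*) in a separate refresh step, which a login hook could call. The schema is a simplified assumption, the sample data is invented, and `get_edit_count` and `refresh_edit_count` are hypothetical names rather than MediaWiki functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for the MediaWiki tables (assumption).
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT, user_editcount INTEGER);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER);

    -- The stored count (5) has drifted from the 3 live revisions.
    INSERT INTO user VALUES (1, 'Example', 5);
    INSERT INTO revision (rev_user) VALUES (1), (1), (1);
""")


def get_edit_count(conn: sqlite3.Connection, user_id: int) -> int:
    """Cheap read of the stored value, suitable for per-request checks."""
    row = conn.execute(
        "SELECT user_editcount FROM user WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]


def refresh_edit_count(conn: sqlite3.Connection, user_id: int) -> int:
    """Expensive recount, meant to run rarely (for example once per login)."""
    with conn:
        conn.execute(
            "UPDATE user SET user_editcount = "
            "(SELECT COUNT(*) FROM revision WHERE rev_user = ?) "
            "WHERE user_id = ?",
            (user_id, user_id),
        )
    return get_edit_count(conn, user_id)


before = get_edit_count(conn, 1)
after = refresh_edit_count(conn, 1)
print(before, after)
```

The trade-off is the one raised above: the stored value can still be stale between refreshes, but the costly query never runs on the hot path.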

liangent added a comment. Via Conduit, Jan 11 2013, 5:27 PM

RESOLVED INVALID?

[[mw:Manual:User table]]:

user_editcount

Count of edits and edit-like actions.
*NOT* intended to be an accurate copy of COUNT(*) WHERE rev_user=user_id. May contain NULL for old accounts if batch-update scripts haven't been run, as well as listing deleted edits and other myriad ways it could be out of sync. Execute the script initEditCount.php to update this table column.
Meant primarily for heuristic checks to give an impression of whether the account has been used much.

Bawolff added a comment. Via Conduit, Jan 11 2013, 6:01 PM

(In reply to comment #4)

I don't think this is invalid. Just because it's not perfect now doesn't mean we can't do better.

But first of all, perhaps we should add "approximately" to the edit counter in preferences.
