Early edits on en.wiki with rev_user = 0 and truncated rev_user_text
Open, Low, Public

Description

Hi again Lego - found another one.

https://en.wikipedia.org/w/index.php?title=Louisville,_Kentucky&offset=20020723160803&limit=1&action=history

https://en.wikipedia.org/wiki/Special:Contributions/10.26

If there's a particular project I should file stuff like this under, please let me know.

Event Timeline

Scott created this task. Jul 25 2015, 12:26 PM
Scott assigned this task to Legoktm.
Scott raised the priority of this task from to Needs Triage.
Scott updated the task description.
Scott added a subscriber: Scott.
Restricted Application added a subscriber: Aklapper. Jul 25 2015, 12:26 PM
Restricted Application added a subscriber: Steinsplitter. Jul 25 2015, 12:58 PM
mysql:wikiadmin@db1052 [enwiki]> select user_id, user_registration from user where user_name="10.26";
+----------+-------------------+
| user_id  | user_registration |
+----------+-------------------+
| 11455788 | 20100117012555    |
+----------+-------------------+
1 row in set (0.00 sec)
mysql:wikiadmin@db1052 [enwiki]> select rev_user, rev_user_text, rev_id, rev_timestamp from revision where rev_user_text="10.26";
+----------+---------------+--------+----------------+
| rev_user | rev_user_text | rev_id | rev_timestamp  |
+----------+---------------+--------+----------------+
|        0 | 10.26         |  59652 | 20020321073731 |
|        0 | 10.26         |  58376 | 20020321094317 |
|        0 | 10.26         | 121369 | 20020330065544 |
|        0 | 10.26         | 121315 | 20020406065443 |
|        0 | 10.26         |  56148 | 20020406065607 |
|        0 | 10.26         | 121803 | 20020406070117 |
|        0 | 10.26         |  59850 | 20020419082628 |
|        0 | 10.26         |  59775 | 20020419090037 |
|        0 | 10.26         |  59792 | 20020521073256 |
|        0 | 10.26         |  56504 | 20020524095538 |
|        0 | 10.26         | 121387 | 20020603120535 |
|        0 | 10.26         |  59925 | 20020603121823 |
|        0 | 10.26         | 121689 | 20020603122758 |
|        0 | 10.26         | 121463 | 20020608081902 |
|        0 | 10.26         | 121401 | 20020608082533 |
|        0 | 10.26         | 121787 | 20020608082632 |
|        0 | 10.26         |  59251 | 20020608085154 |
|        0 | 10.26         |  56591 | 20020615080931 |
|        0 | 10.26         |  60558 | 20020621082140 |
|        0 | 10.26         |  58102 | 20020621082924 |
|        0 | 10.26         | 121524 | 20020622095249 |
|        0 | 10.26         | 122367 | 20020628083113 |
|        0 | 10.26         |  56863 | 20020708121822 |
|        0 | 10.26         |  57415 | 20020713082336 |
|        0 | 10.26         |  57201 | 20020717071415 |
|        0 | 10.26         |  56146 | 20020720073542 |
|        0 | 10.26         |  56475 | 20020720081055 |
|        0 | 10.26         | 122437 | 20020720081539 |
|        0 | 10.26         |  60112 | 20020723071304 |
|        0 | 10.26         |  59782 | 20020723071604 |
|        0 | 10.26         |  59823 | 20020723073321 |
|        0 | 10.26         |  59820 | 20020723073656 |
|        0 | 10.26         |  59871 | 20020723075316 |
|        0 | 10.26         |  60091 | 20020723081144 |
+----------+---------------+--------+----------------+
34 rows in set (0.00 sec)

Are we sure the person who owns the "10.26" account is the same person who made the edits?

If there's a particular project I should file stuff like this under, please let me know.

Wikimedia-General-or-Unknown if it's not SULF specific.

Legoktm set Security to None.
Aklapper triaged this task as Low priority. Jul 31 2015, 8:36 AM
Scott added a comment. Jul 31 2015, 8:56 AM

Are we sure the person who owns the "10.26" account is the same person who made the edits?

From experience of other database messes of that vintage, they're probably not. I'm just heading out for a weekend vacation, so will examine them individually when I get back.

Scott added a comment. Edited Aug 23 2015, 2:19 AM

I finally got around to investigating this. "10.26" is mangled data, caused by the early MediaWiki bug that @Graham87 describes here, rather than an actual user (although an account by that name was created in 2010 for some reason). These edits should all actually be attached to User:64.26.98.90, which is the IP address that BRG edited from before registering an account.

Scott added a subscriber: Graham87. Aug 23 2015, 2:23 AM

Scott, thanks for adding me in. Yes, I created the 10.26 account to avoid impersonation, before T36873 was a thing.

Scott renamed this task from "Early edit(s?) on enwp detached from user" to "Early edits on enwp attached to incorrect user". Aug 23 2015, 2:34 AM
Scott added a comment. Aug 23 2015, 2:39 AM

Right, that makes sense.

You've encountered BRG as 64.26.98.90 before, by the way; he was the one who moved content from "Talk radio/Louisville" to "Louisville, Kentucky". I happened across these stray edits by reading your list of database anomalies.

Restricted Application added a subscriber: Matanya. Sep 12 2015, 10:39 AM
Nemo_bis renamed this task from "Early edits on enwp attached to incorrect user" to "Early edits on en.wiki with rev_user = 0 and incorrect rev_user_text". Sep 12 2015, 10:44 AM

I changed the summary because it was not clear what "attached" means. Those revisions are not attributed to any user account, as rev_user = 0.

Nemo_bis renamed this task from "Early edits on en.wiki with rev_user = 0 and incorrect rev_user_text" to "Early edits on en.wiki with rev_user = 0 and truncated rev_user_text". Sep 12 2015, 10:48 AM
Scott added a comment. Sep 12 2015, 1:26 PM

I changed the summary because it was not clear what "attached" means. Those revisions are not attributed to any user account, as rev_user = 0.

Well, my bug reports are made from the perspective of an end user. An entry in action=history displays a user name and links to Special:Contributions for it, implying that the edit shown is attached to (belongs to, was created by) it.

Reading Lego's query output again and your comment, I understand more now - I wasn't aware of the table structure and what I was looking at. I've now found it, and I'll be honest with you, I'm astonished that it's insufficiently normalized. But that's well out of the scope of this bug....

Regarding the truncated rev_user_text, if I understand @Graham87's report of the July 2002 import bug correctly, then that information is not recoverable. I'd love to hear differently, of course.

Actually, since you've made those part of the topic of this bug (good idea), could you run a query to identify how many of them there are? I guess it would be something like select rev_user, rev_user_text, rev_id, rev_timestamp from revision where rev_user_text regexp '^[0-9]+[.][0-9]+$' and rev_timestamp < 20020727000000;
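
For anyone who wants to try the same filter outside the database, here is a rough Python sketch of what that WHERE clause is meant to select. It assumes rev_timestamp is a 14-digit YYYYMMDDHHMMSS string, as in the query output above. The names `is_candidate` and `parse_ts` are made up for illustration, and the second and third sample rows are hypothetical rather than taken from the database.

```python
import re
from datetime import datetime

# Two dot-separated runs of ASCII digits, e.g. "10.26".
TWO_PART = re.compile(r"[0-9]+\.[0-9]+")

# Cutoff taken from the suggested query.
CUTOFF = "20020727000000"


def parse_ts(ts: str) -> datetime:
    """Parse a 14-digit timestamp such as '20020321073731'."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def is_candidate(rev_user_text: str, rev_timestamp: str) -> bool:
    """True if a row has the two-part numeric shape and predates the cutoff."""
    has_shape = TWO_PART.fullmatch(rev_user_text) is not None
    return has_shape and parse_ts(rev_timestamp) < parse_ts(CUTOFF)


rows = [
    ("10.26", "20020321073731"),        # first row of the query output above
    ("64.26.98.90", "20020723160803"),  # hypothetical row: full IPv4 address
    ("10.26", "20100117012555"),        # hypothetical row: after the cutoff
]
candidates = [row for row in rows if is_candidate(*row)]
```

Only the first sample row passes both conditions. A match on shape alone does not prove truncation, so anything this finds would still need checking by hand.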

At some point I must arrange myself a clone of the database to be able to do my own queries.

Actually, since you've made those part of the topic of this bug (good idea), could you run a query to identify how many of them there are?

Were they all truncated to 2+2 digits? Hostnames were also valid back in the day, and some of them are quite short.

At some point I must arrange myself a clone of the database to be able to do my own queries.

Yes! https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Quick_start is rather easy.

Scott added a subscriber: hoo. Sep 12 2015, 3:12 PM

Actually, since you've made those part of the topic of this bug (good idea), could you run a query to identify how many of them there are?

Were they all truncated to 2+2 digits? Hostnames were also valid back in the day, and some of them are quite short.

First two IP quads, which is why my query suggestion includes that regular expression.

This comment by @hoo suggests that a search for "bomis" will net you some host names, e.g. cobrand.bomis.com. Some old related discussion is at T29992. I think @Graham87 is your man for asking about the extent of this stuff.

At some point I must arrange myself a clone of the database to be able to do my own queries.

Yes! https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Quick_start is rather easy.

Thanks! I'll be sure to look into it when I have sufficient mental time.

Yes, the truncated IP address contained two octets.

The hostnames were from 2001, not 2002, and always started with a lower-case letter. There's a list of edits on the English Wikipedia containing such usernames at:
https://en.wikipedia.org/wiki/User:Nemo_bis/Bug_323_revisions/Anonymous
And there's a list of those that have been deleted at:
https://en.wikipedia.org/wiki/User:Nemo_bis/Bug_323_revisions/Deleted#Anonymous
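
The shapes mentioned in this thread (two-octet truncated IPs, full IP addresses such as 64.26.98.90, and hostnames starting with a lower-case letter) could be told apart with something like the Python sketch below. It is a heuristic only: the function name `classify` and the category labels are invented here, and requiring a dot in a hostname is an assumption that is not stated anywhere in the thread.

```python
import ipaddress
import re

# Two numeric octets separated by a dot, e.g. "10.26".
TWO_OCTETS = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}")


def classify(rev_user_text: str) -> str:
    """Roughly sort a rev_user_text value into one of four labels."""
    if TWO_OCTETS.fullmatch(rev_user_text):
        return "truncated-ip"
    try:
        ipaddress.IPv4Address(rev_user_text)
        return "full-ip"
    except ipaddress.AddressValueError:
        pass
    # Assumption: a hostname starts with a lower-case ASCII letter and contains a dot.
    if re.match(r"[a-z]", rev_user_text) and "." in rev_user_text:
        return "hostname-like"
    return "other"


labels = {name: classify(name)
          for name in ["10.26", "64.26.98.90", "cobrand.bomis.com", "Graham87"]}
```

Here "10.26" comes out as "truncated-ip", "64.26.98.90" as "full-ip", "cobrand.bomis.com" as "hostname-like", and "Graham87" as "other".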

Scott added a comment. Sep 12 2015, 4:50 PM

Thanks Graham. Okay, so from the name of that list (@Nemo_bis's own, I see) it's bug T2323 for those.

@Legoktm What's the next action needed for this?