Page MenuHomePhabricator

Inconsistent values of rev_user_text on pl.wikipedia due to incomplete renames (apparently)
Closed, ResolvedPublic

Description

Author: beau

Description:
There are some revisions with the same value of rev_user, but with different values of rev_user_text. Most of them are probably remains of incompleted renames, but there are a few cases with empty rev_user_text.

User id '24008'
Current name: 'Neutron'
rev_user_text: 'Neutron', 'Pkdragon'

User id '770'
Current name: 'Reytan'
rev_user_text: '', 'Reytan'

User id '32208'
Current name: 'AndrzejCC'
rev_user_text: 'AndrzejCC', 'Coca-Cola'

User id '22730'
Current name: 'Laforgue'
rev_user_text: 'Laforgue', 'Mch,ifuw'

User id '5466'
Current name: 'LukKot'
rev_user_text: 'Kotasik', 'LukKot'

User id '3228'
Current name: 'Stefan19'
rev_user_text: '', 'Stefan19'

User id '4630'
Current name: 'JaBoJa'
rev_user_text: 'JaBoJa', 'Tkt'

User id '18379'
Current name: 'Andrzej19'
rev_user_text: 'Andrzej18', 'Andrzej19'

User id '11812'
Current name: 'Sinus Pi'
rev_user_text: 'Sin(PI)', 'Sinus Pi'

User id '21248'
Current name: 'OFFset32'
rev_user_text: 'Kupseklej', 'OFFset32'

User id '3'
Current name: 'Matusz'
rev_user_text: '', 'Matusz'

User id '23461'
Current name: 'Remigiu'
rev_user_text: 'Michał P.', 'Remigiu'

User id '8496'
Current name: 'Bocianski'
rev_user_text: 'Amstel22', 'Bocianski'

User id '8918'
Current name: 'Sunridin'
rev_user_text: 'BrokenglaSS', 'Sunridin'

User id '6747'
Current name: 'Makarczuk'
rev_user_text: 'Makar', 'Makarczuk'

User id '4'
Current name: 'Kpjas'
rev_user_text: '', 'Kpjas'

User id '52501'
Current name: 'Lajsikonik'
rev_user_text: 'Lajsikonik', 'Skyforger'


Version: unspecified
Severity: normal

Details

Reference
bz22907

Event Timeline

bzimport raised the priority of this task from to Low.Nov 21 2014, 10:58 PM
bzimport set Reference to bz22907.
bzimport added a subscriber: Unknown Object (MLST).

I have poke at three of these (for which I could easily find log entries documenting the rename):

User id '52501'
Current name: 'Lajsikonik'
rev_user_text: 'Lajsikonik', 'Skyforger'

User id '5466'
Current name: 'LukKot'
rev_user_text: 'Kotasik', 'LukKot'

User id '8918'
Current name: 'Sunridin'
rev_user_text: 'BrokenglaSS', 'Sunridin'

Please check them over. I will need to look more closely at the other entries to see if there is anything that can be done.

They could be there due to restoring a deleted revision after the rename.

Looked at contribs and logs by hand. After that, I was willing to believe that these users really were the same and so I did the following re-attributions:

Pkdragon -> Neutron
Sin(PI) -> Sinus Pi
Amstel22 -> Bocianski
Makar -> Makarczuk
Andrzej18 -> Andrzej19
Michał P. -> Remigiu

This leaves revs for Reytan, Stefan19, Matusz, Kpjas (group 1) and
AndrzejCC, Laforgue, JaBoJa, OFFset32 (group 2).

A lot of the provided examples no longer appear to be broken, however:

MariaDB [plwiki_p]> select rev_id, rev_user, rev_user_text from revision, user where rev_user = user_id and rev_user_text != user_name;
+---------+----------+---------------+
| rev_id  | rev_user | rev_user_text |
+---------+----------+---------------+
|  209082 |        4 |               |
|  208926 |      770 |               |
|  208883 |     3228 |               |
|  208269 |        3 |               |
| 4338905 |    32208 | Coca-Cola     |
| 1800401 |    21248 | Kupseklej     |
| 1800432 |    21248 | Kupseklej     |
| 1804731 |    21248 | Kupseklej     |
| 1804735 |    21248 | Kupseklej     |
| 1937652 |    22730 | Mch,ifuw      |
| 2003509 |     4630 | Tkt           |
| 1240401 |     6468 | Cubus1984     |
| 3179052 |     4630 | Tkt           |
| 3179056 |     4630 | Tkt           |
| 1947117 |    22730 | Mch,ifuw      |
| 1240383 |     6468 | Cubus1984     |
| 1826373 |    21864 | Misiekf       |
| 1882346 |    22495 | Garro         |
| 6787261 |    76565 | Jasper322     |
| 6787327 |    76565 | Jasper322     |
| 6574957 |    76565 | Jasper322     |
| 6860544 |    76565 | Jasper322     |
+---------+----------+---------------+
22 rows in set (6 min 9.80 sec)

MariaDB [plwiki_p]> select rev_timestamp from revision where rev_id in (209082, 6860544);
+----------------+
| rev_timestamp  |
+----------------+
| 20040530184413 |
| 20070307155933 |
+----------------+
2 rows in set (0.00 sec)
Dereckson subscribed.

Would it be valuable to provide a fixT24907.php script to normalize the rev_user_text field for these users?

I re-ran the query from T24907#1462955 and those erroneous usernames are still in the database (as well as many more recent ones), but presumably that doesn't matter any more since the rev_user_text field is about to be dropped (T215466), and data in the actor table is consistent so far – this gives 0 results:

select rev_id, user_name, rev_timestamp
from revision, revision_actor_temp, actor, user 
where rev_id=revactor_rev
and revactor_actor=actor_id
and actor_user=user_id
and user_name!=actor_name;

I re-ran the query from T24907#1462955 and those erroneous usernames are still in the database (as well as many more recent ones), but presumably that doesn't matter any more since the rev_user_text field is about to be dropped (T215466), and data in the actor table is consistent so far – this gives 0 results:

select rev_id, user_name, rev_timestamp
from revision, revision_actor_temp, actor, user 
where rev_id=revactor_rev
and revactor_actor=actor_id
and actor_user=user_id
and user_name!=actor_name;

Agree that this may be moot, now that rev_actor is used to refer to an actor, instead of having rev_user and rev_user_text. Suggest closing an declined or invalid

@matmarex So, you think we can just decline this one?

Proposing to boldly close this task (see previous comments, plus T188327: Deploy refactored actor storage took place in the meantime). If this still happens in the current codebase of MediaWiki, then anyone is free to reopen (or file a fresh ticket).