Page MenuHomePhabricator

maintenance/cleanupUsersWithNoId.php doesn't do anything
Closed, DeclinedPublic

Description

php maintenance/cleanupUsersWithNoId.php --prefix wikipedia --force 
Beginning cleanup of revision
... rev_timestamp=20080902222628 rev_id=727520
... rev_timestamp=20090413231036 rev_id=1180889
... rev_timestamp=20090908043427 rev_id=1478509
... rev_timestamp=20100319145605 rev_id=1954897
... rev_timestamp=20100921200735 rev_id=2336222
... rev_timestamp=20111027111025 rev_id=3374285
... rev_timestamp=20120225201116 rev_id=3714190
... rev_timestamp=20190529151451 rev_id=8818524
Completed cleanup, assigned 0 and prefixed 0 row(s)

Yet when I run update.php it complains:

Beginning migration of revision.rev_user and revision.rev_user_text to revision_actor_temp.revactor_actor
User name "Andyisverydumb" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Nemo quinquies" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Majort" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

... rev_id=8826924
Completed migration, updated 3 row(s) with 0 new actor(s), 3 error(s)

This is for translatewiki.net.

Event Timeline

Some more debugging data:

MariaDB [translatewiki_net]> select * from bw_revision where rev_user_text in ('Majort', 'Nemo quinquies')\G
*************************** 1. row ***************************
            rev_id: 5219910
          rev_page: 3635611
       rev_text_id: 5080522
       rev_comment: 
          rev_user: 14522
     rev_user_text: Majort
     rev_timestamp: 20131222182907
    rev_minor_edit: 0
       rev_deleted: 0
           rev_len: 60
     rev_parent_id: 0
          rev_sha1: 9j4xmottnehp5plaldqite0gy5t482z
rev_content_format: NULL
 rev_content_model: NULL
*************************** 2. row ***************************
            rev_id: 5212730
          rev_page: 3630136
       rev_text_id: 5073342
       rev_comment: 
          rev_user: 14506
     rev_user_text: Nemo quinquies
     rev_timestamp: 20131220173028
    rev_minor_edit: 0
       rev_deleted: 0
           rev_len: 23
     rev_parent_id: 0
          rev_sha1: 6pu3ft0x8vmh9b35vox8uyxwkxoxpqd
rev_content_format: NULL
 rev_content_model: NULL
2 rows in set (0.12 sec)
MariaDB [translatewiki_net]> select page_namespace, page_title from bw_page where page_id in (3635611, 3630136);
+----------------+----------------+
| page_namespace | page_title     |
+----------------+----------------+
|              2 | Nemo_quinquies |
|              2 | Majort         |
+----------------+----------------+
2 rows in set (0.00 sec)

E.g. https://translatewiki.net/w/i.php?title=User:Nemo_quinquies&action=history

MariaDB [translatewiki_net]> select user_name, user_id from bw_user where user_id in (14522, 14506);
Empty set (0.00 sec)

I am not sure why the users no longer exist, may be a bug in our custom signup code back then (2013) or basically anything.

cleanupUsersWithNoId.php doesn't handle rows with non-zero values for rev_user that don't correspond to a row in user.

The quickest fix would be to just UPDATE revision SET rev_user = 0 WHERE rev_user IN (14522, 14506); manually so cleanupUsersWithNoId.php would then find them.

More complicated would be to rework cleanupUsersWithNoId.php to check for use of non-zero IDs not existing in the user table. You'd likely have to search for the non-zero potential user IDs in batches. Maybe something vaguely like

$userStart = 0;
while ( true ) {
    $validUsers = $db->selectFieldValues( 'user', 'user_id', "user_id >= $userStart", __METHOD__, [ 'LIMIT' => $batchSize, 'ORDER BY' => 'user_id' ] );
    if ( $validUsers === [] ) {
        break;
    }
    $lastUser = end( $validUsers );
    $usersToCheck = array_diff( range( $userStart, $lastUser ), $validUsers );
    foreach ( $usersToCheck as $userId ) {
        // Existing code, except replacing `'xx_user' => 0` with `'xx_user' => $userId`
    }
    $userStart = $lastUser + 1;
}

That might add significantly to the runtime though, just for the fact of traversing all of user to find the potential missing IDs. Also missing in that code snippet is the possibility of having xx_user somehow very large, above all current user_id values.

For me, migrateActors.php was failing with a duplicate for 'Redirect fixer' https://phabricator.wikimedia.org/T229092. After I manually updated the revision table like Anomie suggested, I re-ran the cleanupUsersWithNoId.php -- which didn't make any changes or corrections. Now when I run migrateActors.php, it fails with

Function: MigrateActors::addActorsForRows
Error: 1062 Duplicate entry '' for key 'actor_name'

I can't update revision with rev_user = 0 because rev_user is already zero for the 24 records in question. What can I try next?

After I manually updated the revision table like Anomie suggested, I re-ran the cleanupUsersWithNoId.php -- which didn't make any changes or corrections.

Try to run it with the --force parameter, because it has been designed to run only once, and when it runs, it logs itself as being run and does nothing on next run unless using the --force parameter.

Example: cleanupUsersWithNoId.php --force

I'm trying to upgrade from 1.32 to 1.34 Actor Migration is still failing for me. Using --force with cleanupUsersWithNoId.php did not change the results of my upgrade attempt. Here is what update.php is reporting... lots of errors:

Completed actor creation, added 0 new actor(s)
Beginning migration of revision.rev_user and revision.rev_user_text to revision_actor_temp.revactor_actor
User name "Crowtherdr" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "SugdenHG" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Gistauz" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Balzac0001" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "BoisseauC" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Poussardbe" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Zazoult" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

... rev_id=151
User name "Reig" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Ccsmith" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Joneswd" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Didiermarcdarrort1" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

... rev_id=276
Completed migration, updated 0 row(s) with 0 new actor(s), 194 error(s)
Beginning migration of archive.ar_user and archive.ar_user_text to archive.ar_actor
User name "Poussardbe" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "SugdenHG" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Zazoult" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Balzac0001" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "BoisseauC" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Ccsmith" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Ellisgj" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

... ar_id=428
Completed migration, updated 0 row(s) with 0 new actor(s), 81 error(s)
Beginning migration of ipblocks.ipb_by and ipblocks.ipb_by_text to ipblocks.ipb_by_actor
Completed migration, updated 0 row(s) with 0 new actor(s), 0 error(s)
Beginning migration of image.img_user and image.img_user_text to image.img_actor
User name "Poussardbe" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

... img_name=Template-info.svg.png
Completed migration, updated 0 row(s) with 0 new actor(s), 3 error(s)
Beginning migration of oldimage.oi_user and oldimage.oi_user_text to oldimage.oi_actor
Completed migration, updated 0 row(s) with 0 new actor(s), 0 error(s)
Beginning migration of filearchive.fa_user and filearchive.fa_user_text to filearchive.fa_actor
Completed migration, updated 0 row(s) with 0 new actor(s), 0 error(s)
Beginning migration of recentchanges.rc_user and recentchanges.rc_user_text to recentchanges.rc_actor
Completed migration, updated 0 row(s) with 0 new actor(s), 0 error(s)
Beginning migration of logging.log_user and logging.log_user_text to logging.log_actor
Wikimedia\Rdbms\DBQueryError from line 1603 of /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: INSERT  INTO `actor` (actor_name) VALUES ('')
Function: MigrateActors::addActorsForRows
Error: 1062 Duplicate entry '' for key 'actor_name' (localhost)

I've tried running migrateActors.php directly, but it gives the same errors.

I really don't know what to do to troubleshoot this or fix it. Any suggestions?

It seems to be choking on the logging table. When I look at those records, there are many with empty log_user_text
mysql wiki_fr -e 'select distinct log_user from logging where log_user_text = "" order by log_user;'

select count(*) from logging where log_user_text="";
+----------+
| count(*) |
+----------+
|      879 |
+----------+
1 row in set (0.001 sec)

work around:

delete from logging where log_user_text="";

There were a few wikis where, on upgrade from MW 1.31 to MW 1.35, the maintenance/upgrade.php script failed with a mess of errors like

User name "RandmGrbage6661313" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

In most cases, these users turned out to be spambots, "Maintenance script" or other entries where there's a name but a user_id of 0 or no corresponding user in the user table. This tends to happen if an incomplete attempt is made to get rid of a spam "user" (such as by running maintenance/removeUnusedAccounts.php) with remnants left behind in the archive or log table; it also happens if a wiki contains imported revisions and the corresponding users don't exist locally, as well as for pseudo usernames for things (such as maintenance scripts) which aren't users and therefore lack a non-zero user_id or a user table entry.

The suggestion to run maintenance/cleanupUsersWithNoId.php seems to do nothing, or at least nothing useful. Run it, then try to do maintenance/migrateActors.php (or update.php - which calls migrateActors) and the process still fails exactly in the same manner - making an update.php run impossible because this fatal error shuts down the entire process.

php maintenance/populateContentTables.php --wiki xx
php maintenance/findMissingActors.php --wiki xx --field rc_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field revactor_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field ar_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field img_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field oi_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field fa_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field log_actor --overwrite-with "Unknown user"
php maintenance/cleanupUsersWithNoId.php --wiki xx --prefix whatever --force
php maintenance/migrateActors.php --wiki xx
php maintenance/update.php --wiki xx

might work, by throwing everything but the kitchen sink at this, but there's nothing in the messages displayed to the user to indicate that every possible findMissingActors.php variant might need to be thrown at an existing installation (possibly multiple times, as --field log_actor will stop after the first thousand entries) just to get migrateActors.php or update.php to complete without a mess of "User name "(whatever)" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation." messages which bring the entire upgrade process to a screeching halt.

In any case, I'm not entirely comfortable with forcing --overwrite-with "Unknown user" as it risks breaking attribution or creating new problems. So far, all of the problem entries look to be spambots or maintenance scripts (which would make sense, as they're the ones most likely to be missing a users table entry or be showing a name with user_id==0) but it'd be better just to auto-create the user (even if it's a dummy user table entry for some random bot or script) rather than risk doing something which could break attribution for imported wikitext which belongs to a real, human user somewhere.

There is a need for maintenance/update.php (or the scripts which it currently invokes) to handle this seamlessly, automatically and without destroying existing information. Clearly we're not there yet. If this issue stops update.php in its tracks, with the "explanations" given to the user on how to recover from this condition being meaningless or worthless, this is an obstacle to upgrading MW version on existing wikis. It won't be triggered on every wiki (it's just a minority which have log file, archive. revision et al. for non-user scripts and bots) but it does happen often enough that the update scripts need to handle it and recover gracefully.

I have the same issue, can it be solved manually by running a script that would delete the users and replace the mussing fields by something else ??

There were a few wikis where, on upgrade from MW 1.31 to MW 1.35, the maintenance/upgrade.php script failed with a mess of errors like

User name "RandmGrbage6661313" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

In most cases, these users turned out to be spambots, "Maintenance script" or other entries where there's a name but a user_id of 0 or no corresponding user in the user table. This tends to happen if an incomplete attempt is made to get rid of a spam "user" (such as by running maintenance/removeUnusedAccounts.php) with remnants left behind in the archive or log table; it also happens if a wiki contains imported revisions and the corresponding users don't exist locally, as well as for pseudo usernames for things (such as maintenance scripts) which aren't users and therefore lack a non-zero user_id or a user table entry.

The suggestion to run maintenance/cleanupUsersWithNoId.php seems to do nothing, or at least nothing useful. Run it, then try to do maintenance/migrateActors.php (or update.php - which calls migrateActors) and the process still fails exactly in the same manner - making an update.php run impossible because this fatal error shuts down the entire process.

php maintenance/populateContentTables.php --wiki xx
php maintenance/findMissingActors.php --wiki xx --field rc_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field revactor_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field ar_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field img_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field oi_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field fa_actor --overwrite-with "Unknown user"
php maintenance/findMissingActors.php --wiki xx --field log_actor --overwrite-with "Unknown user"
php maintenance/cleanupUsersWithNoId.php --wiki xx --prefix whatever --force
php maintenance/migrateActors.php --wiki xx
php maintenance/update.php --wiki xx

might work, by throwing everything but the kitchen sink at this, but there's nothing in the messages displayed to the user to indicate that every possible findMissingActors.php variant might need to be thrown at an existing installation (possibly multiple times, as --field log_actor will stop after the first thousand entries) just to get migrateActors.php or update.php to complete without a mess of "User name "(whatever)" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation." messages which bring the entire upgrade process to a screeching halt.

In any case, I'm not entirely comfortable with forcing --overwrite-with "Unknown user" as it risks breaking attribution or creating new problems. So far, all of the problem entries look to be spambots or maintenance scripts (which would make sense, as they're the ones most likely to be missing a users table entry or be showing a name with user_id==0) but it'd be better just to auto-create the user (even if it's a dummy user table entry for some random bot or script) rather than risk doing something which could break attribution for imported wikitext which belongs to a real, human user somewhere.

There is a need for maintenance/update.php (or the scripts which it currently invokes) to handle this seamlessly, automatically and without destroying existing information. Clearly we're not there yet. If this issue stops update.php in its tracks, with the "explanations" given to the user on how to recover from this condition being meaningless or worthless, this is an obstacle to upgrading MW version on existing wikis. It won't be triggered on every wiki (it's just a minority which have log file, archive. revision et al. for non-user scripts and bots) but it does happen often enough that the update scripts need to handle it and recover gracefully.

Thank you for your command lines, they helped me resurrect an old mediawiki installation that had been almost abandoned to SPAMbots.

I ended up solving it by executing those commands and emptying the "actor" table before trying to update it again :)

In any case, I'm not entirely comfortable with forcing --overwrite-with "Unknown user" as it risks breaking attribution or creating new problems. So far, all of the problem entries look to be spambots or maintenance scripts (which would make sense, as they're the ones most likely to be missing a users table entry or be showing a name with user_id==0) but it'd be better just to auto-create the user (even if it's a dummy user table entry for some random bot or script) rather than risk doing something which could break attribution for imported wikitext which belongs to a real, human user somewhere.

The ideal solution here would be to find all rows that have a "user_id" of 0 or one not existing in the user table, that are not an IP address (either IPv4 or IPv6), and prepend "unknown>" to it. This will flag all those users as "external" or "imported", and the migration script will create new actor rows for them.

I ended up solving it by executing those commands and emptying the "actor" table before trying to update it again :)

This will corrupt your database. All revisions that were assigned to one of the actor that was removed after you emptied the actor table will be misattributed to a different user now. If the actor_id they were assigned to is not even inserted on new runs of the maintenance scripts, those revisions will be either missing from the page history, or will cause errors. Pages where the latest edit was from one of those actors will fail to load at all.

matmarex assigned this task to tstarling.
matmarex subscribed.

Script has been deleted in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/971331 before we got around to fixing this problem with it.