Page MenuHomePhabricator

Investigate migrateActors script failing due to duplicate empty actor_name
Closed, ResolvedPublic

Description

In support desk there have been several reports of MediaWiki upgrades failing on MediaWiki 1.33 when running the migrateActors script. The error happens because there's apparently a duplicate actor_name with empty value:

A database query error has occurred. Did you forget to run your application's database schema updater after upgrading?

Query: INSERT  INTO `actor` (actor_name) VALUES ('')
Function: MigrateActors::addActorsForRows
Error: 1062 Duplicate entry '' for key 'actor_name'

Reports in Support Desk:

Event Timeline

Anomie subscribed.

I wonder whether this is a combination of rows with a bogus user ID (like T224854) and no xx_user_text (like T217972). In that situation MigrateActors would try to look up the actor ID by the bogus user ID and so wouldn't find the existing row with empty xx_user_text.

I note that the "solution" at https://www.mediawiki.org/w/index.php?title=Topic:V467iyg07vqsrdwy&topic_showPostId=v46a67yck18a06cn#flow-post-v46a67yck18a06cn most likely just introduced new database corruption, as that is likely to lead to rows referring to actor IDs that don't exist.

Potential real fixes would be to have maintenance/cleanupUsersWithNoId.php handle bogus non-zero user IDs (as in T224854#5234443) and have MigrateActors::addActorsForRows() refuse to process rows with non-zero user ID like it already does for rows with usable names (near line 201).

I just did the 1.32.x -> 1.34 migration and hit this. My wiki has been around for a decade but hasn't been hacked up too much. To get the update.php script to work I followed the "solution" in the support ticket linked above (delete the empty string actor_name row).

After seeing Anomie's comment I did a SELECT COUNT(*) FROM revision/archive/ipblocks/image/oldimage/filearchive/recentchanges/logging WHERE foo_actor = xxx where xxx was the actor row I manually deleted. I *did* see bad rows in the logging table so I did an UPDATE to move them to the new empty string actor_id. Is it worth updating the thread to tell people not to do the manual delete? I believe it is a top Google result for the problem.

If it is of any help to you all with troubleshooting I can send you a reasonably sized database dump of my wiki for experimentation.

I'm running into this problem upgrading from REL1_32 to REL1_34. I'd be happy to provide any additional details.

Additional Details: Attempting to run php /opt/htdocs/mediawiki/maintenance/migrateActors.php on a large 60GB wiki database.

I get many "User name X is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php" but running cleanupUsersWithNoId.php doesn't do anything.

MigrateActors.php fails with Duplicate entry 'Redirect fixer' for key 'actor_name'

User name "Gscrowder" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Bunkage" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Kathyfield" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

... rev_id=153853
Wikimedia\Rdbms\DBQueryError from line 1603 of /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: INSERT  INTO `actor` (actor_name) VALUES ('Redirect fixer')
Function: MigrateActors::addActorsForRows
Error: 1062 Duplicate entry 'Redirect fixer' for key 'actor_name' (localhost)

#0 /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php(1574): Wikimedia\Rdbms\Database->getQueryExceptionAndLog('Duplicate entry...', 1062, 'INSERT  INTO `a...', 'MigrateActors::...')
#1 /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php(1152): Wikimedia\Rdbms\Database->reportQueryError('Duplicate entry...', 1062, 'INSERT  INTO `a...', 'MigrateActors::...', false)
#2 /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php(2140): Wikimedia\Rdbms\Database->query('INSERT  INTO `a...', 'MigrateActors::...')
#3 /opt/htdocs/mediawiki/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->insert('`actor`', Array, 'MigrateActors::...')
#4 /opt/htdocs/mediawiki/includes/libs/rdbms/database/DBConnRef.php(371): Wikimedia\Rdbms\DBConnRef->__call('insert', Array)
#5 /opt/htdocs/mediawiki/maintenance/includes/MigrateActors.php(217): Wikimedia\Rdbms\DBConnRef->insert('actor', Array, 'MigrateActors::...')
#6 /opt/htdocs/mediawiki/maintenance/includes/MigrateActors.php(408): MigrateActors->addActorsForRows(Object(Wikimedia\Rdbms\MaintainableDBConnRef), 'rev_user_text', Array, Array, 75083)
#7 /opt/htdocs/mediawiki/maintenance/includes/MigrateActors.php(103): MigrateActors->migrateToTemp('revision', 'rev_id', Array, 'rev_user', 'rev_user_text', 'revactor_rev', 'revactor_actor')
#8 /opt/htdocs/mediawiki/maintenance/Maintenance.php(1746): MigrateActors->doDBUpdates()
#9 /opt/htdocs/mediawiki/maintenance/doMaintenance.php(99): LoggedUpdateMaintenance->execute()
#10 /opt/htdocs/mediawiki/maintenance/migrateActors.php(27): require_once('/opt/htdocs/med...')
#11 {main}

In revision, there are 3.7 million rows; 56 of which have the rev_user_text 'Redirect fixer'

select count(*) from revision where rev_user_text = 'Redirect fixer';
 select * from actor where actor_name='';
+----------+------------+------------+
| actor_id | actor_user | actor_name |
+----------+------------+------------+
|     3164 |       NULL |            |
+----------+------------+------------+
1 row in set (0.000 sec)

Then

delete from actor where actor_id=3164

to work around

and the migrate script still fails ..

Latest example:
Setup fresh 'demo' wiki

Product Version
MediaWiki 1.34.3 (266ce19)
PHP 7.2.33-1+0~20200807.47+debian10~1.gbpcb3068 (apache2handler)
MariaDB 10.3.23-MariaDB-0+deb10u1-log
ICU 65.1
LuaSandbox 3.0.3
Lua 5.1.5
Elasticsearch 6.8.12

Overwrite the demo_wiki database with a dump from 1.32

Product Version
MediaWiki 1.32.6 (5807cdf)
07:31, 19 December 2019
PHP 7.1.33-17+0~20200807.39+debian9~1.gbp032d47 (apache2handler)
MariaDB 10.1.45-MariaDB-0+deb9u1
ICU 65.1
Elasticsearch 5.6.16
Lua 5.1.5

Run the updater:
WIKI=demo php /opt/htdocs/mediawiki/maintenance/update.php --quick

Relevant details:

Completed actor creation, added 37608 new actor(s)
Beginning migration of revision.rev_user and revision.rev_user_text to revision_actor_temp.revactor_actor
User name "Crowtherdr" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "SugdenHG" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Gistauz" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Balzac0001" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "BoisseauC" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Poussardbe" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Zazoult" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

... rev_id=149
User name "Reig" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Ccsmith" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Joneswd" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Didiermarcdarrort1" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

snip

Completed migration, updated 26038 row(s) with 2 new actor(s), 194 error(s)
Beginning migration of archive.ar_user and archive.ar_user_text to archive.ar_actor
User name "Poussardbe" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "SugdenHG" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Zazoult" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Balzac0001" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

... ar_id=100
User name "BoisseauC" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

User name "Ccsmith" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

... ar_id=200
... ar_id=300
... ar_id=400
User name "Ellisgj" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

snip

Completed migration, updated 1003 row(s) with 0 new actor(s), 81 error(s)
Beginning migration of ipblocks.ipb_by and ipblocks.ipb_by_text to ipblocks.ipb_by_actor
Completed migration, updated 0 row(s) with 0 new actor(s), 0 error(s)
Beginning migration of image.img_user and image.img_user_text to image.img_actor
... img_name=CHF_Vitrolles_France.JPG
... img_name=Carte_position_Landes.png
... img_name=Copyright.png
... img_name=FCKsymbol.png
User name "Poussardbe" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation.

... img_name=Image_insertion_curseur.jpg
... img_name=Propriétés_du_lien.jpg
... img_name=Wiki_Buffalo&Erie_County_Historical_Society.jpg
... img_name=Yaounde_District_FHC.JPG
Completed migration, updated 707 row(s) with 0 new actor(s), 3 error(s)
Beginning migration of oldimage.oi_user and oldimage.oi_user_text to oldimage.oi_actor
... oi_name=Luxembourg-communes-map.png oi_timestamp=20130319205727
Completed migration, updated 20 row(s) with 0 new actor(s), 0 error(s)
Beginning migration of filearchive.fa_user and filearchive.fa_user_text to filearchive.fa_actor
... fa_id=43
Completed migration, updated 42 row(s) with 0 new actor(s), 0 error(s)
Beginning migration of recentchanges.rc_user and recentchanges.rc_user_text to recentchanges.rc_actor
... rc_id=26777
Completed migration, updated 87 row(s) with 0 new actor(s), 0 error(s)
Beginning migration of logging.log_user and logging.log_user_text to logging.log_actor
... log_id=100
Wikimedia\Rdbms\DBQueryError from line 1603 of /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: INSERT  INTO `actor` (actor_name) VALUES ('')
Function: MigrateActors::addActorsForRows
Error: 1062 Duplicate entry '' for key 'actor_name' (localhost)

#0 /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php(1574): Wikimedia\Rdbms\Database->getQueryExceptionAndLog('Duplicate entry...', 1062, 'INSERT  INTO `a...', 'MigrateActors::...')
#1 /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php(1152): Wikimedia\Rdbms\Database->reportQueryError('Duplicate entry...', 1062, 'INSERT  INTO `a...', 'MigrateActors::...', false)
#2 /opt/htdocs/mediawiki/includes/libs/rdbms/database/Database.php(2140): Wikimedia\Rdbms\Database->query('INSERT  INTO `a...', 'MigrateActors::...')
#3 /opt/htdocs/mediawiki/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->insert('`actor`', Array, 'MigrateActors::...')
#4 /opt/htdocs/mediawiki/includes/libs/rdbms/database/DBConnRef.php(371): Wikimedia\Rdbms\DBConnRef->__call('insert', Array)
#5 /opt/htdocs/mediawiki/maintenance/includes/MigrateActors.php(217): Wikimedia\Rdbms\DBConnRef->insert('actor', Array, 'MigrateActors::...')
#6 /opt/htdocs/mediawiki/maintenance/includes/MigrateActors.php(302): MigrateActors->addActorsForRows(Object(Wikimedia\Rdbms\MaintainableDBConnRef), 'log_user_text', Array, Array, 0)
#7 /opt/htdocs/mediawiki/maintenance/includes/MigrateActors.php(113): MigrateActors->migrate('logging', Array, 'log_user', 'log_user_text', 'log_actor')
#8 /opt/htdocs/mediawiki/maintenance/Maintenance.php(1746): MigrateActors->doDBUpdates()
#9 /opt/htdocs/mediawiki/includes/installer/DatabaseUpdater.php(1318): LoggedUpdateMaintenance->execute()
#10 /opt/htdocs/mediawiki/includes/installer/DatabaseUpdater.php(490): DatabaseUpdater->migrateActors()
#11 /opt/htdocs/mediawiki/includes/installer/DatabaseUpdater.php(454): DatabaseUpdater->runUpdates(Array, false)
#12 /opt/htdocs/mediawiki/maintenance/update.php(205): DatabaseUpdater->doUpdates(Array)
#13 /opt/htdocs/mediawiki/maintenance/doMaintenance.php(99): UpdateMediaWiki->execute()
#14 /opt/htdocs/mediawiki/maintenance/update.php(277): require_once('/opt/htdocs/med...')
#15 {main}

As you can see there are many errors and finally a fatal error when trying to create a duplicate entry in 'actor' for 'actor_name'=''

WIKI=demo php /opt/htdocs/mediawiki/maintenance/cleanupUsersWithNoId.php --prefix fr had no affect (assigned 0 and prefixed 0 row(s)) as reported at T224854

mysql -e 'select count(*) from wiki_demo.archive WHERE ar_user_text="";'
+----------+
| count(*) |
+----------+
|        0 |
+----------+
mysql -e 'select count(*) from wiki_demo.revision WHERE rev_user_text="";'
+----------+
| count(*) |
+----------+
|        0 |
+----------+
mysql -e 'select count(*) from wiki_demo.image WHERE img_user_text="";'
+----------+
| count(*) |
+----------+
|        0 |
+----------+
mysql -e 'select count(*) from wiki_demo.logging WHERE log_user_text = "";'
+----------+
| count(*) |
+----------+
|      332 |
+----------+
select count(*) count, log_user, max(log_timestamp) latest, log_user_text from logging where log_user_text = "" group by log_user having count > 2;
+-------+----------+----------------+---------------+
| count | log_user | latest         | log_user_text |
+-------+----------+----------------+---------------+
|     5 |        1 | 20110616173211 |               |
|    96 |       54 | 20110630073430 |               |
|    13 |       85 | 20111018215224 |               |
|     3 |       86 | 20101208132722 |               |
+-------+----------+----------------+---------------+

So there are many entries in the logging table where log_user_text is blank. Most users only have a single entry however there are a couple that break into double digits. All the entries are quite old.

Should I attempt to fix this? How? I'm hesitant to just delete the 'bad data' (= data loss).

see https://phabricator.wikimedia.org/T224854
and work around

select count(*) from logging where log_user_text="";
+----------+
| count(*) |
+----------+
|      879 |
+----------+
1 row in set (0.001 sec)

delete from logging where log_user_text="";

Hoping that logic improvements in the latest Actor Migration codebase could save me, I setup MediaWiki-Docker (with MySQL backing),

Product	Version
MediaWiki	1.36.0-alpha (2cc3d4f)
15:13, 19 August 2020
PHP	7.2.31-1+0~20200514.41+debian9~1.gbpe2a56b+wmf1 (fpm-fcgi)
MariaDB	10.5.4-MariaDB-log
ICU	57.1

but observed the same outcome:

Beginning migration of logging.log_user and logging.log_user_text to logging.log_actor
... log_id=100
Wikimedia\Rdbms\DBQueryError from line 1699 of /var/www/html/includes/libs/rdbms/database/Database.php: Error 1062: Duplicate entry '' for key 'actor_name' (mariadb-main)
Function: MigrateActors::addActorsForRows
Query: INSERT INTO `actor` (actor_name) VALUES ('')

I noticed something interesting about the nature of records in the logging table with empty log_user_text: There are many with the 'log_action' = 'create' and the 'log_title' appears to be the username. Eg.

select log_type, log_action, log_timestamp, log_user, log_title from logging where log_user_text = "" AND log_action = "create" ORDER BY log_timestamp LIMIT 10;
+----------+------------+----------------+----------+----------------+
| log_type | log_action | log_timestamp  | log_user | log_title      |
+----------+------------+----------------+----------+----------------+
| newusers | create     | 20091002115948 |    10933 | Roseellen      |
| newusers | create     | 20091002144431 |    10934 | Phyloe         |
| newusers | create     | 20091002152118 |    10935 | AbbyR          |
| newusers | create     | 20091002154620 |    10936 | Gaylehix       |
| newusers | create     | 20091002171140 |    10937 | CheshireLass   |
| newusers | create     | 20091002180039 |    10938 | Cajflem        |
| newusers | create     | 20091002180409 |    10939 | Arushton2      |
| newusers | create     | 20091002192846 |    10940 | Indian_woman   |
| newusers | create     | 20091002195425 |    10941 | Soniabrague    |
| newusers | create     | 20091002201705 |    10942 | Clarence_Frick |
+----------+------------+----------------+----------+----------------+

I wonder if this data can be used to correctly identify the 'log_user_text' for the corresponding 'log_user'? In other words, is log_user 10933 "Roseellen"? Checking the user table, there is no record for user_id 10933.

Expanding on the above, using a different larger wiki database with the same issue, there are 1,571 records in the logging table having an empty log_user_text (aka username) where the type is "newusers" and action = "create" [1]. Not one of those log_user (ids) exists in the user table. [2]

It's seeming more plausible to me that the data in log_title should be used to populate log_user_text for these records. I guess this isn't a problem seen by WMF wikis since the historical login system for WMF was different than this 3rd-party wiki.

Comparing user.user_name and logging.log_title for these records reveals that there ARE some records in the user table that match - but they have different ids.[3]

[1] select log_type, log_action, log_timestamp, log_user, log_title from logging where log_type = "newusers" AND log_action = "create" AND log_user_text = "";
[2] select count(u.user_name) from user u where user_id in (select log_user from logging where log_type = "newusers" AND log_action = "create" AND log_user_text = "");
[3] select u.user_name from user u where user_name in (select log_title from logging where log_type = "newusers" AND log_action = "create" AND log_user_text = "");

Hi....Seems like the task opened an year ago. Read comments and I did not see any working solution for this. Did anyone try use old mediawiki version and got it running ?

I'm still struggling with this. I am unable to upgrade a 1.32 wiki to 1.34 because no matter what I have tried, I still get multiple errors; with fatal errors relating to duplicate Actors.

When I try to run migrateActors.php I get a ton of

"User name "Groseca" is usable, cannot create an anonymous actor for it. Run maintenance/cleanupUsersWithNoId.php to fix this situation."

But cleanupUsersWithNoId.php doesn't actually do anything when you run it.

grep -A1 assigned /tmp/cleanupUsers.0.log 
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of archive
--
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of logging
--
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of image
--
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of oldimage
--
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of filearchive
--
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of ipblocks
--
Completed cleanup, assigned 0 and prefixed 0 row(s)
Beginning cleanup of recentchanges
--
Completed cleanup, assigned 0 and prefixed 0 row(s)

When running migrateActors.php, it ultimately fails fatally with Error: 1062 Duplicate entry 'Redirect fixer' for key 'actor_name'

Wikimedia\Rdbms\DBQueryError from line 1603 of /opt/htdocs/wiki/mediawiki/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: INSERT  INTO `actor` (actor_name) VALUES ('Redirect fixer')
Function: MigrateActors::addActorsForRows
Error: 1062 Duplicate entry 'Redirect fixer' for key 'actor_name' (localhost)

#0 /opt/htdocs/wiki/mediawiki/includes/libs/rdbms/database/Database.php(1574): Wikimedia\Rdbms\Database->getQueryExceptionAndLog('Duplicate entry...', 1062, 'INSERT  INTO `a...', 'MigrateActors::...')
#1 /opt/htdocs/wiki/mediawiki/includes/libs/rdbms/database/Database.php(1152): Wikimedia\Rdbms\Database->reportQueryError('Duplicate entry...', 1062, 'INSERT  INTO `a...', 'MigrateActors::...', false)
#2 /opt/htdocs/wiki/mediawiki/includes/libs/rdbms/database/Database.php(2140): Wikimedia\Rdbms\Database->query('INSERT  INTO `a...', 'MigrateActors::...')
#3 /opt/htdocs/wiki/mediawiki/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->insert('`actor`', Array, 'MigrateActors::...')
#4 /opt/htdocs/wiki/mediawiki/includes/libs/rdbms/database/DBConnRef.php(371): Wikimedia\Rdbms\DBConnRef->__call('insert', Array)
#5 /opt/htdocs/wiki/mediawiki/maintenance/includes/MigrateActors.php(217): Wikimedia\Rdbms\DBConnRef->insert('actor', Array, 'MigrateActors::...')
#6 /opt/htdocs/wiki/mediawiki/maintenance/includes/MigrateActors.php(302): MigrateActors->addActorsForRows(Object(Wikimedia\Rdbms\MaintainableDBConnRef), 'ar_user_text', Array, Array, 1896)
#7 /opt/htdocs/wiki/mediawiki/maintenance/includes/MigrateActors.php(105): MigrateActors->migrate('archive', Array, 'ar_user', 'ar_user_text', 'ar_actor')
#8 /opt/htdocs/wiki/mediawiki/maintenance/Maintenance.php(1746): MigrateActors->doDBUpdates()
#9 /opt/htdocs/wiki/mediawiki/maintenance/doMaintenance.php(99): LoggedUpdateMaintenance->execute()
#10 /opt/htdocs/wiki/mediawiki/maintenance/migrateActors.php(27): require_once('/opt/htdocs/wik...')
#11 {main}

I've tried to prepare the data beforehand by removing "bad data" from the database.

delete from `wiki_en`.`logging` where log_user_text = ""; -- 21733 rows deleted

and also re-assigning rev_user (not even sure why I'm doing this):

select rev_page, rev_user from revision where rev_user_text = 'Redirect fixer'; -- 7425
update revision set rev_user = 0 where rev_user = 7425;

Query OK, 56 rows affected (0.05 sec)

If I delete the Actor record where actor_name = 'Redirect fixer', then re-run migrateActors.php, I just get another fatal error when it eventually hits another duplicate value for actor_name = 'Redirect fixer'.

Can anyone with an understanding of the internals provide some guidance here? @Anomie ?

Ultimately, I was able to complete an upgrade by 1) using maintenance/eval.php to create the missing User; and 2) deleting some old entries from the logging table.

php ./maintenance/eval.php
User::createNew( 'Redirect fixer', [ 'id' => 7425 ] )
Ctrl + c
mysql wiki_en -e 'select * from user where user_id = 7425;'
mysql wiki_en -e 'delete from `wiki_en`.`logging` where log_user_text = "";' -- 21733 rows deleted

Then I was able to run update.php

Ultimately, I was able to complete an upgrade by 1) using maintenance/eval.php to create the missing User; and 2) deleting some old entries from the logging table.

php ./maintenance/eval.php
User::createNew( 'Redirect fixer', [ 'id' => 7425 ] )
Ctrl + c
mysql wiki_en -e 'select * from user where user_id = 7425;'
mysql wiki_en -e 'delete from `wiki_en`.`logging` where log_user_text = "";' -- 21733 rows deleted

Then I was able to run update.php

Rather than delete records, I believe it would be better to update those records having an empty log_user_text with the user_id and user_name of an existing system user like 'Redirect fixer'.

I'm seeing similar issues on an "upgrade" from an existing MW1.31 installation to MW1.35; the migrateActors operation fails when invoked from maintenance/update.sh - but usually on non-user entities like "MediaWiki default" which the script will attempt to add to the actors table, only to die on a duplicate record error.

The error basically shuts down the entire update.php script, and removing the offending record from the actors table doesn't help as a subsequent attempt to run the script finds the same issues again.

I had this issue while updating a wiki created in 2005, there were logs without users in 2 situations:

  • new users (log_type=newusers, log_action=create, i.e. self-creation, not created by an existing user): in these cases, log_user_text can be safely updated with log_title,
  • batch deletion of system messages with maintenance/deleteDefaultMessage.php (log_type=delete, log_action=delete, and I remarked it was in MediaWiki namespace and deleted in a very short time span): I attributed these logs to the first user (the 'admin' creating the wiki).

The SQL commands are below:

-- display affected logs
SELECT DISTINCT log_type, log_action, COUNT(*) FROM logging WHERE log_user_text = '' GROUP BY log_type, log_action;

-- situation 1
UPDATE logging SET log_user_text = log_title WHERE log_user_text = '' AND log_type = 'newusers' AND log_action = 'create';

-- situation 2
SELECT user_name FROM user WHERE user_id = 1; -- copy the result for the next SQL command
UPDATE logging SET log_user_text = '<insert name from previous SQL command>' WHERE log_user_text = '' AND log_type = 'delete' AND log_action = 'delete';

After setting log_user_text where it was empty, re-running update.php worked.

Studying cleanupUsersWithNoId.php and migrateActors.php, I feel there is a logical issue (in their current form):

  • cleanupUsersWithNoId.php does not create un-"usable" usernames (empty name, "MediaWiki default", bogus names…) according to this code: it’s legit but this script is useless for these usernames
  • migrateActors.php creates an anonymous actor the first time (actor_user=NULL, actor_name=specific name) but given the username is still unusable the second time, it is still tried to create an anonymous actor (this code), which fails because of UNIQUE constraint on actor_name.

This issue can be solved by ignoring duplicate creations of unuable usernames: their actor must simply exist once. The corresponding patch is to add the option IGNORE in the function insert (see this doc).

Change 765191 had a related patch set uploaded (by Seb35; author: Seb35):

[mediawiki/core@master] MigrateActors.php: ignore duplicate creations of actors

https://gerrit.wikimedia.org/r/765191

Hi,
Is there any update on this? I'm experiencing same issue after upgrade from 1.27 to 1.37. Some of the wikis work, but one of them does not. I tried deleting the row with the empty "actor_name" to no avail:

delete from xxxactor where actor_id=198

update.php script as well as migrateActors.php give this error:

Function: MigrateActors::addActorsForRows
Query: INSERT INTO xxxactor (actor_name) VALUES ('')

#0 /var/www/html/wiki/includes/libs/rdbms/database/Database.php(1793): Wikimedia\Rdbms\Database->getQueryException('Duplicate entry...', 1062, 'INSERT INTO `xx...', 'MigrateActors::...')

We have a multi wiki setup, so all other DBs somehow got successfully migrated, but this single DB was left out, the migration script does not work.

The logical fix in all situations would be to update all rows with an empty user_name, or with an xx_user_id equal to 0, or non-existing xx_user_id in the user table, setting an "external" xx_user_name like "unknown>unknown" and a xx_user_id = 0.

How do we apply the logical fix. Dropping the required SQL queries for non-mysql people will be great.

The logical fix in all situations would be to update all rows with an empty user_name, or with an xx_user_id equal to 0, or non-existing xx_user_id in the user table, setting an "external" xx_user_name like "unknown>unknown" and a xx_user_id = 0.

Thanks, but for me this does not solve the issue. I update the XXXactor table to update the empty actor_id with an 'unknown' value:
update XXXactor set actor_name='unknown' where actor_id=227;

but on the next time I run update.php, it tries to insert it again with different id... It seems like a logical problem in the update process. Can someone please help because this is quite critical?

Thanks, but for me this does not solve the issue. I update the XXXactor table to update the empty actor_id with an 'unknown' value:
update XXXactor set actor_name='unknown' where actor_id=227;

but on the next time I run update.php, it tries to insert it again with different id... It seems like a logical problem in the update process. Can someone please help because this is quite critical?

The update must be done on the individual tables where user_name is present, but not on the actor table. The actor table is populated with the contents of all the other tables.

User names that contain a ">" character (which is invalid for a user name), are treated as "external" users. That's how imports (Special:Import) are handled. The contents before the > is the wiki name (usually an interwiki prefix, but it can be a random string like "unknown" for example) and the contents after the > is the actual user name. Since those users are externals, they won't get a user id and the script should just insert a new entry for each unique external user.

This is an example of script for the revision table to update all users with empty name:

update revision set rev_user_text = 'unknown>unknown' where rev_user_text = '';

Another script for the revision table, to update users that don't have a matching user id

update revision set rev_user_text = concat('unknown>', rev_user_text), rev_user = 0 where rev_user > 0 and not exists (select * from user where user_id = rev_user);

I think this should cover most cases. Of course, that should be adapted for each table with user and user_text columns where the problem exists.

Thanks, but for me this does not solve the issue. I update the XXXactor table to update the empty actor_id with an 'unknown' value:
update XXXactor set actor_name='unknown' where actor_id=227;

but on the next time I run update.php, it tries to insert it again with different id... It seems like a logical problem in the update process. Can someone please help because this is quite critical?

The update must be done on the individual tables where user_name is present, but not on the actor table. The actor table is populated with the contents of all the other tables.

User names that contain a ">" character (which is invalid for a user name), are treated as "external" users. That's how imports (Special:Import) are handled. The contents before the > is the wiki name (usually an interwiki prefix, but it can be a random string like "unknown" for example) and the contents after the > is the actual user name. Since those users are externals, they won't get a user id and the script should just insert a new entry for each unique external user.

This is an example of script for the revision table to update all users with empty name:

update revision set rev_user_text = 'unknown>unknown' where rev_user_text = '';

Another script for the revision table, to update users that don't have a matching user id

update revision set rev_user_text = concat('unknown>', rev_user_text), rev_user = 0 where rev_user > 0 and not exists (select * from user where user_id = rev_user);

I think this should cover most cases. Of course, that should be adapted for each table with user and user_text columns where the problem exists.

Thanks! So how do I find which are the problematic tables? Which criteria should I use, user_name = what ? I would need to go over all of the tables...

Thanks! So how do I find which are the problematic tables? Which criteria should I use, user_name = what ? I would need to go over all of the tables...

I'd expect the script to print lines with the progress and what tables it's processing.

If not, you can look at the tables listed on this file, witch also lists the column names:
https://github.com/ciencia/mediawiki-extensions-StubUserWikiAuth/blob/master/maintenance/populateUserTable.php#L36

Thanks. Unfortunately this does not solve the problem either. I found the problematic table 'XXXlogging' and did the following :

MariaDB [wiki]> update XXXlogging set log_user_text='unknown>unknown' where log_user_text = '';
Query OK, 4916 rows affected (0.10 sec)
Rows matched: 4916 Changed: 4916 Warnings: 0

However, when I run php update.php I get same error again (only the string 'unknown>unknown' is different. Is there a way to keep these rows in the table and still make the update work?

Function: MigrateActors::addActorsForRows
Query: INSERT INTO XXXactor (actor_name) VALUES ('unknown>unknown')

#0 /var/www/html/wiki/includes/libs/rdbms/database/Database.php(1793): Wikimedia\Rdbms\Database->getQueryException('Duplicate entry...', 1062, 'INSERT INTO `ko...', 'MigrateActors::...')
#1 /var/www/html/wiki/includes/libs/rdbms/database/Database.php(1768): Wikimedia\Rdbms\Database->getQueryExceptionAndLog('Duplicate entry...', 1062, 'INSERT INTO `ko...', 'MigrateActors::...')
#2 /var/www/html/wiki/includes/libs/rdbms/database/Database.php(1327): Wikimedia\Rdbms\Database->reportQueryError('Duplicate entry...', 1062, 'INSERT INTO `ko...', 'MigrateActors::...', false)
#3 /var/www/html/wiki/includes/libs/rdbms/database/Database.php(2540): Wikimedia\Rdbms\Database->query('INSERT INTO `ko...', 'MigrateActors::...', 128)
#4 /var/www/html/wiki/includes/libs/rdbms/database/Database.php(2520): Wikimedia\Rdbms\Database->doInsert('actor', Array, 'MigrateActors::...')
#5 /var/www/html/wiki/includes/libs/rdbms/database/DBConnRef.php(68): Wikimedia\Rdbms\Database->insert('actor', Array, 'MigrateActors::...')
#6 /var/www/html/wiki/includes/libs/rdbms/database/DBConnRef.php(380): Wikimedia\Rdbms\DBConnRef->__call('insert', Array)
#7 /var/www/html/wiki/maintenance/includes/MigrateActors.php(219): Wikimedia\Rdbms\DBConnRef->insert('actor', Array, 'MigrateActors::...')
#8 /var/www/html/wiki/maintenance/includes/MigrateActors.php(305): MigrateActors->addActorsForRows(Object(Wikimedia\Rdbms\MaintainableDBConnRef), 'log_user_text', Array, Array, 0)
#9 /var/www/html/wiki/maintenance/includes/MigrateActors.php(115): MigrateActors->migrate('logging', Array, 'log_user', 'log_user_text', 'log_actor')
#10 /var/www/html/wiki/maintenance/includes/LoggedUpdateMaintenance.php(45): MigrateActors->doDBUpdates()

Thanks, but for me this does not solve the issue. I update the XXXactor table to update the empty actor_id with an 'unknown' value:
update XXXactor set actor_name='unknown' where actor_id=227;

but on the next time I run update.php, it tries to insert it again with different id... It seems like a logical problem in the update process. Can someone please help because this is quite critical?

The update must be done on the individual tables where user_name is present, but not on the actor table. The actor table is populated with the contents of all the other tables.

User names that contain a ">" character (which is invalid for a user name), are treated as "external" users. That's how imports (Special:Import) are handled. The contents before the > is the wiki name (usually an interwiki prefix, but it can be a random string like "unknown" for example) and the contents after the > is the actual user name. Since those users are externals, they won't get a user id and the script should just insert a new entry for each unique external user.

This is an example of script for the revision table to update all users with empty name:

update revision set rev_user_text = 'unknown>unknown' where rev_user_text = '';

Another script for the revision table, to update users that don't have a matching user id

update revision set rev_user_text = concat('unknown>', rev_user_text), rev_user = 0 where rev_user > 0 and not exists (select * from user where user_id = rev_user);

I think this should cover most cases. Of course, that should be adapted for each table with user and user_text columns where the problem exists.

Thanks! So how do I find which are the problematic tables? Which criteria should I use, user_name = what ? I would need to go over all of the tables...

Another update from me here. I have manually applied the patch as described here https://gerrit.wikimedia.org/r/c/mediawiki/core/+/765191/1/maintenance/includes/MigrateActors.php#220
and it actually worked! I was able to migrate my DB. Very helpful, hope this patch gets approved and merged.

I re-tried yesterday in a lab environment to upgrade a wiki created in 2004 and in version 1.27. Probably someone did a hard cleanup (linked to spam?) because many users were missing in the user table and a few log_user_text or rev_user_text were an empty string, leading to this issue. In this case it is hard to re-create these data, even if sometimes possible (see T229092#7730875).

So I propose my Gerrit patch (ignore duplicate added actors) to resolve this issue.

Change 765191 merged by jenkins-bot:

[mediawiki/core@master] MigrateActors.php: ignore duplicate creations of actors

https://gerrit.wikimedia.org/r/765191

Change 827193 had a related patch set uploaded (by Seb35; author: Seb35):

[mediawiki/core@REL1_35] MigrateActors.php: ignore duplicate creations of actors

https://gerrit.wikimedia.org/r/827193

Change 827193 merged by jenkins-bot:

[mediawiki/core@REL1_35] MigrateActors.php: ignore duplicate creations of actors

https://gerrit.wikimedia.org/r/827193

Seb35 claimed this task.

I close this task since the patch has been merged in master (future 1.39.0) and REL1_35 (future 1.35.8), which are both LTS versions.
NB: it is not fixed in 1.36, 1.37, or 1.38.

The patch only avoid a hard crash during updates, when the database has some unexpected values in the *_user_text fields, but other remediations methods could be better on a case-by-case basis (example in this comment T229092#7730875).

@Seb35 Thanks for the patch and backport. 1.35 is the first version where this is needed, i.e. earlier versions like 1.32 to 1.34 do not use this script?

migrateActors.php was added in 1.31 and this the beginning the option IGNORE is not present, so theoretically it could be backported in all versions. I did not request other backports since other versions are no more supported, but perhaps there are exceptions for upgrade-related patches?

In fact, if it does not exist, I thought it could be beneficial that such minimally-supported versions do exist for the only goal of upgrading databases. I did a few upgrades in the last months from pre-1.29 and it is sometimes quite challenging, for instance a 1.27 → 1.31 → 1.32 → 1.36 with 1.32 patched with T267468 (only backported to 1.35) and other patches, because more direct upgrade paths crashed and I cannot afford to take too much time optimizing the upgrade path. I re-studied the last few days how to improve this and I will propose something for T286578 and another issue near T268189, because I think there are still a lot of wikis which need upgrading from old versions.

Thanks for your info and pointers!

After falling flat on my face a couple of times by going from 1.31 to 1.35 directly I switched to upgrading 1.31 to 1.32 to 1.33 to 1.34 to 1.35. For this reason, I was asking about when this script is kicking in, i.e. when this patch will be needed initially. From your answer, I get that the patch is, given my upgrade path, already needed for the upgrade from 1.31 to 1.32 and not just for 1.35 onwards.

I do not think we will get backports to unsupported branches of MediaWiki. Actually, I do not even expect this. Thus we will have to do it locally to help overcome MediaWiki's by far darkest hours which were introduced with the schema changes between 1.31 and 1.35. I have to note that even upgrading in steps like 1.31 to 1.32 to 1.33 to 1.34 to 1.35 will to my regret not be clean in all cases.

Thanks for your info and pointers!

After falling flat on my face a couple of times by going from 1.31 to 1.35 directly I switched to upgrading 1.31 to 1.32 to 1.33 to 1.34 to 1.35. For this reason, I was asking about when this script is kicking in, i.e. when this patch will be needed initially. From your answer, I get that the patch is, given my upgrade path, already needed for the upgrade from 1.31 to 1.32 and not just for 1.35 onwards.

I do not think we will get backports to unsupported branches of MediaWiki. Actually, I do not even expect this. Thus we will have to do it locally to help overcome MediaWiki's by far darkest hours which were introduced with the schema changes between 1.31 and 1.35. I have to note that even upgrading in steps like 1.31 to 1.32 to 1.33 to 1.34 to 1.35 will to my regret not be clean in all cases.

Note that most issues are caused by broken extensions "removing" users from the database, or even users with access to mysql and running indiscriminate deletes or updates to clean up spam accounts, leaving inconsistencies in the database. Imports of contents from other wikis are also sources of problems for the migrate actors script.

I'm helping Uncyclopedia in the process of migrating MediaWiki from 1.32 to 1.35, and it required to run cleanupUsersWithNoId.php --force to fix some entries before the migration script can do their duties. However, cleanupUsersWithNoId.php is failing to catch all possible inconsistencies, which may cause the migrateActors script to fail for various reasons (Uncyclopedia imported a lot of content from a FANDOM wiki, which adds a lot of mess to the database). Hence, I modified the cleanupUsersWithNoId.php script on 1.35 to be more thorough in their checks and fixes, and you can find it here in case it helps anyone with a mess in their database: https://gist.github.com/ciencia/7cbe63b8520d4816ec36733454a2cb9a . While I could submit a patch to merge it into MediaWiki codebase, the script has been repurposed a bit, and it doesn't cleanup users with no id, but also fixes users with invalid ids, and I don't want to waste too much time on a lengthy review for a one-time run script.