Page MenuHomePhabricator

Duplicate entry on Table pagelinks error during update script
Open, Needs TriagePublicBUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Run update.php script after 1.39 -> 1.43 upgrade

What happens?:

The following error occured, on the second run it didn't occur again

# php maintenance/update.php --quick
...
Table pagelinks contains pl_title field. Dropping...Wikimedia\Rdbms\DBQueryError from line 1198 of /var/www/html/includes/libs/rdbms/database/Database.php: Error 1062: Duplicate entry '4707-18' for key 'PRIMARY'
Function: Wikimedia\Rdbms\Database::sourceFile( /var/www/html/maintenance/archives/patch-pagelinks-drop-pl_title.sql )
Query: ALTER TABLE `pagelinks`
 ADD PRIMARY KEY (pl_from, pl_target_id)
#0 /var/www/html/includes/libs/rdbms/database/Database.php(1182): Wikimedia\Rdbms\Database->getQueryException('Duplicate entry...', 1062, 'ALTER TABLE `pa...', 'Wikimedia\\Rdbms...')
#1 /var/www/html/includes/libs/rdbms/database/Database.php(1156): Wikimedia\Rdbms\Database->getQueryExceptionAndLog('Duplicate entry...', 1062, 'ALTER TABLE `pa...', 'Wikimedia\\Rdbms...')
#2 /var/www/html/includes/libs/rdbms/database/Database.php(647): Wikimedia\Rdbms\Database->reportQueryError('Duplicate entry...', 1062, 'ALTER TABLE `pa...', 'Wikimedia\\Rdbms...', false)
#3 /var/www/html/includes/libs/rdbms/database/Database.php(2791): Wikimedia\Rdbms\Database->query(Object(Wikimedia\Rdbms\Query), 'Wikimedia\\Rdbms...')
#4 /var/www/html/includes/libs/rdbms/database/Database.php(2730): Wikimedia\Rdbms\Database->sourceStream(Resource id #36895, NULL, NULL, 'Wikimedia\\Rdbms...', NULL)
#5 /var/www/html/includes/libs/rdbms/database/DBConnRef.php(127): Wikimedia\Rdbms\Database->sourceFile('/var/www/html/m...')
#6 /var/www/html/includes/libs/rdbms/database/DBConnRef.php(799): Wikimedia\Rdbms\DBConnRef->__call('sourceFile', Array)
#7 /var/www/html/includes/installer/DatabaseUpdater.php(797): Wikimedia\Rdbms\DBConnRef->sourceFile('/var/www/html/m...')
#8 /var/www/html/includes/installer/DatabaseUpdater.php(921): MediaWiki\Installer\DatabaseUpdater->applyPatch('/var/www/html/m...', false, 'Table pagelinks...')
#9 /var/www/html/includes/installer/DatabaseUpdater.php(595): MediaWiki\Installer\DatabaseUpdater->dropField('pagelinks', 'pl_title', 'patch-pagelinks...')
#10 /var/www/html/includes/installer/DatabaseUpdater.php(548): MediaWiki\Installer\DatabaseUpdater->runUpdates(Array, false)
#11 /var/www/html/maintenance/update.php(195): MediaWiki\Installer\DatabaseUpdater->doUpdates(Array)
#12 /var/www/html/maintenance/includes/MaintenanceRunner.php(703): UpdateMediaWiki->execute()
#13 /var/www/html/maintenance/doMaintenance.php(100): MediaWiki\Maintenance\MaintenanceRunner->run()
#14 /var/www/html/maintenance/update.php(308): require_once('/var/www/html/m...')
#15 {main}

What should have happened instead?:

No error

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

MW 1.43

Other information (browser name/version, screenshots, etc.):

Event Timeline

mysql> describe pagelinks;
+-------------------+---------------------+------+-----+---------+-------+
| Field             | Type                | Null | Key | Default | Extra |
+-------------------+---------------------+------+-----+---------+-------+
| pl_from           | int(10) unsigned    | NO   |     | 0       |       |
| pl_from_namespace | int(11)             | NO   | MUL | 0       |       |
| pl_target_id      | bigint(20) unsigned | NO   | MUL | NULL    |       |
+-------------------+---------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> select * from pagelinks;
+---------+-------------------+--------------+
| pl_from | pl_from_namespace | pl_target_id |
+---------+-------------------+--------------+
...
|    4677 |                 0 |           18 |
|    4707 |                 0 |           18 |
|    4707 |                 0 |           18 |
|    5304 |                 0 |           18 |
...

Due to this error, the update.php script was interrupted, the database was corrupted and the upgrade was compromised. We had to downgrade back to MW 1.39 and restore the backup.

This is because some bad data stayed longer. I don't think my code introduced bad data. It just fails in their existence. We can add a clean up step to make sure bad data is cleaned up though. Shouldn't be hard.

It is not clear how this duplicate entry during time occured, please explain what you meant by "some bad data stayed longer".
pl_from_namespace was not always a primary key on table pagelinks, could be this the case here?
By adding a clean up step, do you mean the update.php can for example starting automatically the rebuildall.php maintenance script instead of failing?

It is not clear how this duplicate entry during time occured, please explain what you meant by "some bad data stayed longer".
pl_from_namespace was not always a primary key on table pagelinks, could be this the case here?

It is very likely two rows with pl_title and namespace existed but were not caught by unique keys or PK.

By adding a clean up step, do you mean the update.php can for example starting automatically the rebuildall.php maintenance script instead of failing?

That'd be too risky during an update. It simply can just remove rows that more than one exists.