Page MenuHomePhabricator

Second run of patch-ref_id-phase2.sql results in Duplicate entry error for a primary key
Open, Needs TriagePublic

Description

I haven't looked into why this only happens the second time I run the update, but right now I'm running into the following backtrace the second time I run update.php except that without my fix, the following line appears right after "...flow_revision table does not contain rev_user_text field."

The backtrace is

Adding index PRIMARY to table flow_wiki_ref ...A database query error has occurred.
Query: ALTER TABLE `wiki1flow_wiki_ref` ADD PRIMARY KEY (ref_id)

Function: DatabaseBase::sourceFile( /var/www/html/w/extensions/Flow/db_patches/patch-ref_id-phase2.sql )
Error: 1062 Duplicate entry '\x00' for key 'PRIMARY' (cihcisddb352v.corporate.ge.com:7780)

Backtrace:
#0 /var/www/html/w/includes/db/Database.php(901): DatabaseBase->reportQueryError('Duplicate entry...', 1062, 'ALTER TABLE `wi...', 'DatabaseBase::s...', false)
#1 /var/www/html/w/includes/db/Database.php(3057): DatabaseBase->query('ALTER TABLE `wi...', 'DatabaseBase::s...')
#2 /var/www/html/w/includes/db/Database.php(2978): DatabaseBase->sourceStream(Resource id #582, false, false, 'DatabaseBase::s...', false)
#3 /var/www/html/w/includes/installer/DatabaseUpdater.php(669): DatabaseBase->sourceFile('/var/www/html/w...')
#4 /var/www/html/w/includes/installer/DatabaseUpdater.php(742): DatabaseUpdater->applyPatch('/var/www/html/w...', true, 'Adding index PR...')
#5 [internal function]: DatabaseUpdater->addIndex('flow_wiki_ref', 'PRIMARY', '/var/www/html/w...', true)
#6 /var/www/html/w/includes/installer/DatabaseUpdater.php(455): call_user_func_array(Array, Array)
#7 /var/www/html/w/includes/installer/DatabaseUpdater.php(420): DatabaseUpdater->runUpdates(Array, true)
#8 /var/www/html/w/maintenance/update.php(179): DatabaseUpdater->doUpdates(Array)
#9 /var/www/html/w/maintenance/doMaintenance.php(103): UpdateMediaWiki->execute()
#10 /var/www/html/w/maintenance/update.php(224): require_once('/var/www/html/w...')
#11 {main}

Patch incoming.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Change 358892 had a related patch set uploaded (by MarkAHershberger; owner: MarkAHershberger):
[mediawiki/extensions/Flow@master] Avoid a backtrace when running

https://gerrit.wikimedia.org/r/358892

So, now that I've looked at this some more, it looks like I should have run FlowPopulateRefId.php.

There should be a way to capture this and tell the user.

Change 358892 abandoned by MarkAHershberger:
Avoid a backtrace when running

Reason:
The right fix is to run the maintenance script FlowPopulateRefId. I wish I knew how to tell users that when they run into this.

https://gerrit.wikimedia.org/r/358892

Does FlowPopulateRefId get run by update.php already? Maybe it's just run too late? By default, such maintenance scripts are run after all schema changes, and that appears to be harmful in this case.

Looking at the code, I don't see how you could ever end up in this situation. The relevant code is as follows:

/*
 * Add primary key, but only after we've made sure the newly added
 * column has been populated (otherwise they'd all be null values)
 */
if ( $updater->updateRowExists( 'FlowPopulateRefId' ) ) {
        if ( $updater->getDB()->getType() === 'sqlite' ) {
                $updater->addExtensionIndex( 'flow_wiki_ref', 'PRIMARY', "$dir/db_patches/patch-ref_id-phase2.sqlite.sql" );
        } else {
                $updater->addExtensionIndex( 'flow_wiki_ref', 'PRIMARY', "$dir/db_patches/patch-ref_id-phase2.sql" );
        }
}

So it only attempts to run patch-ref_id-phase2.sql if the updatelog table claims that FlowPopulateRefId has already run. That check has been there since the very beginning, as well (it was added at the same time as patch-ref_id-phase2.sql itself).

Since you say it only happens the second time you run the update, perhaps the first update attempted to run FlowPopulateRefId and logged that it ran, but it failed? Or maybe a bunch of time elapsed between the first and second update during which multiple nulls were inserted? It might be better for us to sequence these updates properly and run the population first, then the primary key patch, without relying on the user running update,php twice in rapid succession.