Page MenuHomePhabricator

LocalFile::lock() rearchitecting
Closed, ResolvedPublic

Description

As I wrote at T274589#7091659, LocalFile::lock() starts a database transaction and then waits for a file lock for an unconfigurable 10 seconds. Normally it is used to protect a user-initiated file operation. If the file operation takes too long, a database error results, due to the transaction being lost.

I think it would be better for users if lock acquisition failed quickly with a clear error message informing the user that a concurrent operation is in progress. Think of it like an edit conflict rather than a transaction.

Caller survey:

  • LocalFile::upload, LocalFile::publishTo, LocalFile::move, LocalFile::deleteFile, LocalFile::deleteOldFile, LocalFile::restore, LocalFileDeleteBatch::execute, LocalFileMoveBatch::execute, LocalFileRestoreBatch::execute, OldLocalFile::uploadOld, DeleteArchivedFiles::execute
    • The lock protects a user-initiated write operation involving the file backend. In all cases, a Status object is returned from the lock() caller. All of these cases would benefit from fast failure of lock(), i.e. zero wait timeout, with an error returned in the Status instead of a LocalFileLockError being thrown.
  • LocalFile::upgradeRow, LocalFile::getSha1
    • In both cases, a read is done from the file backend, and the result is written to the database. We're not protecting the backend, we just want to read from the backend at a time when no write operation is in progress.
    • getSha1() can be called from a gallery of many images, in which case neither latency nor an uncaught exception is tolerable.
    • upgradeRow() is now done as a DeferredUpdate. Maybe this is the sole case in which waiting 10s for a lock is acceptable.
    • In either case, I don't think it's correct to block a user-initiated write operation due to a purge or upgrade being in progress. I think what we want is to read the file properties without protection, and to do a staleness check. If the information is stale, don't write it back to the DB. The code would be like:
$originalTimestamp = $dbw->selectField( 'image', 'img_timestamp', [ 'img_name' => $name ] );
$data = $this->getDataFromFile();
$dbw->startAtomic();
$finalTimestamp = $dbw->selectField( 'image', 'img_timestamp', [ 'img_name' => $name ],..., [ 'FOR UPDATE' ] );
if ( $originalTimestamp != $finalTimestamp ) {
   // Probably someone else did this update
   return;
}
$dbw->update( ... );
$dbw->endAtomic();

Plan summary:

  • Deprecate LocalFile::lock() and LocalFile::unlock()
  • Add LocalFile::lockForWrite(). This would do a non-blocking lock and return a ScopedCallback or null. If a ScopedCallback is returned, the lock was successfully acquired, and the callback should be consumed to terminate the lock. If null is returned, the lock was not successfully acquired, so an error should be returned in a Status. Does not open a transaction.
  • Migrate callers. Start a transaction in the caller where necessary.
  • In upgradeRow() and getSha1(), replace locking with a staleness check.

Event Timeline

Change 735761 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] [WIP] LockManager: distinguish conflicts from other kinds of lock errors

https://gerrit.wikimedia.org/r/735761

Change 735762 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] [WIP] Stop using LocalFile::lock()

https://gerrit.wikimedia.org/r/735762

Change 736176 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Remove on-demand update of img_sha1

https://gerrit.wikimedia.org/r/736176

Change 736176 merged by jenkins-bot:

[mediawiki/core@master] Remove on-demand update of img_sha1

https://gerrit.wikimedia.org/r/736176

Change 735761 merged by jenkins-bot:

[mediawiki/core@master] LockManager: distinguish conflicts from other kinds of lock errors

https://gerrit.wikimedia.org/r/735761

Change 735762 merged by jenkins-bot:

[mediawiki/core@master] Stop using LocalFile::lock()

https://gerrit.wikimedia.org/r/735762

Change 740381 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] LocalFileMoveBatch lock improvements

https://gerrit.wikimedia.org/r/740381

Change 740381 merged by jenkins-bot:

[mediawiki/core@master] LocalFileMoveBatch lock improvements

https://gerrit.wikimedia.org/r/740381

This is basically done. The only thing that was not done was the ScopedCallback, but locks will get released eventually by the LockManager destructor.