FSFileBackend spends a lot of time generating unneeded sha1 hashes that are expensive for large files
Closed, ResolvedPublic

Description

See T191805#9230056

When testing an upload of an 8.5 GB file with the filesystem backend, a surprising amount of time during file operations was spent on SHA-1 calculations. At first glance, these calculations often aren't even used. (The hashes were also already known to MW, since we have them in the DB.)

This took about 2-3 minutes to calculate. It accounted for about 1/3 of a Copy operation (aka moving an image page), and 100% of a MoveOp (e.g. deleting an image).

At the very least, we should only generate them if we actually intend to use them somewhere.
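
To illustrate the "only generate if needed" idea, here is a minimal sketch in Python (not MediaWiki's PHP, and not the approach taken in either patch on this task). The names `LazyFileInfo`, `known_sha1`, `hasher` and `hash_calls` are hypothetical and exist only for this example. The hash is computed on first access, and skipped entirely when a value is already known (for instance, one stored in a database).

```python
import hashlib
from typing import Callable, Optional


def sha1_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file in chunks so memory use stays flat for large files."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class LazyFileInfo:
    """Hypothetical holder for file state; hashes only on first access."""

    def __init__(
        self,
        path: str,
        known_sha1: Optional[str] = None,
        hasher: Callable[[str], str] = sha1_of_file,
    ) -> None:
        self.path = path
        self._sha1 = known_sha1  # e.g. a value already stored elsewhere
        self._hasher = hasher
        self.hash_calls = 0  # how many times the expensive path actually ran

    @property
    def sha1(self) -> str:
        # Only pay the cost of hashing if no value is cached or supplied.
        if self._sha1 is None:
            self.hash_calls += 1
            self._sha1 = self._hasher(self.path)
        return self._sha1
```

With this shape, an operation that never reads `sha1` never triggers a hash, and one that reads it several times pays for it once.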

Event Timeline

Change 963860 had a related patch set uploaded (by Brian Wolff; author: Brian Wolff):

[mediawiki/core@master] Avoid calculating SHA-1 during file operations unless needed

https://gerrit.wikimedia.org/r/963860

Change 963860 abandoned by Brian Wolff:

[mediawiki/core@master] Avoid calculating SHA-1 during file operations unless needed

Reason:

I misunderstood how this works. I didn't realize all pre-checks run at once for all ops in a batch.

https://gerrit.wikimedia.org/r/963860

Change 963860 restored by Brian Wolff:

[mediawiki/core@master] Avoid calculating SHA-1 during file operations unless needed

https://gerrit.wikimedia.org/r/963860

Change 963860 had a related patch set uploaded (by Brian Wolff; author: Brian Wolff):

[mediawiki/core@master] During FileOp only evaluate SHA1 if it is needed as expensive for big files

https://gerrit.wikimedia.org/r/963860

Change 1005568 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] filebackend: add FileStatePredicates helper class for file operations

https://gerrit.wikimedia.org/r/1005568

Change 963860 abandoned by Brian Wolff:

[mediawiki/core@master] During FileOp only evaluate SHA1 if it is needed as expensive for big files

Reason:

Abandon in favour of https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1005568 which is a similar idea but done much better

https://gerrit.wikimedia.org/r/963860

Change 1005568 merged by jenkins-bot:

[mediawiki/core@master] filebackend: add FileStatePredicates helper class for file operations

https://gerrit.wikimedia.org/r/1005568

When I was testing this, the new version still calculated some SHA-1s in maybeUpgradeRow in a deferred update. However, that is unrelated to the file backend, and is possibly something about my local install or the file type handler.

Was this for a very old MW install with old files? It would be odd otherwise (possible filerepo issue).

The install was old but the file was new. I didn't really investigate further. Possibly it was a TimedMediaHandler issue, as my test file was an MP4 video.