
Make MimeAnalyser fast
Closed, Resolved · Public

Description

@ori wrote on Gerrit:

Avoid initializing the MimeAnalyzer for common output formats

~1.5% of api.php CPU time on the WMF cluster is spent parsing the MIME
databases[1], in order to map just a handful of common MIME types to their
file extensions ('application/json' to 'json', 'text/plain' to 'txt',
etc.). Speed that up by using a small associative array to map the
standard set of API output formats to their file extensions.

[1]: https://performance.wikimedia.org/arclamp/svgs/daily/2020-05-03.excimer.api.reversed.svgz
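The sketch below shows roughly what the fast path in this description could look like. It is not the code from Ori's patch. The class name `ApiFormatExtensions`, the `$fallback` callable (a stand-in for the full MimeAnalyzer lookup), and the exact set of MIME types are all assumptions.

```php
<?php
// Hypothetical sketch, not the actual patch: a tiny static map for the few
// MIME types used by common api.php output formats. Only a miss falls back
// to the expensive lookup (a callable standing in for the full MIME analyzer).
final class ApiFormatExtensions {
    /** Illustrative subset: MIME type => file extension. */
    private const COMMON = [
        'application/json' => 'json',
        'text/plain' => 'txt',
        'text/xml' => 'xml',
        'text/html' => 'html',
    ];

    /**
     * @param string $mime MIME type, optionally with parameters ("; charset=utf-8")
     * @param callable $fallback Slow path: function ( string $mime ): ?string
     * @return string|null File extension, or null if unknown
     */
    public static function getExtension( string $mime, callable $fallback ): ?string {
        // Drop any parameters and normalise case before the lookup.
        $base = strtolower( trim( explode( ';', $mime, 2 )[0] ) );
        return self::COMMON[$base] ?? $fallback( $base );
    }
}
```

The point of this design is that the common case never touches the MIME databases. Only an unrecognized type pays for the slow path.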

@Krinkle wrote on Gerrit:

Hm.. that's pretty expensive indeed. This doesn't feel like it needs to be expensive though. Looking at that class, I think we can simplify it a lot.

mime.info and mime.types can be simple arrays. It's a bit of work to map out all the ways they can be overridden and ensure we preserve it (or deprecate it responsibly). Let's make MimeMagic fast!
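The following sketch makes the "simple arrays" suggestion concrete. It compares a mime.types-style text mapping, which must be parsed at runtime, with the same data written as a PHP array literal. The function `parseMimeTypes()`, the constant `MIME_TO_EXT`, and the sample entries are hypothetical. The sketch also ignores the override mechanisms mentioned above.

```php
<?php
// (a) Hypothetical: a mime.types-style mapping ("type ext1 ext2 ...") kept as
// text and parsed on every request that needs it.
function parseMimeTypes( string $text ): array {
    $map = [];
    foreach ( explode( "\n", $text ) as $line ) {
        // Strip comments and surrounding whitespace; skip empty lines.
        $line = trim( preg_replace( '/#.*/', '', $line ) );
        if ( $line === '' ) {
            continue;
        }
        $parts = preg_split( '/\s+/', $line );
        $mime = array_shift( $parts );
        // A type may appear on several lines; merge its extensions.
        $map[$mime] = array_merge( $map[$mime] ?? [], $parts );
    }
    return $map;
}

// (b) Hypothetical: the same data as a PHP array literal. There is no parsing
// step; the literal is part of the compiled script, which OPcache can cache.
const MIME_TO_EXT = [
    'application/json' => [ 'json' ],
    'image/jpeg' => [ 'jpeg', 'jpg', 'jpe' ],
    'text/plain' => [ 'txt' ],
];
```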

Event Timeline

Krinkle created this task. May 8 2020, 4:56 PM
Restricted Application added a subscriber: Aklapper. May 8 2020, 4:56 PM

Change 595082 had a related patch set uploaded (by Krinkle; owner: Ori.livneh):
[mediawiki/extensions/MolHandler@master] Remove obsolete MimeMagicInit hook handler

https://gerrit.wikimedia.org/r/595082

Tracking on our radar for notifs and code review. Thanks @ori!

Krinkle updated the task description. May 8 2020, 4:59 PM

Change 595082 merged by jenkins-bot:
[mediawiki/extensions/MolHandler@master] Remove obsolete MimeMagicInit hook handler

https://gerrit.wikimedia.org/r/595082

Change 596060 had a related patch set uploaded (by Ori.livneh; owner: Ori.livneh):
[mediawiki/core@master] Fix parsing of mime.info

https://gerrit.wikimedia.org/r/596060

Change 596060 merged by jenkins-bot:
[mediawiki/core@master] mime: Fix whitespace parsing of 'mime.info' file

https://gerrit.wikimedia.org/r/596060
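This task does not reproduce the patch content, so the following is only a sketch of whitespace-tolerant parsing for a single mime.info line. It assumes the file's layout at the time was one or more equivalent MIME types, with the first one canonical, optionally followed by a bracketed media type such as `[BITMAP]`. It also assumes that `#` starts a comment. The function name `parseMimeInfoLine()` is hypothetical.

```php
<?php
// Hypothetical parser for one mime.info-style line, e.g.
//   "image/jpeg image/jpg  image/pjpeg\t[BITMAP]"
// Splitting on /\s+/ instead of a single space tolerates tabs and runs of
// spaces between tokens.
function parseMimeInfoLine( string $line ): ?array {
    $line = trim( $line );
    if ( $line === '' || $line[0] === '#' ) {
        return null; // blank line or (assumed) comment
    }
    $mediaType = null;
    // Optional trailing media type in brackets, e.g. "[BITMAP]".
    if ( preg_match( '/\[(\w+)\]$/', $line, $m ) ) {
        $mediaType = $m[1];
        $line = preg_replace( '/\[\w+\]$/', '', $line );
    }
    $types = preg_split( '/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY );
    return [
        'canonical' => $types[0] ?? null, // first type is the canonical one
        'aliases' => array_slice( $types, 1 ),
        'mediaType' => $mediaType,
    ];
}
```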

Change 596671 had a related patch set uploaded (by Ori.livneh; owner: Ori.livneh):
[mediawiki/core@master] mime: 'mimetoExt' => 'mimeToExt'

https://gerrit.wikimedia.org/r/596671

Change 596671 merged by jenkins-bot:
[mediawiki/core@master] mime: 'mimetoExt' => 'mimeToExt'

https://gerrit.wikimedia.org/r/596671

Change 596698 had a related patch set uploaded (by Ori.livneh; owner: Ori.livneh):
[mediawiki/core@master] mime: Convert built-in MIME mappings to PHP arrays

https://gerrit.wikimedia.org/r/596698

Change 596698 merged by jenkins-bot:
[mediawiki/core@master] mime: Convert built-in MIME mappings to PHP arrays

https://gerrit.wikimedia.org/r/596698

ori added a comment. May 21 2020, 6:28 PM

Let's keep this open for now; there's a bit more follow-up work needed to clean up some residual cruft.

Change 598075 had a related patch set uploaded (by Ori.livneh; owner: Ori.livneh):
[mediawiki/core@master] mime: represent lists as arrays instead of space-delimited strings

https://gerrit.wikimedia.org/r/598075

Change 598075 merged by jenkins-bot:
[mediawiki/core@master] mime: Represent lists as arrays instead of space-delimited strings

https://gerrit.wikimedia.org/r/598075
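The sketch below roughly illustrates the change above; it is not the actual diff. It compares extension lists stored as space-delimited strings with lists stored as arrays. The constants, the sample data, and the `toExtensionList()` shim for callers that might still pass strings are all hypothetical.

```php
<?php
// Illustrative data only; not the actual contents of MediaWiki's mapping.

// Before: each value is a space-delimited string that callers must split.
const MIME_TO_EXT_STRINGS = [
    'image/jpeg' => 'jpeg jpg jpe',
    'text/plain' => 'txt',
];

// After: each value is already a list, so no splitting is needed.
const MIME_TO_EXT_LISTS = [
    'image/jpeg' => [ 'jpeg', 'jpg', 'jpe' ],
    'text/plain' => [ 'txt' ],
];

/**
 * Hypothetical shim accepting either representation.
 *
 * @param string|string[] $exts
 * @return string[]
 */
function toExtensionList( $exts ): array {
    if ( is_array( $exts ) ) {
        return array_values( $exts );
    }
    return preg_split( '/\s+/', trim( $exts ), -1, PREG_SPLIT_NO_EMPTY );
}
```

With arrays, a membership test becomes a direct `in_array( 'jpg', MIME_TO_EXT_LISTS['image/jpeg'], true )` rather than a split followed by a search.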

Change 598181 had a related patch set uploaded (by Krinkle; owner: Ori.livneh):
[mediawiki/core@master] mime: Update usage of MimeAnalyzer methods

https://gerrit.wikimedia.org/r/598181

Change 598181 merged by jenkins-bot:
[mediawiki/core@master] mime: Update usage of MimeAnalyzer methods

https://gerrit.wikimedia.org/r/598181

Reedy added a subscriber: Reedy. May 30 2020, 8:10 PM

I think this may have caused T254078: STL upload file causes "Extension is null." via UploadStash.

If I revert all three commits, the warning changes to

Reedy added a comment. May 30 2020, 8:17 PM
$ php maintenance/eval.php 
> $magic = MediaWiki\MediaWikiServices::getInstance()->getMimeAnalyzer();

> $mimeType = $magic->guessMimeType( '/tmp/Holterhöfchen.stl' );

> $extension = $magic->getExtensionFromMimeTypeOrNull( $mimeType );

> var_dump( $mimeType, $extension );
/var/www/wiki/mediawiki/core/maintenance/eval.php(78) : eval()'d code:1:
string(15) "application/sla"
/var/www/wiki/mediawiki/core/maintenance/eval.php(78) : eval()'d code:1:
string(3) "stl"

That.. doesn't help :/

Reedy added a comment. May 30 2020, 9:47 PM

Ok, so more debugging...

DjVuImage::getInfo: not a DjVu file
[Mime] MimeAnalyzer::guessMimeType: internal type detection failed for /tmp/phpg32Uek (.)...

[Mime] MimeAnalyzer::detectMimeType: magic mime type of /tmp/phpg32Uek: application/octet-stream

[Mime] MimeAnalyzer::guessMimeType: guessed mime type of /tmp/phpg32Uek: unknown/unknown

[Mime] MimeAnalyzer::improveTypeFromExtension: refusing to guess mime type for .stl file, we should have recognized it

[Mime] MimeAnalyzer::improveTypeFromExtension: improved mime type for .stl: unknown/unknown

MediaHandlerFactory::getHandler: no handler found for unknown/unknown.
mime: <unknown/unknown> extension: <stl>
UploadBase::detectScript: checking for embedded scripts and HTML stuff
UploadBase::detectScript: no scripts found
ZipDirectoryReader: Fatal error: zip file lacks EOCDR signature. It probably isn't a zip file.
UploadBase::detectVirus: virus scanner disabled
[Mime] MimeAnalyzer::doGuessMimeType: analyzing head and tail of /tmp/phpg32Uek for magic numbers.

DjVuImage::getInfo: not a DjVu file
[Mime] MimeAnalyzer::guessMimeType: internal type detection failed for /tmp/phpg32Uek (.)...

[Mime] MimeAnalyzer::detectMimeType: magic mime type of /tmp/phpg32Uek: application/octet-stream

[Mime] MimeAnalyzer::guessMimeType: guessed mime type of /tmp/phpg32Uek: unknown/unknown

[Mime] MimeAnalyzer::improveTypeFromExtension: improved mime type for .:

[Mime] MimeAnalyzer::doGuessMimeType: analyzing head and tail of /tmp/phpg32Uek for magic numbers.

DjVuImage::getInfo: not a DjVu file
[Mime] MimeAnalyzer::guessMimeType: internal type detection failed for /tmp/phpg32Uek (.)...

[Mime] MimeAnalyzer::detectMimeType: magic mime type of /tmp/phpg32Uek: application/octet-stream

[Mime] MimeAnalyzer::guessMimeType: guessed mime type of /tmp/phpg32Uek: unknown/unknown

MediaHandlerFactory::getHandler: no handler found for .
UploadStash::stashFile stashing file at '/tmp/phpg32Uek'
[Mime] MimeAnalyzer::guessMimeType: WARNING: use of the $ext parameter is deprecated. Use improveTypeFromExtension($mime, $ext) instead.

[Mime] MimeAnalyzer::doGuessMimeType: analyzing head and tail of /tmp/phpg32Uek for magic numbers.

DjVuImage::getInfo: not a DjVu file
[Mime] MimeAnalyzer::guessMimeType: internal type detection failed for /tmp/phpg32Uek (.1)...

[Mime] MimeAnalyzer::detectMimeType: WARNING: use of the $ext parameter is deprecated. Use improveTypeFromExtension($mime, $ext) instead.

[Mime] MimeAnalyzer::detectMimeType: magic mime type of /tmp/phpg32Uek: application/octet-stream

[Mime] MimeAnalyzer::guessMimeType: guessed mime type of /tmp/phpg32Uek: unknown/unknown

I'll try with the changes reverted again

git revert 7e01e86e09c7051942ca46f9c92213db5fa03c46
git revert 7c9e19ed5ece1f70b41bd07018e93678f2bc9568
git revert cb44ddf85b09d48322f9326b948925b9e3022b92
Reedy added a comment. (Edited) May 30 2020, 9:49 PM

OK, I'll stop spamming this ticket; it doesn't seem to be a problem with these changes.

The errors are roughly the same:

[Mime] MimeAnalyzer::guessMimeType: internal type detection failed for /tmp/phpD7s5wq (.)...

[Mime] MimeAnalyzer::detectMimeType: magic mime type of /tmp/phpD7s5wq: application/octet-stream

[Mime] MimeAnalyzer::guessMimeType: guessed mime type of /tmp/phpD7s5wq: unknown/unknown

[Mime] MimeAnalyzer::improveTypeFromExtension: refusing to guess mime type for .stl file, we should have recognized it

[Mime] MimeAnalyzer::improveTypeFromExtension: improved mime type for .stl: unknown/unknown

MediaHandlerFactory::getHandler: no handler found for unknown/unknown.

mime: <unknown/unknown> extension: <stl>
ori added a comment. Jun 1 2020, 7:47 PM

The change @Reedy had to revert was not one of the core changes with a performance impact.

MimeAnalyzer doesn't show up in recent daily reversed flame graphs for api.php, or it's too small to notice, so this can be chalked up as a win.

https://performance.wikimedia.org/arclamp/svgs/daily/2020-05-31.excimer.api.reversed.svgz

ori closed this task as Resolved. Jun 1 2020, 7:47 PM