Page MenuHomePhabricator

purgeThumbnails should support exclusion of expensive files
Closed, ResolvedPublic

Description

Author: mdale

Description:
Exclude From Thumbnail Purge patch

With a transcoding extension you don't necessarily want to purge all the transcoded files every time someone visits an image page with ?action=purge, attached a quick implementation for excluding some file extension types with a configuration variable.


Version: unspecified
Severity: enhancement

attachment excludeFromThumbnailPurge.patch ignored as obsolete

Details

Reference
bz27641

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 11:29 PM
bzimport set Reference to bz27641.
bzimport added a subscriber: Unknown Object (MLST).

quite so, but how WOULD you purge those files if needed ? We can't go asking for sys admin intervention everytime something goes wrong of course. That would be unmaintainable.

mdale wrote:

I would recommend a maintenance script. It would ideally only have to be run when you update the transcoding settings or when you update the encoders.

ideally doens't exist in Wikipedia. Servers die unexpectedly. Encoders crash. Partly generated files that have to be regenerated will present themselves, it is unavoidable.

mdale wrote:

Error handling is important. If the encoder caches or the server dies unexpectedly the file will never be considered "encoded" since it only gets moved after a successful encode. Its certainly possible the encoder thinks the file was successfully encoded while human viewers disagree. But re-running the encode with the same settings every time someone requests ?action=purge will likely just result in the same broken file being regenerated over and over.

It would be nice to let users register a given video has bad encode or has "playback issues", but that is a separate feature than the bug being discussed here. ( probably a phase 2 feature )

Purges should be managed via maintenance scripts so we it can be run in conjunction with new encoder deployments or deployments of changes in encoding parameters.

Users are not completely dependent on these maintenance scripts they can "upload another version of the file" which will of result in purge of transcodes.

If in practice we run into the situation where derivatives being purged and re-encoded with the same settings will 'make the files work', then yes I am open to bump the priority on letting some users purge transcodes somehow.

*But the original bug is about differentiating from image purges* which I think we should do regardless.

(In reply to comment #0)

Created attachment 8196 [details]
Exclude From Thumbnail Purge patch

$wgExcludeFromThumbnail must be defined and documented in DefaultSettings.php . Looks good to me otherwise, although I guess I'd like someone better-versed in our media handling code such as Bryan look over it, and of course Derk-Jan makes a good point about needing to be able to purge the transcoded versions somehow.

attachment excludeFromThumbnailPurge.patch ignored as obsolete

Bryan.TongMinh wrote:

Why path info instead of substr?

Also, I'd rather have this based on mime type instead of extension. I can imagine for example that .ogg vorbis files are much less expensive to purge than .ogg theora files.

mdale wrote:

(In reply to comment #6)

Why path info instead of substr?

Its documented in PHP as a consist fast way to get a file extension without any regular expression or searching for the last period. Its coverage goes back to php4. Feel free to use substr with strrpos if you want but I think path info is less verbose.

Also, I'd rather have this based on mime type instead of extension. I can
imagine for example that .ogg vorbis files are much less expensive to purge
than .ogg theora files.

First I don´t think ogg audio is very different from the rational mentioned above. ( ie we would want to explicit purge it ) In this version we don't even transcode audio, and with ogg audio files that /we/ gennerate /we/ would always use the .oga extension ( so there would be no ambiguity ) ... but assuming we want more explicit control than "extension based file types" over what we purge and do not purge ::

Mime type parsing would require some more complexity. If we just use the php mime_content_type call I believe its also file extension based per the systems magic.mime file. ( extensions are suposed to communicate mime type )

Alternatively we actually parse the file ( resource intensive since we do this in php for all our files and with ogg parsing I believe we do a linear read of the entire file to catch any concatenated streams which we would have to work around or just accept some computational delay anytime you want to purge video thumbnails )

Or we have some lookup table per filename{transcodeKey}.ext....

Maybe the easiest change would be to rename this variable to $wgExcludePostfixKeyFromThumbnailPurge and then let it check anything after the base file name. So things like {myfile.ogg.AUDIO_64K.oga can be represented in the settings array as "AUDIO_64K.oga" ?

For version 2, its best to give every transcode its own db entry where we store the version of the encoder and all the settings used. This way we have fine grain indexed select purging capabilities for the extension maintenance scripts. And maybe we put “derivatives” in a sperate folder from “thumbnails” if we want to handle them very diffrently.

Bryan.TongMinh wrote:

We already know the mime type: $file->getMimeType(). The point is that a single extension may map to multiple multiple mime types, but multiple extensions may also map to a single mime type. Mime types however are normalized within MediaWiki.

Bryan.TongMinh wrote:

Oh right, $file is a file name and not a file object. It is of course possible to have thumbnails for a single file in different formats.

With your patch, do you wish to exclude certain types of source files, or certain types of thumbnails?

mdale wrote:

Right $file is for the parent source file.. some trickery would be needed to have it map its methods to thumbnail files. We only want to exclude certain types of "thumbnails" .. ie for a given source Ogg video file the purge request will purge the jpeg thumbnails but will not purge the transcoded webm derivative. For other assets like svg and img, they won't be affected since we don't generate webm or ogv derivative files for them.

The reason its oky to use the file extension, for the exclusion list is that we are only looking at files that we generate so we have a bit more control of the mapping of extension to file type.

If we do this, it'd be nice to have an extra parameter to say the api purge module to do a forced purge. This would stop people from accidently purging pages as you pretty much need to be a power user to use the api.

mdale wrote:

(In reply to comment #11)

If we do this, it'd be nice to have an extra parameter to say the api purge
module to do a forced purge. This would stop people from accidently purging
pages as you pretty much need to be a power user to use the api.

Maybe require they have admin rights as well? Remember some derivatives like HD webm are not even close to real time encoded on moderately fast hardware, so they do take quite a bit of resources to re-encode.

Bryan.TongMinh wrote:

(In reply to comment #10)

The reason its oky to use the file extension, for the exclusion list is that we
are only looking at files that we generate so we have a bit more control of the
mapping of extension to file type.

Good one, I didn't realize originally that this patch concerned thumbnails instead of the original file.

mdale wrote:

Exclude From Thumbnail Purge patch

Here is the patch with the lines that were missing from DefaultSettings.php

Attached:

mdale wrote:

patch to restore r84395,r98710. filter thumb purge in media handlers

patch to restore r84395,r98710. filter thumb purge in media handlers

Attached:

Gilles raised the priority of this task from Medium to Unbreak Now!.Dec 4 2014, 10:29 AM
Gilles moved this task from Untriaged to Done on the Multimedia board.
Gilles lowered the priority of this task from Unbreak Now! to Medium.Dec 4 2014, 11:21 AM