Automated tracking of broken files
Open, Low, Public

Description

Some files are corrupted in some way, and I think it would be useful to automatically track the cases that MediaWiki identifies as problematic, e.g. through error messages like "Invalid Ogg file: Stream Undecodable" (cf. T63900). One way to do this would be a subcategory of Category:Files with errors on Commons.

There are ways to track such files manually (e.g. through Template:Broken file), but automating the process would certainly help in getting the individual files or any systematic problems fixed in a more timely fashion.

For a related discussion, see here.

Event Timeline

Restricted Application added subscribers: Steinsplitter, Aklapper.

I guess ideally canRender() returning false would add the file to a tracking category.

I think canRender just returns true all the time because the media handlers report themselves as capable of rendering. Are you suggesting changing that to check the settings of an individual file for compatibility?

If I remember correctly, canRender only means "is there a thumbnail engine for this file type". Most of the files we are talking about here do have a render engine; the problem is that the render engine fails on them.

This is likely blocked on T60478: Improve interface for MediaHandlers to add JavaScript

It does take a $file parameter, so we could have it determine renderability based on an individual file I guess but indeed, returning false in general from canRender() is not an error condition -- spreadsheets and such can't currently be rendered to thumbnails, but we wouldn't want to add a 'Broken file' tracking category on every uploaded .ods.

PDFs do adjust canRender() depending on whether the file is corrupt.
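To make the ambiguity discussed above concrete, here is a small language-neutral sketch in Python. It is not MediaWiki code: `RenderStatus`, `can_render` and `needs_tracking` are hypothetical names invented for illustration. It only shows that a single boolean cannot tell "no thumbnail engine for this type" apart from "engine exists but this file is broken".

```python
from enum import Enum


class RenderStatus(Enum):
    """Hypothetical three-way status; a real handler only exposes a boolean."""
    NO_ENGINE = "no thumbnail engine for this file type (e.g. a spreadsheet)"
    RENDERABLE = "engine exists and the file looks fine"
    BROKEN = "engine exists but this particular file cannot be rendered"


def can_render(status: RenderStatus) -> bool:
    """Boolean view: NO_ENGINE and BROKEN both collapse to False."""
    return status is RenderStatus.RENDERABLE


def needs_tracking(status: RenderStatus) -> bool:
    """Only a genuinely broken file should land in a tracking category."""
    return status is RenderStatus.BROKEN
```

With this model, an .ods upload and a corrupt PDF both yield `can_render(...) == False`, but only the corrupt PDF yields `needs_tracking(...) == True`.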

In T132304#2201056, @brion wrote:

It does take a $file parameter, so we could have it determine renderability based on an individual file I guess but indeed, returning false in general from canRender() is not an error condition -- spreadsheets and such can't currently be rendered to thumbnails, but we wouldn't want to add a 'Broken file' tracking category on every uploaded .ods.

Yeah, you are right, a tracking category in that case does not make sense.

The thing to do may be to add a new handler method explicitly for checking correctness, and check that on page parse to add the tracking category. Existing handlers would then inherit the default implementation that says "everything's fiiiiine!", whereas those that know how to check their file details can use it. Could use the same interface in ImagePage to check validity and throw a user-visible warning that's more consistent.

Perhaps the same method should do a check and *also* return details, which would be ignored in the tracking category but used in the ImagePage display? Seems they'd usually go together in implementation...
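A rough sketch of that idea, written in Python purely as a model rather than as actual MediaWiki PHP. Every name here (`ValidityResult`, `check_validity`, `OggHandler`, `BROKEN_CATEGORY`, and the dict-shaped `file`) is a hypothetical stand-in, and the rule "missing metadata means undecodable" is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValidityResult:
    """Outcome of a per-file check: a verdict plus human-readable details."""
    ok: bool
    details: List[str] = field(default_factory=list)


class MediaHandler:
    """Base handler: the inherited default reports every file as fine."""

    def check_validity(self, file: dict) -> ValidityResult:
        return ValidityResult(ok=True)


class OggHandler(MediaHandler):
    """A handler that knows how to inspect its own files."""

    def check_validity(self, file: dict) -> ValidityResult:
        # Assumption: missing metadata means the stream could not be decoded.
        if file.get("metadata") is None:
            return ValidityResult(False, ["Invalid Ogg file: Stream Undecodable"])
        return ValidityResult(ok=True)


BROKEN_CATEGORY = "Files with errors (automatic)"  # hypothetical category name


def tracking_categories(handler: MediaHandler, file: dict) -> List[str]:
    """On page parse: only the verdict matters, the details are ignored."""
    return [] if handler.check_validity(file).ok else [BROKEN_CATEGORY]


def image_page_warning(handler: MediaHandler, file: dict) -> Optional[str]:
    """On the file page: the same check, but the details are shown."""
    result = handler.check_validity(file)
    return None if result.ok else "; ".join(result.details)
```

Handlers that do not override `check_validity` never add the category, so an uploaded .ods would stay untracked, while both consumers share one check and cannot disagree about whether a file is broken.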

Do we have a good pluggable interface for testing validity on upload as well? Might want to use the same one, or might want to test different things.

Since our thumbnailing is 'on demand' (404 handling), and not upon parse, I doubt if this is even doable at all.

Well, there are two separate issues there:

  • if we try to thumbnail, and cannot, can we record that?
  • if we look at the file beforehand, can we determine that we know ahead of time we won't be able to render it?

This task covers the second case: we can look at an uploaded video file in TMH and determine "oh crap there's no way we'll be able to render this, it has track types we don't understand or we can't even extract any metadata in the first place".

The first case would be nice to capture as well, but is a separate issue.
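The second case, the ahead-of-time check, might look roughly like the following Python model. It is only a sketch: the metadata layout (a dict with a `tracks` list), the `KNOWN_TRACK_TYPES` set and the function name are assumptions for illustration, not how TMH actually stores or inspects metadata.

```python
from typing import List, Optional

# Assumption: the track types this hypothetical handler is able to decode.
KNOWN_TRACK_TYPES = {"theora", "vorbis"}


def unrenderable_reasons(metadata: Optional[dict]) -> List[str]:
    """Return the reasons we already know the file cannot be rendered.

    An empty list means nothing is known to be wrong in advance; a later
    thumbnailing attempt may still fail, which is the separate first case.
    """
    if metadata is None:
        return ["no metadata could be extracted"]
    tracks = metadata.get("tracks", [])
    unknown = sorted({t for t in tracks if t not in KNOWN_TRACK_TYPES})
    return [f"unsupported track type: {t}" for t in unknown]
```

Because this only reads metadata that is available at upload or parse time, it does not depend on thumbnails being generated on demand.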

Poyekhali awarded a token.
Poyekhali subscribed.
Aklapper raised the priority of this task from High to Needs Triage. Apr 30 2016, 5:02 PM
MarkTraceur moved this task from Untriaged to Triaged on the Multimedia board.
MarkTraceur subscribed.

@Aklapper obviously it isn't, because nothing has happened for several months now. I believe the broken thumbnail problems are limited to a fairly small number of files, and often those files have extenuating circumstances that lessen the severity of the issue (e.g. old versions of files).