
Extend capacity for video scalers
Closed, Duplicate · Public

Description

It seems we have quite a lot of broken videos on Commons, and the eqiad scalers can't keep up...

Can we enable both the eqiad and codfw scalers?

Event Timeline

Reedy created this task. · Nov 4 2016, 10:48 PM
Restricted Application added a subscriber: Aklapper. · Nov 4 2016, 10:48 PM
brion added a subscriber: brion. · Nov 4 2016, 10:49 PM

I'm going to run major batch re-runs, and I have been waiting for more capacity as well as for various bugs to be fixed. More capacity would be great.

Revent added a subscriber: Revent. · Nov 4 2016, 10:59 PM

A 'lot'... does a third of a damn million qualify? It appears to be well over twice the existing capacity, just sitting there burning electricity to no actual purpose.

MoritzMuehlenhoff renamed this task from "Use codfw videoscalers" to "Extend capacity for video scalers". · Edited Nov 7 2016, 8:42 AM

Instead of repurposing the codfw scaler (we actually have only a single one), we should rather expand the capacity in eqiad. Also, both video scalers in eqiad have been out of warranty for over two years now, so it makes sense to refresh them at the same time.

Reedy added a comment. · Nov 7 2016, 1:17 PM

Sounds like a good plan. I guess moving up a CPU generation or two will provide some reasonable gains to begin with.

Matanya added a subscriber: Matanya. · Nov 7 2016, 7:50 PM

Adding a GPU might be wise as well.

Revent added a comment. · Edited Nov 13 2016, 1:21 PM

OK. As it now stands, all of the broken transcodes 'exposed' by TimedMediaHandler on Commons (i.e., the list at https://commons.wikimedia.org/wiki/Special:TimedMediaHandler) fall into one of three groups:

A. Files affected by a bug that prevents any transcode from succeeding.

B. Files affected by a bug that prevents transcoding from OGV to WebM.

C. Very long (over an hour) and large (from ~750k up to about 3G) files that have simply failed to transcode after repeated attempts, even when run one at a time. These often error out after 5-6 hours.

This last category seems simply to reflect the subject of this task: the video scalers are not powerful enough to handle the files we are receiving. However, it also highlights a different point (one that the 350,000+ broken transcodes emphasize), which could probably be addressed by a simple fix: TimedMediaHandler should expose more of the list of broken transcodes (it currently shows about 50), so that the list is less likely to become completely clogged with entries for files that cannot be pushed through the queue.

As it stands now, it's not sensible to keep pushing the listed broken transcodes through the queue, as every single one of them demonstrably WILL NOT SUCCEED. Only 'new' uploads can be transcoded.

Update: after learning how to use Quarry to look at older parts of the broken transcode list, I've been pushing some of those entries through the queue again. The ones it shows me are from June 2013, and many are short files that complete successfully in under a minute once put back in the queue. I've already pushed several hundred (I did not keep an exact count) back through.

fgiunchedi triaged this task as Normal priority. · Nov 29 2016, 11:45 PM