Certain videos on Commons do not start at all or have interruptions
Closed, ResolvedPublic

Assigned To

Authored By

	Verena
	Dec 27 2016, 3:08 PM

Description

We do have a somewhat urgent situation here:

On January 1st a campaign is going to start in DE-WP, based on videos which we uploaded to Commons. We are expecting at least 1,000 page views of the wiki page which includes the videos.

It seems that there are some performance problems with playing the videos (interruptions or not starting at all). They are all webm files and should have been converted properly. I am talking about all video files in the category: https://commons.wikimedia.org/wiki/Category:Neue_Ehrenamtliche

This task is very likely linked with T153488: Commons video transcoders have over 6500 tasks in the backlog.

Does anyone have an idea on this? We want to avoid using a platform other than Commons.

Related Objects

Status	Assigned	Task
Duplicate	None	T153357 Please upload large file to Wikimedia Commons
Duplicate	None	T153137 Please upload large file to Wikimedia Commons
Duplicate	None	T153136 Please upload large file to Wikimedia Commons
Duplicate	None	T152969 Please upload large file to Wikimedia Commons
Duplicate	None	T152943 Please upload large file to Wikimedia Commons
Resolved	Dereckson	T152938 Server side upload for Jasonanaggie
Resolved	Dereckson	T153572 Please upload large file to Wikimedia Commons
Resolved	Dereckson	T153809 Server side upload for Jasonanaggie
Resolved	Dereckson	T153931 Please upload large file to Wikimedia Commons
Declined	Dereckson	T152942 Please upload large file to Wikimedia Commons
Resolved	Dereckson	T154101 Server side upload for Jasonanaggie
Resolved	Dereckson	T154102 Server side upload for Sporti
Resolved	hoo	T154186 Certain videos on Commons do not start at all or have interruptions
Resolved	Revent	T153488 Commons video transcoders have over 6500 tasks in the backlog.
Duplicate	None	T114337 Assign 3 more servers to video scaler duty
Duplicate	None	T150067 Extend capacity for video scalers
Declined	Dereckson	T154733 Temporarily unassign 'transcode-reset' user right from Commons autoconfirmed and sysop groups
Resolved	matmarex	T154737 Add entries to Special:Log regarding use of transcode resets
Resolved	• brooke	T155098 Rework job queue usage for TimedMediaHandler (video scalers)
Declined	None	T154315 Videos sometimes loaded twice in Firefox when played with the timed media handler player

Event Timeline

Verena created this task. Dec 27 2016, 3:08 PM

Restricted Application added a subscriber: Aklapper. Dec 27 2016, 3:08 PM

zhuyifei1999 added a subtask: T153488: Commons video transcoders have over 6500 tasks in the backlog. Dec 27 2016, 4:32 PM

zhuyifei1999 mentioned this in T153488: Commons video transcoders have over 6500 tasks in the backlog. Dec 27 2016, 4:35 PM

zhuyifei1999 updated the task description. Dec 27 2016, 4:39 PM

Yann subscribed. Dec 27 2016, 9:41 PM

What makes you think the two problems are related?

It was mentioned by another Wikipedian in another discussion. I am no expert but it sounds like a plausible reason for the performance problem. Do you have other ideas?

@Verena transcoding is an async process, so a video has either been transcoded or not; in terms of user-facing playback, having a transcode queued is not really affecting playback.

What can be related: the video player uses a smaller-resolution transcode by default, so when none exists yet, playback falls back to the original and is slow because of the high bandwidth it needs.

The playback at the original resolution shouldn't be affected in any way by the other issue; that's why I was asking a sincere question :)

Aklapper renamed this task from "Commons videos: performance problems" to "Certain videos on Commons do not start at all or have interruptions". Dec 28 2016, 11:56 AM

Unrelated: I see some HTTP 500 thumbnail creation issues for e.g. Machmit_Intro_Videos.webm in the "network" tab of the "Developer Tools" of my web browser (for more info how to find such problems, please see: Firefox ≥24; Internet Explorer; Google Chrome; Apple Safari).

Longest videos seem to be https://commons.wikimedia.org/wiki/File:Tutorial_04_Artikel_verbessern.webm and https://commons.wikimedia.org/wiki/File:ADA-Video_Mach_mit_bei_Wikipedia.webm . I'd expect an HTTP 206 (Partial content) in the network tab.
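A minimal sketch of how the HTTP 206 expectation could be checked outside the browser. The helper names (`parse_content_range`, `is_partial_response`, `probe`) and the placeholder User-Agent string are illustrative assumptions, not part of any Wikimedia tooling; `probe` performs a real network request and is only defined here, not called.

```python
import re
import urllib.request


def parse_content_range(value):
    """Parse a 'bytes first-last/total' Content-Range value.

    Returns (first, last, total); total is None when the server sends '*'.
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unexpected Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)


def is_partial_response(status, headers):
    """True when the reply to a Range request is a proper partial response."""
    return status == 206 and headers.get("Content-Range") is not None


def probe(url, first=0, last=1023, user_agent="range-probe-example/0.1"):
    """Send one ranged GET and return (status, Content-Range or None).

    The User-Agent value is a placeholder; replace it with something
    that identifies you before pointing this at a real server.
    """
    request = urllib.request.Request(
        url,
        headers={"Range": f"bytes={first}-{last}", "User-Agent": user_agent},
    )
    with urllib.request.urlopen(request) as response:
        return response.status, response.headers.get("Content-Range")
```

A status of 200 with no `Content-Range` header would suggest that the Range header was ignored or stripped somewhere on the way, in which case the whole file is sent.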

Yann unsubscribed. Dec 28 2016, 4:42 PM

Verena assigned this task to hoo. Dec 29 2016, 11:24 AM

I just looked into this, and for me in Firefox 50, (some of) the videos only play once their entirety has been downloaded twice(!).

Even when downloading with about 80-100Mbps, this means that we need to wait over a minute for some videos to properly start.
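A back-of-the-envelope check of that wait, assuming a file of about 500 MiB (the upper size mentioned elsewhere in this thread) that is fetched twice before playback starts. The function name and the example figures are illustrative only.

```python
def download_seconds(size_mib, link_mbps, passes=1):
    """Seconds needed to transfer size_mib mebibytes `passes` times
    over a link of link_mbps megabits per second (decimal megabits)."""
    total_bits = size_mib * 1024 * 1024 * 8 * passes
    return total_bits / (link_mbps * 1_000_000)


# 500 MiB downloaded twice at the two ends of the 80-100 Mbps range.
for mbps in (80, 100):
    print(f"{mbps} Mbps: {download_seconds(500, mbps, passes=2):.1f} s")
```

Under these assumptions both ends of the range come out well above 60 seconds, which is consistent with the "over a minute" observation.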

Thanks for the reply. Is that an issue of Commons or an issue of the video file itself?

In T154186#2906311, @Verena wrote:

Thanks for the reply. Is that an issue of Commons or an issue of the video file itself?

I think we have two issues here: The player seems to be very inefficient (why is it loading the files twice?), and we don't have any reasonably sized transcodes available, so people have to load huge chunks of data (up to 500MiB) before some videos even start playing.

The videos not starting instantaneously could be due to some files not being streamable (although they seem streamable in VLC)?

The videos in question are in the transcode queue, but the transcoders will take at least another couple of days to reach them. In theory we could probably kick these jobs off by hand on an idle misc server (do they have the dependencies for video transcodes?), but I'm not sure that's desirable.

hoo added subscribers: MarkTraceur, matmarex. Dec 29 2016, 1:58 PM

1: If your browser doesn't support webm, the films might not start, because there is no ogg transcode available yet.
2: If you have a slow internet connection, then you might experience stuttering, because lower resolution transcodes are not available yet.
3: If the file was not properly prepared (and you are playing the original because of 1 or 2), it might be downloaded in full or otherwise used inefficiently.
4: If there is a network problem that strips or otherwise causes loss of Range headers or range capabilities, then you will also have a problem (and this could also be inside our backend landscape, because I think we do quite a bit of magic with swift and varnish).

I'd try to use YouTube whenever possible, if you need any guarantees.

The videos not starting instantaneously could be due to some files not being streamable (although they seem streamable in VLC)?

VLC is smart at recovery. It does many things that most simpler video playback engines won't do.

That sounds sensible, thanks @TheDJ. Shall we open a new bug about the player loading the files twice?

Shall we open a new bug about the player loading the files twice?

Yes, that seems sensible. I have not been able to reproduce that problem btw. For me it all downloads immediately in 1.5MB chunks when I use FF.

And since it's native playback, you should probably see the same behavior if you open the original http url of the file in your browser standalone.

Oh, and I just noticed https://commons.wikimedia.org/wiki/File:Machmit_Intro_Videos.webm claiming to be 41 hours long (also in VLC). That indicates a significant problem with the original file.

With something like the following, we could manually run the transcodes in question on one of the video scalers:

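<?php
// One-off sketch: run a single queued transcode by hand on a video scaler.
$jobQueueGroup = JobQueueGroup::singleton();
$jobQueue = $jobQueueGroup->get( 'webVideoTranscode' );

// Pick the first queued transcode whose file name starts with "Tutorial_<digit>".
// getPrefixedDBkey() is used because it keeps underscores; getFullText()
// returns the title with spaces, so the pattern below would never match it.
$jobs = $jobQueue->getAllQueuedJobs();
$job = null;
foreach ( $jobs as $cJob ) {
        if ( $cJob instanceof WebVideoTranscodeJob && preg_match( '/^File:Tutorial_\d/', $cJob->getTitle()->getPrefixedDBkey() ) ) {
                $job = $cJob;
                break;
        }
}
if ( $job === null ) {
        die( "Nothing found, seems we are done!\n" );
}

// Job objects are described via toString(); plain string concatenation would fail.
echo 'Chose: ' . $job->toString() . "\n\n\n";

// Run the transcode in this process and acknowledge it on success.
// Caveat (unverified): the job was read with getAllQueuedJobs() rather than
// pop(), so whether ack() fully removes it may depend on the queue backend.
$success = $job->run();
if ( $success ) {
        $jobQueueGroup->ack( $job );
        echo "Success.";
} else {
        echo "Something went wrong :(";
}

echo "\n\n\n";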

@aaron @Reedy Do you see a better way to go about this?

hoo created subtask T154315: Videos sometimes loaded twice in Firefox when played with the timed media handler player. Dec 30 2016, 11:18 AM

@hoo FYI, due to a recent operations issue, all of the backlogged video transcodes were booted out of the queue, and have to be restarted (at a sane rate, ofc)... I'm booting your videos back through first.

• brooke added a subtask: T155098: Rework job queue usage for TimedMediaHandler (video scalers). Jan 11 2017, 5:11 PM

Luke081515 subscribed. Jan 12 2017, 4:48 PM

• brooke closed subtask T153488: Commons video transcoders have over 6500 tasks in the backlog. as Resolved. Jan 14 2017, 10:55 PM

Is this still a problem? I tried a few of the videos and they played properly. Was this just caused by missing transcodes?

Actually not. The campaign is over and all videos are transcoded. Unfortunately the transcoding process did not happen in time for the campaign. In addition there were some problems with the videos in Safari and Internet Explorer, but this seems to be a general Commons problem.

There is an issue that occurs sometimes (such as when the servers are restarted) where transcodes end up with both a 'success' and a 'failure' status in the SQL... they are shown on the file pages as successfully transcoded, but it seems that, in general, they were not.

The transcodes affected by this show up in https://quarry.wmflabs.org/query/14916 (as well as exactly 35 transcodes - from 2013 - that are not due to this, and are 'orphaned' because of files that were renamed, the bug that created those seems to be long fixed)

Right now there are not 'any' listed here, because I have run them back through, over time. I'm reasonably sure that this was the cause of the videos that would not play successfully... that the transcode was somehow 'aborted', but the partially transcoded video was treated as successful.

If this occurs again (hopefully, in smaller numbers) I'll preserve them, and open an actual bug, but in the case of hundreds (mainly new uploads) it seemed important to simply get them rerun.

Revent mentioned this in T138967: Labs database replica drift. Jan 25 2017, 1:54 AM

@Revent, I really wish you had saved the SQL output before rerunning everything. At least a few rows. The logic of determining if there was a failure is quite intricate :(

So you say that they have a 'success' timestamp, but also an error timestamp or an error, or that they generally just don't have a proper result file?

@TheDJ The search checks for the existence of 'not null' in transcode_time_success and transcode_time_error. The ones that were originally in the report, other than the 35 there now, were all from 'years ago', and the ones I checked at the time were rather 'obviously' deleted shortly after being uploaded. As it stands now, if the file is deleted or renamed shortly after upload, the entries in the transcode table go away, so that seems to have been long ago fixed.

When the servers were restarted in December (the 19th I think), what appeared to be all the 'overloaded' tasks started due to the apache bug appeared in that report... it was, iirc, about 1200.

When the servers were restarted again this month, on the 11th or 12th, about 450 tasks (what appeared to be the entire content of the queue at that time) were dumped into that report.

What I 'have' noticed is that if a long-running transcode is reset 'while it is running' it ends up in that report.... IIRC, with an error that indicates that the working file went away. If you want, I can probably quite easily cause it to happen again a few times.... we have plenty of big files that need to be run.
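A minimal sketch of the check described in this thread (rows where both the success and the error timestamp are set), run against an in-memory SQLite table. The columns `transcode_time_success` and `transcode_time_error` are the ones named in the thread; `transcode_image_name`, `transcode_key`, the function name and all sample rows are assumptions made up for illustration, and this is not the actual Quarry query.

```python
import sqlite3


def find_conflicting_transcodes(conn):
    """Return (file, transcode key) pairs that carry both a success
    and an error timestamp, i.e. the contradictory state."""
    return conn.execute(
        "SELECT transcode_image_name, transcode_key FROM transcode "
        "WHERE transcode_time_success IS NOT NULL "
        "AND transcode_time_error IS NOT NULL "
        "ORDER BY transcode_image_name, transcode_key"
    ).fetchall()


# Toy table with invented rows: one clean success, one contradictory
# row, and one plain failure.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transcode ("
    "transcode_image_name TEXT, transcode_key TEXT, "
    "transcode_time_success TEXT, transcode_time_error TEXT)"
)
conn.executemany(
    "INSERT INTO transcode VALUES (?, ?, ?, ?)",
    [
        ("Example_A.webm", "360p.webm", "20161219120000", None),
        ("Example_B.webm", "360p.webm", "20170111120000", "20170111121500"),
        ("Example_C.webm", "480p.webm", None, "20170111130000"),
    ],
)

print(find_conflicting_transcodes(conn))
```

Only the row with both timestamps set is returned; a transcode that merely failed, or merely succeeded, is not reported.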

MarkTraceur unsubscribed. Jan 27 2017, 10:36 PM

• brooke closed subtask T155098: Rework job queue usage for TimedMediaHandler (video scalers) as Resolved. Feb 9 2017, 8:06 PM

Closing this out as the backlog issue is over; there may still be general issues with transcodes that get reset at weird times, so people should open a new, specific bug if they can reproduce one. (Will be doing more maintenance on the transcode queue and how it's handled later this spring, probably.)

Aklapper removed a project: TimedMediaHandler. Nov 7 2019, 4:02 PM

Jdforrester-WMF closed subtask T154315: Videos sometimes loaded twice in Firefox when played with the timed media handler player as Declined. May 25 2022, 12:55 PM
