Now working with the patch merged.
Note there is no logs/mediawiki-runJobs.log file, and I cannot connect to port 80 on 127.0.0.1 from within the VM (testing curl http://127.0.0.1/w/RunJobs.php).
Running with --debug seems to indicate that Vagrant's downloader is failing to load curl or some such junk:
Mon, Feb 20
Workaround: downloading the file and playing it in VLC for Android should work. Awkward, but that's file formats for ya. :)
Chrome for Android doesn't support Ogg Theora video natively: http://caniuse.com/#search=ogv
Sat, Feb 18
Fri, Feb 17
The chief problem with ffmpeg.js is that it has the h.264 and AAC decoders built in and shipped to the client -- like in firefogg's embedded copy of ffmpeg -- making it a legal minefield.
Thu, Feb 16
Wed, Feb 15
Queue runner count has now been pushed up, so we should get through the rest of the files faster. Might still bump it slightly further up -- I'm seeing usage around 80% -- though we're also trying to avoid overcommitting CPU.
I've also opened a more general tech-debt task T158181 to collect subtasks where we have different or limited functionality between mobile and desktop web. We may not reach 1:1 parity but I'd love to try. :)
@Jdlrobson agreed, I apologize for my hasty snarky comment earlier. :)
There was talk in the past about decoupling raw filenames from titles in the file: namespace so that extensions aren't exposed on new uploads, titles can contain funky chars, and file types can be upgraded without breaking usage.
Aha, that's a rather confusing UI to me; I would expect a separate button or link that says "ignore conflict" rather than having it hidden in a drop-down list of timezones I must search through, then having to click something that says "Change Timezone" when I want to *not* change the timezone. :)
Anyway, the main thing to consider is that the targets system was meant to be temporary, as I recall, during the period when we had massive amounts of stuff that wasn't tested on mobile. If people still aren't testing things on mobile, then we've got a problem that needs fixing before we remove it entirely. :)
(Or I suppose you can keep two skins and force everyone to ACTUALLY test everything on both.)
So the correct fix for this is to remove the "mobile site" mode and extra skin and have a single, responsive output mode and skin, with all modules working on that skin on both large and small screens. This is a fairly big task.
I'm mostly free next week (other than the archcom meetings Wednesday), happy to start getting this issue back in gear. :)
Tue, Feb 14
The 160p/240p.ogv 'missing' jobs are getting closer to done; going ahead and adding the 360p/480p.ogv 'all' jobs back into the mix.
The three surfaced images appear to be shown backwards from the top 3 results, resulting in the 3rd result being shown largest and the top result being shown smallest:
Looks nice so far! However the multimedia results seem to be frequently irrelevant, perhaps a result of searching based primarily on title... Titles of images are often not descriptive, are in other languages, or contain irrelevant details that are sorted/scored higher than relevant images that appear in articles.
Mon, Feb 13
@Jdforrester-WMF do you know who we should ping re: the structured data project plans, and whether we can/should get someone to the IIIF conference in June that Liam mentioned above? I'm on the IIIF A/V API working group but may not be the best point person for other data-sharing kinds of things: there's only middling support for structured metadata in the protocols so far (just a few vaguely-specified fields for attribution, licensing, etc.) which might want expanding, and that's something I'd rather leave to folks planning to work in the thick of the metadata. :)
Sat, Feb 11
Stopping the 360p/480p ogv jobs for now, need to bump up the queue runners (per above patch) to actually use the extra CPU. :)
There's some spare CPU still, so adding in the 360p.ogv and 480p.ogv re-run jobs, seeing how it goes... (still throttled; currently the high-prio queue is about 5 deep and the low-prio queue is at ~9000, down from ~9800 yesterday)
Fri, Feb 10
Currently running throttled version of requeueTranscodes.php to fill in missing 160p.ogv, 160p.webm, 240p.ogv, and 240p.webm transcodes.
Note per IRC conversation -- will also need to do a generic cleanup pass on non-Commons wikis that have local audio or video uploads. This shouldn't eat too much human time, just needs to be set up and looped.
I think I'm going to wait on the rest until I've put an automatic throttle in place... too easy to flood the high-prio queue with requeueTranscodes.php otherwise.
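A minimal sketch of what such an automatic throttle could look like -- this is purely hypothetical illustration, not the actual requeueTranscodes.php wiring; `queue_depth` and `enqueue_batch` stand in for whatever the real job-queue hooks would be:

```python
import time

# Hypothetical throttle sketch: before enqueueing another batch of
# transcode jobs, back off until the queue depth drops below a threshold,
# instead of flooding the high-prio queue all at once.
def throttled_enqueue(batches, queue_depth, enqueue_batch,
                      max_depth=100, poll_seconds=30):
    for batch in batches:
        while queue_depth() > max_depth:
            time.sleep(poll_seconds)  # wait for the runners to catch up
        enqueue_batch(batch)
```

The key design point is polling the live queue depth rather than enqueueing at a fixed rate, so the script adapts automatically to how fast the runners are draining jobs.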
I added a bunch of 160p.ogv transcodes and am waiting to see how long it takes for them to run through. :)
- ogg audio (missing)
Thu, Feb 9
Ok, I'm nearly ready to run this; testing with audio transcodes before I do anything rash. :)
I'm pretty sure we resolved this on the ops/hhvm end. Closing out.
MW changes are live. Existing files will continue to process on the low-prio queue. I'll watch the server load for the next couple days and see if we need to change the queue runner balance between the two.
Seems happier now; quick fix deployed.
Ok, the puppet changes went out at this morning's Puppet SWAT; the MediaWiki side is ready to go for the 11am Pacific MediaWiki SWAT.
Wed, Feb 8
It's only loading details for 50 items -- I think the problem is that it's also loading the error output (and then not using it), and some of those are MONSTROUSLY huge, adding up to 500 MB of process RAM just for the errors:
Still fails on .10, I don't think there's any relevant change to the special page in the last version.
Hmm, never mind -- it should already be limiting to 50. Not sure wtf is going on then; looking more closely.
Special:TimedMediaHandler's getTranscodes method asks for every matching row instead of a subset. It's probably just too many files total now. I'll see if I can add a quick hack to limit.
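A generic illustration of the fix pattern (using sqlite3 for a self-contained demo -- this is not the actual TimedMediaHandler code): cap the result set in the query itself rather than fetching every matching row into memory.

```python
import sqlite3

# Toy table standing in for the transcode status rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcode (id INTEGER PRIMARY KEY, key TEXT)")
conn.executemany(
    "INSERT INTO transcode (key) VALUES (?)",
    [(f"file-{i}.ogv",) for i in range(1000)],
)

# Before: every matching row is loaded at once -- fine with a few files,
# painful once there are hundreds of thousands of transcodes.
all_rows = conn.execute("SELECT * FROM transcode").fetchall()

# After: a quick hack limiting the special page to a bounded subset.
page = conn.execute("SELECT * FROM transcode LIMIT 50").fetchall()
print(len(all_rows), len(page))  # 1000 50
```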
So there's two separate questions:
- can we run an h.264 decoder (via ffmpeg) on a service in production that converts from files we don't serve directly
Build 15031 fixes the immediate problem but still introduces some other incompatibility... however it seems to be worked around by the updated streaming code in ogv.js master. I'll prep an update release...
Tue, Feb 7
Yeah, this is a useful ability and will be needed for panoramic stuff. I'd like to create an IIIF Image API & tiling endpoint built into MediaWiki instead of as a separate layer, but we haven't gotten to it yet...
Thu, Feb 2
Thu, Jan 26
Hmm, maybe let's bump the limit up a little further. No sense spending hours transcoding just to fail the file. (Dialing this in is not an exact science; the wall clock and CPU times don't match up on a consistent ratio so you'll see some failures earlier, and others later.)
Jan 19 2017
Ok, this is merged live in today's SWAT updates. Already-running jobs will still have the lower limit and may still time out, but those that start from now should have a doubled time limit which'll be more in line with wall-clock time and should avoid timing out on most of the 1-2 hour 720p/1080p videos.
(patch in the works to double the timeout based on our threading setting)
I was a bit baffled as to why the timeouts seemed to be happening significantly before the 8-hour limit was hit, but it turns out ulimit is based on *CPU time*, not *wall-clock time*. Since there is some parallelization between decode, scaling, and re-encoding, CPU usage is around 175% on these ffmpeg processes, not a 'mere' 100%, so we'll hit an 8-hour limit in 4-6 hours of wall-clock time.
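The back-of-envelope arithmetic for the above, as a quick sanity check:

```python
# An 8-hour RLIMIT_CPU budget counts CPU seconds, so a process running at
# ~175% CPU (decode/scale/encode partially parallelized) burns through it
# faster than wall-clock time suggests.
CPU_LIMIT_HOURS = 8.0
CPU_UTILIZATION = 1.75  # ~175% CPU observed on these ffmpeg processes

wall_clock_hours = CPU_LIMIT_HOURS / CPU_UTILIZATION
print(f"8h CPU limit exhausted after ~{wall_clock_hours:.1f}h wall clock")
```

That lands around 4.6 hours of wall clock at a steady 175%, consistent with the observed 4-6 hour range (utilization fluctuates, so individual jobs die earlier or later).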
Memory usage looks sane enough after logging in... I do see a few processes at 240+ minutes of CPU time; the longest-running is now at 346 minutes.
Ah that 'memory' is just the memory limit being reported. That doesn't help us know why it died. Sigh.
(Ok I can find the error messages, it's just not linking to the popup anymore. Weird.) Yeah, I see those "Exitcode: 137" which indicates a SIGKILL... but the times don't seem to match up with a strict 8-hour limit. I see some apparently killed at 6 hours or so, others less. Actually I think they're dying on the memory limit -- "Exitcode: 137 / Memory: 4194304" etc.
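For reference, the exit-code decoding behind that read: POSIX shells report death-by-signal as 128 plus the signal number, so 137 means SIGKILL -- which is exactly what both a hard ulimit and the kernel OOM killer deliver, hence the ambiguity between time limit and memory limit.

```python
import signal

# "Exitcode: 137" from the job runner: 137 - 128 = 9 = SIGKILL.
exitcode = 137
signum = exitcode - 128
print(signal.Signals(signum).name)  # SIGKILL
```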
@Yann can you break out a separate task for raising the background time limit? I want to make sure we don't lose that. Currently most of the failures I'm seeing aren't reporting back what their errors were, which makes it tricky to see what's going on; I'll have to poke around and see why.
Jan 18 2017
This appears to be an issue with File_Ogg. Tagging upstream... https://pear.php.net/bugs/bug.php?id=21164
Skeleton track, if present, can list the types of the various streams, but yes it's usually not there for audio and not always there for video depending on which software was used to convert. But parsing the skeleton track isn't trivial either -- it's embedded in the Ogg stream multiplexing too -- so we may as well just check the header packets of each stream, which is what File_Ogg does.
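To illustrate the "check the header packets" approach (this is a minimal sketch of the idea, not File_Ogg's actual code, and it skips the Ogg page demultiplexing a real parser needs): each logical stream's beginning-of-stream (BOS) packet starts with a codec magic string, so streams can be identified without a skeleton track.

```python
# Well-known BOS packet magic strings for common codecs in Ogg containers.
CODEC_MAGIC = {
    b"\x80theora": "theora",
    b"\x01vorbis": "vorbis",
    b"OpusHead": "opus",
    b"Speex   ": "speex",
    b"fishead\x00": "skeleton",
}

def identify_bos_packet(packet: bytes) -> str:
    """Return the codec name for a BOS packet payload, or 'unknown'."""
    for magic, name in CODEC_MAGIC.items():
        if packet.startswith(magic):
            return name
    return "unknown"
```

In a real demuxer you'd first parse the Ogg page framing (pages start with the `OggS` capture pattern, with the BOS flag in the header-type byte) to extract each stream's first packet before applying this check.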
Work in progress retooling some of the old code to be namespace & PSR-4 autoloader friendly on my work fork branch: https://github.com/brion/File_Ogg/commits/modernize Also merged the fix for certain cut-off positions, and removed the PEAR_Exception dependency.
Jan 17 2017
Adding T103421 as dep; proper way to check is to actually look at the header packets and check what they contain, which File_Ogg package can do. We need to move our copy out of TMH and use it via Composer in a shared manner.
I'll take this on, was looking at making some fixes anyway. @tstarling can you set me up as a maintainer for Pear/File_Ogg and I'll try and get it synced with our inline version and set up with Composer?
Workaround: run via 'php5' manually:
We should be able to use a suitable default size here for the non-JS path, with JS injection on top if we really want to give a precise size.
You'll probably have to touch the affected files to update their data first... Not sure if we have a suitable maint script at this time.
Ok I've confirmed the backend image servers are now producing thumbnail output. Thanks joe! Closing back out as resolved.
Jan 14 2017
Yeah, I think we're good to close this one out; improvements to the queue handling are in a separate ticket.