Page MenuHomePhabricator

Transcoding hangs for no apparent reason, then timeouts
Closed, ResolvedPublic

Event Timeline

Yann created this task.Jan 13 2017, 2:03 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 13 2017, 2:03 PM

Production SELECT * FROM commonswiki_p.transcode WHERE transcode_image_name = "Operating-Mechanism-and-Molecular-Dynamics-of-Pheromone-Binding-Protein-ASP1-as-Influenced-by-pH-pone.0110565.s001.ogv";:

1'/usr/bin/ffmpeg2theora' '/tmp/localcopy_452735dec48b.ogv' -V '160' -F '15' -a '-1' -c '2' --no-upscaling --optimize --keyint '128' --buf-delay '256' --width '186' --height '160' --aspect '186:160' -o '/tmp/transcode_160p.ogv2530de39ae0d.ogv'
2
3Exitcode: 124
4Memory: 4194304
5
6[theora @ 0x10d00c0] 7 bits left in packet 82
7[ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.
8Input #0, ogg, from '/tmp/localcopy_452735dec48b.ogv':
9 Duration: 00:02:12.53, start: 0.000000, bitrate: 4374 kb/s
10 Stream #0:0: Video: theora, yuv420p, 660x568 [SAR 1:1 DAR 165:142], 30 tbr, 30 tbn, 30 tbc
11 Metadata:
12 ALBUM : Operating Mechanism and Molecular Dynamics of Pheromone-Binding Protein ASP1 as Influenced by pH
13 ARTIST : Han L, Zhang Y, Zhang L, Cui X, Yu J, Zhang Z, Liu M
14 COPYRIGHTS : Han et al
15 LICENSE : http://creativecommons.org/licenses/by/4.0/
16 comment : The full trajectories of 200 ns MD simulation of holo ASP1 under pH 4.5 and pH 7.0 conditions, respectively.
17 DATE : 2014
18[theora @ 0x10d00c0] 7 bits left in packet 82
19 Pixel Aspect Ratio: 1.00/1 Frame Aspect Ratio: 1.16/1
20 Resize: 660x568 => 186x160
21 Resample Framerate: 30.000 => 15.000
22[swscaler @ 0x13935c0] Warning: data is not aligned! This can lead to a speedloss
23 0:00:02.13 audio: 0kbps video: 172kbps, time elapsed: 00:00:00 [ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.
24 0:00:04.26 audio: 0kbps video: 164kbps, time elapsed: 00:00:00 [ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.
25 0:00:06.40 audio: 0kbps video: 162kbps, time elapsed: 00:00:00 [ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.
26 0:00:08.53 audio: 0kbps video: 161kbps, ET: 00:00:14, est. size: 2.5 MB [ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.
27 0:00:10.66 audio: 0kbps video: 162kbps, ET: 00:00:11, est. size: 2.6 MB [ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.
28 0:00:12.80 audio: 0kbps video: 161kbps, ET: 00:00:09, est. size: 2.5 MB [ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.
29 0:00:14.93 audio: 0kbps video: 161kbps, ET: 00:00:07, est. size: 2.5 MB [ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.
30 0:00:17.06 audio: 0kbps video: 161kbps, ET: 00:00:13, est. size: 2.5 MB [ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.
31 0:00:19.20 audio: 0kbps video: 161kbps, ET: 00:00:11, est. size: 2.5 MB [ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.
32 0:00:20.26 audio: 0kbps video: 161kbps, ET: 00:00:11, est. size: 2.5 MB [ogg @ 0x10ceee0] DTS 617 < 619 out of order
33[ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.
34 0:00:21.86 audio: 0kbps video: 160kbps, ET: 00:00:10, est. size: 2.5 MB [ogg @ 0x10ceee0] Broken file, keyframe not correctly marked.

Exitcode: 124 is a timeout error.

I went ahead and reproduced this on labs:

1zhuyifei1999@encoding01:~$ '/usr/bin/ffmpeg2theora' 'Operating-Mechanism-and-Molecular-Dynamics-of-Pheromone-Binding-Protein-ASP1-as-Influenced-by-pH-pone.0110565.s001.ogv' -V '160' -F '15' -a '-1' -c '2' --no-upscaling --optimize --keyint '128' --buf-delay '256' --width '186' --height '160' --aspect '186:160' -o '/dev/null'
2[theora @ 0x1925180] 7 bits left in packet 82
3[ogg @ 0x1924100] Broken file, keyframe not correctly marked.
4Input #0, ogg, from 'Operating-Mechanism-and-Molecular-Dynamics-of-Pheromone-Binding-Protein-ASP1-as-Influenced-by-pH-pone.0110565.s001.ogv':
5 Duration: 00:02:12.53, start: 0.000000, bitrate: 4374 kb/s
6 Stream #0:0: Video: theora, yuv420p, 660x568 [SAR 1:1 DAR 165:142], 30 tbr, 30 tbn, 30 tbc
7 Metadata:
8 ALBUM : Operating Mechanism and Molecular Dynamics of Pheromone-Binding Protein ASP1 as Influenced by pH
9 ARTIST : Han L, Zhang Y, Zhang L, Cui X, Yu J, Zhang Z, Liu M
10 COPYRIGHTS : Han et al
11 LICENSE : http://creativecommons.org/licenses/by/4.0/
12 comment : The full trajectories of 200 ns MD simulation of holo ASP1 under pH 4.5 and pH 7.0 conditions, respectively.
13 DATE : 2014
14[theora @ 0x1925180] 7 bits left in packet 82
15 Pixel Aspect Ratio: 1.00/1 Frame Aspect Ratio: 1.16/1
16 Resize: 660x568 => 186x160
17 Resample Framerate: 30.000 => 15.000
18[swscaler @ 0x1bece60] Warning: data is not aligned! This can lead to a speedloss
19 0:00:02.13 audio: 0kbps video: 176kbps, ET: 00:01:01, est. size: 2.8 MB [ogg @ 0x1924100] Broken file, keyframe not correctly marked.
20 0:00:04.26 audio: 0kbps video: 165kbps, ET: 00:00:30, est. size: 2.6 MB [ogg @ 0x1924100] Broken file, keyframe not correctly marked.
21 0:00:06.40 audio: 0kbps video: 163kbps, ET: 00:00:19, est. size: 2.6 MB [ogg @ 0x1924100] Broken file, keyframe not correctly marked.
22 0:00:08.53 audio: 0kbps video: 162kbps, ET: 00:00:14, est. size: 2.6 MB [ogg @ 0x1924100] Broken file, keyframe not correctly marked.
23 0:00:10.66 audio: 0kbps video: 162kbps, ET: 00:00:11, est. size: 2.6 MB [ogg @ 0x1924100] Broken file, keyframe not correctly marked.
24 0:00:12.80 audio: 0kbps video: 161kbps, ET: 00:00:09, est. size: 2.5 MB [ogg @ 0x1924100] Broken file, keyframe not correctly marked.
25 0:00:14.93 audio: 0kbps video: 161kbps, ET: 00:00:15, est. size: 2.5 MB [ogg @ 0x1924100] Broken file, keyframe not correctly marked.
26 0:00:17.06 audio: 0kbps video: 161kbps, ET: 00:00:13, est. size: 2.5 MB [ogg @ 0x1924100] Broken file, keyframe not correctly marked.
27 0:00:19.20 audio: 0kbps video: 161kbps, ET: 00:00:11, est. size: 2.5 MB [ogg @ 0x1924100] Broken file, keyframe not correctly marked.
28 0:00:20.26 audio: 0kbps video: 161kbps, ET: 00:00:11, est. size: 2.5 MB [ogg @ 0x1924100] DTS 617 < 619 out of order
29[ogg @ 0x1924100] Broken file, keyframe not correctly marked.
30 0:00:21.86 audio: 0kbps video: 160kbps, ET: 00:00:10, est. size: 2.5 MB [ogg @ 0x1924100] Broken file, keyframe not correctly marked.
31^C

ffmpeg2theora hangs until I gave it a SIGINT. Also noting that there is about 100% CPU utilization by this process.

Naturally I reran the process with strace to see if the process is actually doing any IO during the hanging, and no:

1write(5, "\314`\365d\350\20\204\326\336\310\36~\215=-\345$\352\263I\254\320`\367\362\305\375\373\236B\355\215"..., 4096) = 4096
2read(4, "\210\3521`\32\374\305S\"^\220c\377\2>\231\315\224\2612\213^\f\225\331\\>@\256/\274\336"..., 57100) = 57100
3read(4, "OggS\0\1\32\234\0\0\0\0\0\0cP\25\27\302\0\0\0\346\314\220\351\363\377\377\377\377\377"..., 32768) = 32768
4read(4, "\37\27\203\240\1\236\371\342\312\343\320n\4\307\250\240\263f\301Q9\5\355\1\256)7M\247\241\351\r"..., 32768) = 32768
5write(5, "&\373\304:\275\210\313\351\274\345E,\322\376\t\245\201\312\t\320\363\340Q\206\374\310\10\2408\216\24\337"..., 4096) = 4096
6 0:00:21.60 audio: 0kbps video: 160kbps, ET: 00:00:10, est. size: 2.5 MB ) = 77
7read(4, "\35\353M\25'\317C]g!S\306\202#j\326z\264nj\215\271x\326q\302\372\7Dn\2567"..., 60822) = 60822
8read(4, "OggS\0\1!\234\0\0\0\0\0\0cP\25\27\304\0\0\0\35\343\354\5\350\377\377\377\377\377"..., 32768) = 32768
9read(4, "8\25<\201&\5\21\275\3748\30\3419\3\34R\370\274!qE\207E9\16\322OA!\24}\21"..., 32768) = 32768
10write(5, "]7\227\230\373A\344c\230\10X\264+\256\350\226%G\330\267yQ\324>\247\t\343\312\330\233\251x"..., 4096) = 4096
11read(4, "\343`\341N,\337\0267n\260\353\375\303\nO\23\327\5\254\262\32\353\310\303\227\323\344L\177\33\177\312"..., 58432) = 58432
12read(4, "OggS\0\1(\234\0\0\0\0\0\0cP\25\27\306\0\0\0\341F3)\342\377\377\377\377\377"..., 32768) = 32768
13read(4, "\362Ctk\f\30\353\371\321^\4\340\201\361/\322\342\36Q#\272@\216\r\30\2609\255\212\363\254\302"..., 32768) = 32768
14write(5, "\17\324\26\"\300\0374^H\275\254\223jV\203\230\333\232\230g@\225\314j\221 \310\210k\336S`"..., 4096) = 4096
15write(5, "\16\212\222S\231u4SL\322\251\t% \227%\3.e\22\234\222\241Ph6\21\370\20=\306#"..., 4096) = 4096
16 0:00:22.13 audio: 0kbps video: 160kbps, ET: 00:00:09, est. size: 2.5 MB ) = 77
17read(4, "\211!c(\300\210D\274q\"\211\0{-\234\364>\32\217q\206\367J\324I\310Z\310\217\324Y\266"..., 56629) = 56629
18read(4, "OggS\0\1,\234\0\0\0\0\0\0cP\25\27\311\0\0\0\370d\232\231&\377\377\377\377\377"..., 32768) = 32768
19read(4, "\245\26|`\346\240\321\322O\220\346\320L'\357\260\303\216\250q\220\303c\275|\370\214.6\27\334\372"..., 41987) = 41987
20write(2, "\33[48;5;0m\33[38;5;207m[ogg @ 0x191"..., 42[ogg @ 0x1914100] ) = 42
21write(2, "\33[48;5;0m\33[38;5;226mBroken file,"..., 68Broken file, keyframe not correctly marked.
22) = 68
23brk(0x21f5000) = 0x21f5000
24brk(0x2260000) = 0x2260000
25brk(0x2338000) = 0x2338000
26^C--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
27Process 12820 detached

Also reproduced locally. I'm not sure how to set a debug loglevel in order to debug it.

zhuyifei1999 renamed this task from Trancoding fails for no apparent reason to Trancoding hangs for no apparent reason, then timeouts.Jan 13 2017, 3:07 PM
brion added a subscriber: brion.Jan 13 2017, 3:53 PM
brion claimed this task.Jan 13 2017, 4:00 PM

Yeah looks like it may be a looping bug in ffmpeg2theora, I'll try and debug.

brion added a comment.Jan 13 2017, 5:18 PM

hehehhe "2147483647 duplicate frames added to maintain sync" *cough* yeah there's a bug in here

brion added a comment.Jan 13 2017, 5:44 PM

The source file has a packet that ffmpeg reports a wildly incorrect decode timestamp for; this completely throws off ffmpeg2theora.

ffprobe -show_packets Operating-Mechanism-and-Molecular-Dynamics-of-Pheromone-Binding-Protein-ASP1-as-Influenced-by-pH-pone.0110565.s001.ogv  | grep dts
....
dts_time=87.733333
[ogg @ 0x7fdad1000000] Broken file, keyframe not correctly marked.
dts=2633
dts_time=87.766667
dts=288230376151711804
dts_time=9607679205057060.000000
dts=288230376151711805
dts_time=9607679205057060.000000
dts=2636
dts_time=87.866667

Looking at the actual ogg packets, something's definitely wrong looking:

oggz-dump Operating-Mechanism-and-Molecular-Dynamics-of-Pheromone-Binding-Protein-ASP1-as-Influenced-by-pH-pone.0110565.s001.ogv  | grep 'packetno 26'
...
00:01:27.733: serialno 0387272803, granulepos 2604|29, packetno 2637: 17.977 kB
00:01:27.766: serialno 0387272803, calc. gpos 2634|0, packetno 2638: 37.686 kB
00:01:27.800: serialno 0387272803, granulepos 2634|1 ERR: file gp -1|62, packetno 2639: 16.588 kB
00:01:27.833: serialno 0387272803, calc. gpos 2634|2, packetno 2640: 19.027 kB
00:01:27.866: serialno 0387272803, calc. gpos 2634|3, packetno 2641: 15.764 kB

Looks like there's a bogus granulepos on one of the packets, which ffmpeg/libav fouls up with and interprets as a legit but totally out of whack timestamp. ffmpeg2theora then explodes trying to adjust a/v sync for the bogus data.

I can probably remux the file and fix it as a on-off, but might be wise to submit some fix patches.

brion added a comment.Jan 13 2017, 5:57 PM

I recompressed the .ogv file from the original source .mpg and uploaded a new copy, which appears to be transcoding correctly.

ffmpeg2theora's behavior is sensible enough assuming ffmpeg/libav doesn't foul up, should see whether it's feasible to get libav to handle this case better upstream.

Upstream report https://trac.ffmpeg.org/ticket/6080#ticket

Patch submitted via ffmpeg-devel list, isn't in archives yet (presumably I'm held for moderation as a new joiner).

brion moved this task from Backlog to Reported Upstream on the Upstream board.Jan 13 2017, 10:12 PM

@brion hi, should we back port your fix into wmf? Or should wmf update all the way to 3.3 with your fix?

Aklapper renamed this task from Trancoding hangs for no apparent reason, then timeouts to Transcoding hangs for no apparent reason, then timeouts.Jan 15 2017, 4:09 PM
brion closed this task as Resolved.Jan 9 2018, 9:37 PM

No longer using ffmpeg2theora or creating .ogv output; WebM transcodes are unaffected by the bug. Closing out.