
Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR
Closed, Resolved · Public

Description

I'm not sure if this is a general issue or not, but Yuvi suggested filing a bug just in case. :)

I'm testing some new media transcoding configurations, setting up files on https://media-streaming.wmflabs.org/. The transcoded output is currently served as plain downloadable files, which can get quite large, up to about a gigabyte for the longest video I'm working with:

http://media-streaming.wmflabs.org/transcoded/1/10/Tears_of_Steel_in_4k_-_Official_Blender_Foundation_release.webm/Tears_of_Steel_in_4k_-_Official_Blender_Foundation_release.webm.2160p.ogv (1.2G)

However, I'm intermittently seeing much lower download bandwidth than expected -- as low as 1 Mbps -- which can affect streaming playback of even the lower-resolution files.

Between the instance (media-streaming.ogvjs-integration.eqiad.wmflabs) and the web proxy I see about 200Mbps as measured with 'nload' -- a request for the file uploads to the proxy at this rate until either the file ends or the downstream client terminates the connection.

From my Linode server in Dallas I see a fairly consistent 80 Mbps download rate on the above URL.

But from my home in Portland, on Comcast, I sometimes see a happy 60-80 Mbps or so, and at other times slowdowns to 3-5 Mbps, or even as low as 1-2 Mbps.

I can fetch data from upload.wikimedia.org (routed via the SFO proxies) or my Linode in Dallas at ~80 Mbps at the same time, so it seems to be something in the route between the proxy in IAD and back...?

Traceroute from my end:

Orac:~ brion$ traceroute media-streaming.wmflabs.org
traceroute to media-streaming.wmflabs.org (208.80.155.156), 64 hops max, 52 byte packets
 1  10.0.0.1 (10.0.0.1)  4.029 ms  5.200 ms  4.922 ms
 2  96.120.60.13 (96.120.60.13)  19.435 ms  14.751 ms  16.750 ms
 3  xe-11-0-0-rur02.beaverton.or.bverton.comcast.net (68.85.148.169)  13.995 ms  14.120 ms  10.768 ms
 4  ae-51-ar01.troutdale.or.bverton.comcast.net (68.87.216.105)  17.927 ms  19.227 ms  24.053 ms
 5  be-33490-cr01.seattle.wa.ibone.comcast.net (68.86.92.217)  35.884 ms  29.056 ms  34.949 ms
 6  hu-0-14-0-1-pe04.seattle.wa.ibone.comcast.net (68.86.84.46)  24.395 ms  25.249 ms  19.462 ms
 7  as6461.seattle.wa.ibone.comcast.net (173.167.56.202)  15.920 ms  17.969 ms  20.465 ms
 8  ae27.cs1.sea1.us.eth.zayo.com (64.125.29.0)  89.418 ms  94.299 ms  94.675 ms
 9  ae2.cs1.ord2.us.eth.zayo.com (64.125.29.27)  95.061 ms  93.572 ms  93.620 ms
10  ae3.cs1.lga5.us.eth.zayo.com (64.125.29.208)  101.511 ms  132.905 ms  93.427 ms
11  ae4.cs1.dca2.us.eth.zayo.com (64.125.29.203)  87.997 ms  93.277 ms  104.585 ms
12  ae27.cr1.dca2.us.zip.zayo.com (64.125.30.247)  92.623 ms  96.570 ms  97.165 ms
13  ae6.er1.iad10.us.zip.zayo.com (64.125.20.118)  92.175 ms  91.634 ms  92.641 ms
14  64.125.192.142.ipyx-125449-001-zyo.zip.zayo.com (64.125.192.142)  110.199 ms  115.739 ms  126.855 ms
15  208.80.155.156 (208.80.155.156)  95.887 ms  93.667 ms  89.889 ms
16  * * *
17  * *

I'm currently seeing about 12-14 Mbps from media-streaming.wmflabs.org, and about the same from download.wikimedia.org, which also comes in via eqiad (as of 9:46pm Pacific time on May 31, 2016).
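The comparison above (same client, several hosts, roughly the same time) could be scripted along the following lines. This is only a minimal sketch: the function names `mbps`, `sample_throughput`, and `compare` are made up for illustration, the 10-second window and 64 KiB chunk size are arbitrary assumptions, and the figures in this report were taken with other tools, not with this script.

```python
import time
import urllib.request


def mbps(num_bytes, seconds):
    """Convert a byte count over an interval to megabits per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1e6


def sample_throughput(url, max_seconds=10.0, chunk_size=64 * 1024):
    """Download from url for up to max_seconds; return the average rate in Mbps.

    The timer starts before the connection is opened, so connection setup
    is included and very short samples will read low.
    """
    received = 0
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=30) as resp:
        while time.monotonic() - start < max_seconds:
            chunk = resp.read(chunk_size)
            if not chunk:  # end of file reached before the window closed
                break
            received += len(chunk)
    return mbps(received, time.monotonic() - start)


def compare(urls, max_seconds=10.0):
    """Sample each URL in turn and return {url: Mbps}."""
    return {url: sample_throughput(url, max_seconds) for url in urls}
```

Calling `compare([...])` with one large file per host (for example one served from eqiad and one from another site) gives numbers that can be set side by side. The samples run one after another, so they are close in time rather than simultaneous.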

Event Timeline

Restricted Application added subscribers: Zppix, Aklapper.

Since you get bad download speeds, the opposite traceroute (from eqiad to you) is the more interesting one. I didn't have your IP, so I used hop 2's above, 96.120.60.13, which is close enough.

This was the traceroute from eqiad:

 1. ae2-1002.cr2-eqiad.wikimedia.org                   0.0%   208    0.2   0.9   0.2  54.0   4.9
 2. ae0.cr1-eqiad.wikimedia.org                        0.0%   208   25.0   8.5   0.2  72.9  14.9
 3. xe-0-6-0-20.r06.asbnva02.us.bb.gin.ntt.net         0.0%   208    0.8   0.9   0.6   3.7   0.3
 4. ae-0.comcast.asbnva02.us.bb.gin.ntt.net            0.0%   208    1.4   1.2   0.7   4.7   0.5
 5. hu-1-3-0-3-cr02.ashburn.va.ibone.comcast.net       0.0%   208    2.3   2.1   0.9   4.0   0.3
 6. be-10114-cr02.56marietta.ga.ibone.comcast.net      0.0%   208   14.1  15.1  13.7  22.8   0.6
 7. be-11424-cr02.dallas.tx.ibone.comcast.net          0.0%   208   34.3  33.6  32.1  36.1   0.5
 8. be-11524-cr02.losangeles.ca.ibone.comcast.net      0.0%   208   61.5  62.2  60.9  68.5   0.7
 9. be-10915-cr01.sunnyvale.ca.ibone.comcast.net       0.0%   208   69.4  71.5  69.3 162.9   9.2
10. ae-72-ar01.beaverton.or.bverton.comcast.net        0.0%   207   76.2  76.5  76.2  89.1   1.1
11. ae-1-rur01.beaverton.or.bverton.comcast.net        0.0%   207   76.4  76.9  76.2  97.0   2.3
12. 96.120.60.13                                       0.0%   207   76.7  76.8  76.6  80.8   0.5

This goes via NTT. Since the hand-off to Comcast happens in Ashburn (which is normal), there are at least two potential causes here:

  • The NTT Ashburn -> Comcast Ashburn link is congested
  • Comcast's backbone (Ashburn -> GA/TX/CA/OR -> Beaverton) is congested somewhere.

I've configured our routing to avoid that path and now we're going via Zayo (aka AboveNet) on the way back too:

 1. ae2-1002.cr2-eqiad.wikimedia.org                   0.0%   287    0.2   1.8   0.2  71.4   8.1
 2. xe-1-1-0.cr1-eqord.wikimedia.org                   0.0%   287   28.4  29.6  28.3  75.5   5.7
 3. 208.185.240.45.available.above.net                 0.0%   287   25.9  26.4  25.8  63.0   3.8
 4. 64.125.31.86                                       0.0%   287   30.2  30.4  30.2  45.3   1.4
 5. 64.125.31.85                                       0.0%   287   27.6  26.0  25.7  36.2   1.1
 6. ae11.er2.ord7.us.zip.zayo.com                      0.0%   286   30.1  30.2  30.0  40.5   1.1
 7. be-204-pe01.350ecermak.il.ibone.comcast.net        0.0%   286   30.4  30.4  30.2  31.1   0.0
 8. hu-2-4-0-0-cr02.350ecermak.il.ibone.comcast.net    0.0%   286   32.2  31.7  30.7  33.0   0.2
 9. be-10517-cr02.denver.co.ibone.comcast.net          0.0%   286   49.8  48.9  47.7  50.1   0.3
10. be-10817-cr01.seattle.wa.ibone.comcast.net         0.0%   286   70.4  70.8  69.6  72.3   0.4
11. ae-72-ar01.troutdale.or.bverton.comcast.net        0.0%   286   72.9  73.8  72.8 106.4   3.4
12. ae-1-rur02.beaverton.or.bverton.comcast.net        0.0%   286   74.5  74.9  74.4  95.2   2.2
13. 96.120.60.13                                       0.0%   286   74.5  74.5  74.4  74.6   0.0

The hand-off from Wikimedia/Zayo to Comcast now happens in Chicago, and traffic then traverses Comcast's network via a northern route (IL/CO/WA), so it is a completely different path. Unfortunately, this won't tell us which of the two cases above is the root cause, but it's a useful test nevertheless, especially if you are still experiencing problems :)

Could you please try again and see if it makes a difference? It would make sense to try at approximately the same time of the day as your previous attempt, as backbone congestion might only occur during peak hours (whichever those may be for all of those states).
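Both reports above show 0% loss at every hop, so on their own they do not point at a congested link. For repeated runs, a small parser can make that kind of check less manual. The sketch below assumes the rows use mtr's default report columns (Loss%, Snt, Last, Avg, Best, Wrst, StDev); the header row is not shown above, so that is an assumption. The names `Hop`, `parse_mtr`, and `flag_hops` and the 5 ms jitter threshold are hypothetical.

```python
import re
from typing import NamedTuple


class Hop(NamedTuple):
    index: int
    host: str
    loss_pct: float
    sent: int
    last: float
    avg: float
    best: float
    worst: float
    stdev: float


# One report row: "N. host  Loss%  Snt  Last  Avg  Best  Wrst  StDev"
ROW = re.compile(
    r"^\s*(\d+)\.\s+(\S+)\s+([\d.]+)%\s+(\d+)"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_mtr(text):
    """Parse report rows into Hop tuples, skipping lines that do not match."""
    hops = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            timings = [float(x) for x in m.groups()[4:]]
            hops.append(Hop(int(m[1]), m[2], float(m[3]), int(m[4]), *timings))
    return hops


def flag_hops(hops, max_loss_pct=0.0, max_stdev_ms=5.0):
    """Return hops whose loss or jitter exceeds the given thresholds."""
    return [h for h in hops if h.loss_pct > max_loss_pct or h.stdev > max_stdev_ms]
```

A flagged hop is only a hint. Routers often answer probes at low priority, so high jitter on a single hop that does not carry through to later hops usually says little about forwarded traffic.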

Thanks, I'll keep an eye out tonight and see if it gets congested again (currently seeing a cool 80 Mbps download rate at 7:04am Pacific time).

As of 11:44am Pacific time I'm seeing 24 Mbps on the new route through Chicago, down from 80 Mbps earlier this morning.

Currently seeing my baseline 80 Mbps. Floating IP 208.80.155.243 has been assigned for now so I can test without the proxy, just to double-confirm that it's the route and not the proxy (it also makes it easier for me to measure bandwidth on the server end). Will test again later this evening in case the congestion repeats.

Thanks. Testing against production (e.g. upload.wikimedia.org, but any host including bast1001 would do) would also be a useful data point, I think.

How sure are you that this congestion isn't local to Beaverton or Comcast? Have you tried downloading from anywhere else during peak hours?

faidon triaged this task as Medium priority. · Jun 2 2016, 12:05 PM
faidon moved this task from Backlog to In Progress on the netops board.

Seems ok lately, haven't noticed any problems last week.

Either. You can have the IP back, I guess, doesn't seem to make any difference.

faidon claimed this task.

I'm resolving this, as this was primarily a task for the intermittent bandwidth issue. @yuvipanda, feel free to remove the IP (or not), or open a new task about it :) Thanks!

I'm encountering this problem again; the routes seem to have changed but symptoms are similar -- I see about 1-2 megabit/s downloads from eqiad (either media-streaming.wmflabs.org or dumps.wikimedia.org) from my Comcast IP in Portland.

Sample URL since the original one is no longer there: https://media-streaming.wmflabs.org/clean/transcoded/6/60/Knowledge_for_Everyone_%28short_cut%29.webm/Knowledge_for_Everyone_%28short_cut%29.webm.1080p.vp9.webm

Traceroute back from my eqiad VM to my IP:

traceroute to c-73-37-60-183.hsd1.or.comcast.net (73.37.60.183), 30 hops max, 60 byte packets
 1  10.68.16.1 (10.68.16.1)  0.422 ms  0.409 ms  0.387 ms
 2  ae2-1118.cr2-eqiad.wikimedia.org (10.64.20.3)  0.780 ms  0.506 ms  0.745 ms
 3  ash-b1-link.telia.net (80.239.132.225)  1.056 ms  1.032 ms  1.124 ms
 4  comcast-ic-318834-ash-b1.c.telia.net (62.115.149.65)  1.159 ms  1.143 ms  1.277 ms
 5  be-10142-cr02.ashburn.va.ibone.comcast.net (68.86.86.33)  2.017 ms  1.738 ms  2.987 ms
 6  be-10114-cr02.56marietta.ga.ibone.comcast.net (68.86.85.10)  17.724 ms  17.676 ms  17.994 ms
 7  be-11424-cr02.dallas.tx.ibone.comcast.net (68.86.85.22)  32.547 ms  33.615 ms  33.561 ms
 8  be-11524-cr02.losangeles.ca.ibone.comcast.net (68.86.87.173)  62.464 ms  62.429 ms  62.355 ms
 9  be-11015-cr02.sunnyvale.ca.ibone.comcast.net (68.86.86.97)  74.429 ms  74.395 ms  74.394 ms
10  ae-72-ar01.beaverton.or.bverton.comcast.net (68.86.92.206)  85.518 ms  86.387 ms  86.365 ms
11  ae-1-rur01.beaverton.or.bverton.comcast.net (68.85.146.198)  90.945 ms  89.967 ms  90.673 ms
12  po-1-1-cbr25.beaverton.or.bverton.comcast.net (68.85.148.162)  89.904 ms  89.789 ms  89.870 ms

Could be the Comcast trip around the nation, or it could be the WMF->Telia or Telia->Comcast ends of things.
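To get a rough feel for the first possibility, the well-known Mathis et al. approximation for loss-limited TCP throughput, rate ≈ (MSS / RTT) · (C / √p) with C = √(3/2), can be used to estimate how much packet loss would be needed to hold one flow at the observed rate. This is a back-of-the-envelope sketch, not a diagnosis. It assumes a 1460-byte MSS, Reno-style congestion control, and steady random loss, none of which has been verified for this path. The RTT figures (about 90 ms on the trace above, about 20 ms on a much shorter path) are illustrative inputs.

```python
import math

MATHIS_C = math.sqrt(3 / 2)  # constant from the Mathis et al. model


def mathis_rate_mbps(mss_bytes, rtt_s, loss):
    """Approximate loss-limited TCP throughput in Mbps for one flow."""
    return (mss_bytes * 8 / rtt_s) * (MATHIS_C / math.sqrt(loss)) / 1e6


def loss_for_rate(mss_bytes, rtt_s, rate_mbps):
    """Invert the model: loss probability that would cap a flow at rate_mbps."""
    return (mss_bytes * 8 * MATHIS_C / (rtt_s * rate_mbps * 1e6)) ** 2


# Loss that would hold a ~90 ms path to about 2 Mbps (assumed MSS of 1460 bytes).
p = loss_for_rate(1460, 0.090, 2.0)

# The same loss rate on a ~20 ms path allows 4.5x the rate (90 / 20).
short_path = mathis_rate_mbps(1460, 0.020, p)
```

In this model a loss rate well under 1% is enough to hold a 90 ms path to a couple of megabits per second, and the same loss hurts a shorter path proportionally less. That is consistent with either explanation, since loss anywhere along the path has the same effect.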

For my Linode server in Fremont, CA I get a trace through HE and downloads are super fast:

traceroute to brionv.com (45.79.65.211), 30 hops max, 60 byte packets
 1  10.68.16.1 (10.68.16.1)  0.227 ms  0.199 ms  0.147 ms
 2  ae2-1118.cr2-eqiad.wikimedia.org (10.64.20.3)  0.521 ms  0.509 ms  0.489 ms
 3  ae0.cr1-eqiad.wikimedia.org (208.80.154.193)  0.457 ms  0.438 ms  0.420 ms
 4  v405.core1.ash1.he.net (216.66.30.89)  11.181 ms  0.534 ms  0.598 ms
 5  100ge7-2.core1.pao1.he.net (184.105.222.41)  59.762 ms  59.776 ms  59.758 ms
 6  100ge9-2.core1.sjc2.he.net (72.52.92.157)  60.707 ms 10ge7-5.core1.sjc2.he.net (72.52.92.70)  61.689 ms 100ge9-2.core1.sjc2.he.net (72.52.92.157)  60.149 ms
 7  10ge8-2.core3.fmt2.he.net (184.105.222.13)  60.301 ms  132.462 ms  73.629 ms
 8  * * *
 9  173.230.159.1 (173.230.159.1)  65.392 ms 173.230.159.3 (173.230.159.3)  61.790 ms 173.230.159.1 (173.230.159.1)  65.812 ms
10  li1164-211.members.linode.com (45.79.65.211)  65.140 ms  65.120 ms  65.036 ms

In the middle of the week it seems less congested than on the weekend, still on the same route. I'm seeing up to 32 Mbps download, which is more reasonable but still less than I should be able to get (150 Mbps is my theoretical local download cap, and I can reach or surpass it from my Linode server).

What speed/path do you get for example between your home and your linode server?

Can you also try for example: https://upload.wikimedia.org/wikipedia/commons/7/72/Brzansko-Moraviste-Pejzazi_20170121_2218.webm
Which goes from ulsfo through NTT then Comcast on the way back.

I get a full 150 Mbps download (my bandwidth cap) on that file from ulsfo, and about 100 Mbps from my Linode server (tested with scp instead of wget, so it might vary).

Route from Linode to me:

traceroute to c-73-37-60-183.hsd1.or.comcast.net (73.37.60.183), 30 hops max, 60 byte packets
 1  23.92.24.3 (23.92.24.3)  0.948 ms  0.936 ms  0.989 ms
 2  173.230.159.2 (173.230.159.2)  0.604 ms 173.230.159.4 (173.230.159.4)  0.587 ms 173.230.159.2 (173.230.159.2)  0.636 ms
 3  173.230.159.9 (173.230.159.9)  0.575 ms 172.18.0.37 (172.18.0.37)  3.548 ms 173.230.159.9 (173.230.159.9)  0.609 ms
 4  100ge14-2.core1.sjc2.he.net (72.52.92.246)  0.849 ms 10ge7-9.core1.sjc2.he.net (184.105.222.14)  0.841 ms 100ge14-2.core1.sjc2.he.net (72.52.92.246)  0.834 ms
 5  comcast-7922-as7922.10gigabitethernet1-1-3.switch1.sjc2.he.net (216.218.213.102)  1.586 ms 100ge14-2.core1.sjc2.he.net (72.52.92.246)  0.881 ms  0.889 ms
 6  64.62.153.170 (64.62.153.170)  1.606 ms hu-0-3-0-7-cr01.9greatoaks.ca.ibone.comcast.net (68.86.83.137)  1.662 ms hu-0-4-0-4-cr01.9greatoaks.ca.ibone.comcast.net (68.86.88.189)  2.221 ms
 7  hu-0-4-0-5-cr01.9greatoaks.ca.ibone.comcast.net (68.86.88.205)  2.215 ms be-11025-cr02.sunnyvale.ca.ibone.comcast.net (68.86.87.157)  2.267 ms  4.313 ms
 8  ae-72-ar01.beaverton.or.bverton.comcast.net (68.86.92.206)  15.693 ms  15.687 ms  15.673 ms
 9  ae-1-rur01.beaverton.or.bverton.comcast.net (68.85.146.198)  20.270 ms ae-72-ar01.beaverton.or.bverton.comcast.net (68.86.92.206)  15.617 ms ae-1-rur01.beaverton.or.bverton.comcast.net (68.85.146.198)  20.370 ms
10  ae-1-rur01.beaverton.or.bverton.comcast.net (68.85.146.198)  20.348 ms  20.338 ms po-1-1-cbr25.beaverton.or.bverton.comcast.net (68.85.148.162)  20.372 ms
11  po-1-1-cbr25.beaverton.or.bverton.comcast.net (68.85.148.162)  20.434 ms  20.418 ms  20.400 ms

Haven't encountered this in a while; Comcast etc. may have improved the intermediate routes. Closing out as resolved.