Network congestion between DTAG & eqiad
Closed, Resolved, Public

Description

For a long time I have had the problem that nearly all connections to any engineering project (gerrit, wmflabs, phabricator, ...) are extremely slow. Currently I'm developing on a remote server, where everything works fine, but I'm unable to clone projects to my local computer (except from GitHub, for example, but then I can't commit changes to gerrit :(). Anything I want to do takes a very long time (e.g. loading an image in phabricator or cloning a project from gerrit, which is in fact impossible).

I tried to track down the problem and found that nearly every connection to WMF engineering infrastructure has transfer rates of 8-25 kB/s, which isn't normal for my connection (DSL 16000, which normally gives about 1.6 MB/s, e.g. from GitHub or Debian image hosts). So I ran a tracert to gerrit (as an example), see:

C:\Users\Florian>tracert gerrit.wikimedia.org

Tracing route to gerrit.wikimedia.org [208.80.154.81]
over a maximum of 30 hops:

  1     2 ms    <1 ms     1 ms  speedport.ip [192.168.2.1]
  2    26 ms    20 ms    19 ms  217.0.117.157
  3    20 ms    21 ms    23 ms  87.186.195.242
  4    23 ms    23 ms    23 ms  hh-ea8-i.HH.DE.NET.DTAG.DE [62.154.32.129]
  5    26 ms    30 ms    27 ms  80.150.168.162
  6    30 ms    22 ms    22 ms  hbg-bb4-link.telia.net [213.155.135.84]
  7   142 ms   153 ms   145 ms  ash-bb4-link.telia.net [62.115.141.112]
  8   262 ms   236 ms   275 ms  ash-b2-link.telia.net [62.115.134.58]
  9   108 ms   111 ms   118 ms  wikimedia-ic-308845-ash-b2.c.telia.net [80.239.32.226]
 10   163 ms   121 ms   113 ms  gerrit.wikimedia.org [208.80.154.81]

Trace complete.

C:\Users\Florian>

Just as an example, I tried downloading MediaWiki's latest wmf snapshot (wmf21), with the following result (I stopped after some minutes ;)):

florian@florian-VirtualBox:/var/www/html/w/extensions/MobileFrontend$ wget https://tools.wmflabs.org/snapshots/builds/mediawiki-core/mediawiki-snapshot-wmf_1_25wmf21-bc81f1e.tar.gz
--2015-03-12 20:06:45--  https://tools.wmflabs.org/snapshots/builds/mediawiki-core/mediawiki-snapshot-wmf_1_25wmf21-bc81f1e.tar.gz
Resolving tools.wmflabs.org (tools.wmflabs.org)... 208.80.155.131
Connecting to tools.wmflabs.org (tools.wmflabs.org)|208.80.155.131|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20337555 (19M) [application/octet-stream]
Saving to: ‘mediawiki-snapshot-wmf_1_25wmf21-bc81f1e.tar.gz’

 6% [=========>                                                                                                                                                          ] 1.285.946   11,4KB/s  ETA 28m 45s
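
To put the numbers from this wget run in perspective, here is a rough back-of-the-envelope sketch. The byte counts and the 11,4KB/s rate are taken from the output above; the 16 Mbit/s line rate is only the nominal "DSL 16000" figure and is an assumption, since real-world throughput will be somewhat lower.

```python
# Figures from the wget output above. wget counts 1 KB as 1024 bytes.
TOTAL_BYTES = 20_337_555
DONE_BYTES = 1_285_946
OBSERVED_BYTES_PER_S = 11.4 * 1024

# Assumption: nominal downstream rate of a "DSL 16000" line, in bits per second.
NOMINAL_BITS_PER_S = 16_000_000


def seconds_to_fetch(n_bytes: float, bytes_per_second: float) -> float:
    """Time needed to transfer n_bytes at a constant rate."""
    return n_bytes / bytes_per_second


def fmt(seconds: float) -> str:
    """Format a duration as minutes and seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs:02d}s"


remaining_s = seconds_to_fetch(TOTAL_BYTES - DONE_BYTES, OBSERVED_BYTES_PER_S)
nominal_s = seconds_to_fetch(TOTAL_BYTES, NOMINAL_BITS_PER_S / 8)
slowdown = (NOMINAL_BITS_PER_S / 8) / OBSERVED_BYTES_PER_S

print("remaining at observed rate:", fmt(remaining_s))
print("whole file at nominal rate:", fmt(nominal_s))
print(f"observed rate is roughly {slowdown:.0f}x below nominal")
```

The estimate will not match wget's own ETA exactly, because wget derives its ETA from a moving average of the transfer speed rather than from the single rate shown on the progress line.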

That's really, really bad. Can we do anything to improve the current situation? :( If any information is needed, I'm willing to provide it :)

Event Timeline

Florian raised the priority of this task from to Needs Triage.
Florian updated the task description.
Florian added projects: acl*sre-team, netops.
Florian added subscribers: Florian, Reedy.

Wondering if T92513: Gerrit (ssh) is unusably slow at certain times of day from europe is related / one manifestation.

I am afraid I cannot reproduce (being in Europe, it's 10:42UTC):

$:andre\> wget -v http://tools.wmflabs.org/snapshots/builds/mediawiki-core/mediawiki-snapshot-master-d4dae8e.tar.gz
--2015-03-17 11:42:40--  http://tools.wmflabs.org/snapshots/builds/mediawiki-core/mediawiki-snapshot-master-d4dae8e.tar.gz
Resolving tools.wmflabs.org (tools.wmflabs.org)... 208.80.155.131
Connecting to tools.wmflabs.org (tools.wmflabs.org)|208.80.155.131|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20421183 (19M) [application/octet-stream]
Saving to: ‘mediawiki-snapshot-master-d4dae8e.tar.gz’

mediawiki-snapshot-master-d4dae8e.tar.g  31%[========================>                                                       ]   6.10M  1.19MB/s   eta 14s   ^C
$:andre\>
faidon renamed this task from "Very slow connection to wmf engineering infrastructure" to "Network congestion between DTAG & eqiad". Mar 17 2015, 12:39 PM
faidon set Security to None.
faidon added subscribers: matmarex, JanZerebecki, faidon and 4 others.

@Aklapper Which provider do you use? Can you run a traceroute (just to see which hops and networks your provider uses)?

There are already some well-known problems with connections from the DTAG network to hosts reached via telia.net, see e.g. these threads in the official German DTAG support forum:
https://telekomhilft.telekom.de/t5/V-DSL-Glasfaser/Routingsprobleme-unter-anderem-Telia-net/m-p/1296499
https://telekomhilft.telekom.de/t5/Frage-stellen/Telia-Akamai-%C3%9Cbergang/qaq-p/1302707/search-sort-type-order/date
https://telekomhilft.telekom.de/t5/V-DSL-Glasfaser/Hohe-Pings-bei-der-Connection-zu-Telia-Riot-League-of-Legends/td-p/1216870

Maybe related, maybe not :( It would be interesting to know which providers @aude, @Umherirrender and others from WMDE with this problem use.

@JanZerebecki gave us traceroutes from WMDE on T92513. They seem to be using DTAG (Deutsche Telekom) as well, which is why I merged the two tasks.

The DTAG->Wikimedia path goes via Telia but the reverse path goes via TiNet/GTT. We'll have to experiment a bit more before we track this down but it'd be better if I can get quick feedback on changes from our side. Last I heard, the congestion happens at approx. 5pm Germany time, so I'll be around at that time to get some feedback from anyone who happens to be on IRC at the time :)
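
A traceroute only shows the forward direction, so spotting an asymmetric path like this one needs a trace from each end. As a minimal sketch, the helper below reduces a list of hop hostnames to the sequence of networks crossed, so two traces can be compared at a glance. The suffix-to-network table is an illustrative assumption built from hostnames that appear in this task, and the function name is hypothetical.

```python
# Assumption: hostname suffixes mapped to network names, taken from the
# traceroutes pasted in this task. Extend as needed.
CARRIERS = {
    "dtag.de": "DTAG",
    "telia.net": "Telia",
    "gtt.net": "GTT",
    "wikimedia.org": "Wikimedia",
}


def carriers_on_path(hops):
    """Return the networks a path crosses, in order, without repeats in a row."""
    seen = []
    for host in hops:
        name = host.lower()
        for suffix, label in CARRIERS.items():
            if name == suffix or name.endswith("." + suffix):
                if not seen or seen[-1] != label:
                    seen.append(label)
                break
    return seen


# Named hops from the tracert in the task description (forward direction only).
forward = [
    "hh-ea8-i.HH.DE.NET.DTAG.DE",
    "hbg-bb4-link.telia.net",
    "ash-bb4-link.telia.net",
    "ash-b2-link.telia.net",
    "wikimedia-ic-308845-ash-b2.c.telia.net",
    "gerrit.wikimedia.org",
]

print(" -> ".join(carriers_on_path(forward)))
```

Running the same function over a trace taken from the Wikimedia side towards the user would show whether the return traffic crosses a different network. Hops that only report an IP address are skipped by this sketch.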

@faidon: Argh, right, I hadn't seen that :) I'd be happy to assist you whenever I have time, just ping me on IRC (FlorianSW on #wikimedia-dev, #wikimedia-operations, #wikimedia-mobile...) :)

faidon claimed this task.

OK, I downprefed (lowered the routing preference of) the reverse path via GTT, so it now happens to go via Telia, and got positive confirmation on IRC from both @JanZerebecki & @Florian. I'll raise it with GTT but I don't expect much, as it is probably more political (peering relationships) than technical. Thanks for the feedback, folks :)
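
For readers unfamiliar with the term, the toy model below sketches why lowering a preference moves traffic. It assumes the change was made through BGP local preference, which the comment above does not spell out, and it models only the first two steps of BGP best-path selection (higher local preference wins, then shorter AS path). All values and the `Route` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    via: str          # upstream the route was learned from
    local_pref: int   # hypothetical local preference value
    as_path_len: int  # hypothetical AS path length


def best_route(routes):
    """Pick the route with the highest local preference, then the shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -r.as_path_len))


# Hypothetical state before the change: equal preference, GTT has the shorter path.
before = [Route("GTT", 100, 2), Route("Telia", 100, 3)]

# Hypothetical state after the change: GTT's preference lowered.
after = [Route("GTT", 90, 2), Route("Telia", 100, 3)]

print("before:", best_route(before).via)
print("after: ", best_route(after).via)
```

Because local preference is compared before AS path length, the lower value on the GTT route makes Telia win even though its path is longer in this made-up example.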

Thank you Faidon for taking this forward (and Florian and aude for bringing this up)!

Just for completeness, to answer @Florian's question (and trying around 18:00UTC):

@Aklapper Which provider do you use?

Directly (GTT):

$:andre\> traceroute gerrit.wikimedia.org
traceroute to gerrit.wikimedia.org (208.80.154.81), 30 hops max, 60 byte packets
 1  192.168.1.1 (192.168.1.1)  3.120 ms  7.523 ms  7.487 ms
 2  * * *
 3  ip-86-xx-yy-zzz.net.upcbroadband.cz (86.xx.yy.zzz)  10.772 ms  11.076 ms  13.225 ms
 4  cz-prg02a-ra2-vla2006.net.upc.cz (84.116.222.185)  13.730 ms  18.342 ms  18.310 ms
 5  ae1.prg11.ip4.gtt.net (77.67.90.25)  18.280 ms  18.232 ms  18.186 ms
 6  xe-8-0-3.was10.ip4.gtt.net (141.136.108.177)  120.843 ms  116.361 ms  116.877 ms
 7  xe-5-3-1.cr2-eqiad.wikimedia.org (173.241.131.218)  116.803 ms  116.796 ms  120.265 ms

Using VPN via a university account (Telia):

$:andre\> traceroute gerrit.wikimedia.org
traceroute to gerrit.wikimedia.org (208.80.154.81), 30 hops max, 60 byte packets
 1  xwin-asa.rz.tu-bs.de (134.169.3.105)  30.051 ms  30.527 ms  30.492 ms
 2  xr-bra1-pc2.x-win.dfn.de (188.1.235.101)  45.691 ms  46.146 ms  46.112 ms
 3  xr-han1-te2-1.x-win.dfn.de (188.1.145.93)  32.843 ms  33.309 ms  37.308 ms
 4  cr-han1-te0-7-0-0.x-win.dfn.de (188.1.145.250)  38.387 ms  38.788 ms  38.753 ms
 5  cr-tub1-hundredgige0-6-0-0-7.x-win.dfn.de (188.1.144.190)  41.949 ms  42.298 ms  42.689 ms
 6  be4193.rcr11.b015814-1.ham01.atlas.cogentco.com (149.6.142.101)  51.869 ms  44.904 ms  45.301 ms
 7  be2460.ccr42.ham01.atlas.cogentco.com (154.54.38.241)  45.167 ms be2198.ccr41.ham01.atlas.cogentco.com (154.54.39.5)  44.564 ms be2460.ccr42.ham01.atlas.cogentco.com (154.54.38.241)  45.044 ms
 8  be2306.rcr21.cph01.atlas.cogentco.com (130.117.3.238)  53.274 ms be2303.rcr21.cph01.atlas.cogentco.com (130.117.3.162)  53.764 ms be2306.rcr21.cph01.atlas.cogentco.com (130.117.3.238)  53.675 ms
 9  telia.cph01.atlas.cogentco.com (130.117.14.34)  53.633 ms  56.917 ms  56.101 ms
10  kbn-bb4-link.telia.net (62.115.142.208)  56.765 ms kbn-bb4-link.telia.net (62.115.142.210)  56.684 ms  56.660 ms
11  nyk-bb1-link.telia.net (62.115.141.99)  133.153 ms nyk-bb1-link.telia.net (80.91.249.21)  127.366 ms nyk-bb2-link.telia.net (62.115.141.105)  131.472 ms
12  ash-bb3-link.telia.net (213.155.134.127)  134.924 ms ash-bb3-link.telia.net (62.115.134.111)  131.010 ms ash-bb3-link.telia.net (213.155.130.79)  129.855 ms
13  ash-b2-link.telia.net (62.115.134.54)  134.026 ms  144.925 ms ash-b2-link.telia.net (80.91.252.93)  130.768 ms
14  wikimedia-ic-308845-ash-b2.c.telia.net (80.239.132.226)  131.340 ms  132.044 ms  136.682 ms

Has it really been resolved?

I currently see slowness at this time (from Spain):

$:/tmp> wget http://tools.wmflabs.org/snapshots/builds/mediawiki-core/mediawiki-snapshot-master-ef4f1e4.tar.gz
--2015-03-27 20:23:38--  http://tools.wmflabs.org/snapshots/builds/mediawiki-core/mediawiki-snapshot-master-ef4f1e4.tar.gz
Resolving tools.wmflabs.org (tools.wmflabs.org)... 208.80.155.131
Connecting to tools.wmflabs.org (tools.wmflabs.org)[208.80.155.131]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20474838 (20M) [application/octet-stream]
Saving to: “mediawiki-snapshot-master-ef4f1e4.tar.gz”

mediawiki-snapshot-master-ef4f1e4.tar.gz                         19%[============================>                                                                                                                           ]   3,74M  30,9KB/s   eta 6m 31s^C
^C

$:~> sudo mtr  -c 100 --report-wide --show-ips gerrit.wikimedia.org
Start: Fri Mar 27 20:18:51 2015
HOST:                                                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 192.168.1.1                                                          0.0%   100    0.8   0.8   0.7   3.7   0.3
  2.|-- 192.168.144.1                                                        0.0%   100   39.2  47.0  36.0  84.9  10.3
  3.|-- 181.Red-81-46-130.staticIP.rima-tde.net (81.46.130.181)              0.0%   100   42.7  45.6  38.5  65.7   6.6
  4.|-- 241.Red-80-58-75.staticIP.rima-tde.net (80.58.75.241)                0.0%   100   42.4  47.3  39.3  66.9   6.7
  5.|-- et3-0-0-400-GRTBCNTB1.red.telefonica-wholesale.net (94.142.103.177) 45.0%   100   39.2  46.1  38.0  65.3   8.3
  6.|-- Xe1-0-5-0-grtpareq1.red.telefonica-wholesale.net (94.142.119.177)    0.0%   100   72.3  71.0  58.3 125.6  13.6
  7.|-- prs-b8-link.telia.net (80.239.192.73)                                0.0%   100   65.1  70.4  63.2  89.0   6.5
  8.|-- prs-bb3-link.telia.net (213.155.132.224)                             0.0%   100   66.3  72.2  64.4  88.5   8.0
  9.|-- ash-bb3-link.telia.net (80.91.251.98)                                0.0%   100  152.1 161.0 151.5 213.6  11.4
 10.|-- ash-b2-link.telia.net (80.91.252.93)                                 0.0%   100  145.5 151.8 144.0 176.9   7.0
 11.|-- wikimedia-ic-308845-ash-b2.c.telia.net (80.239.132.226)              0.0%   100  150.9 155.5 148.6 176.7   7.4
 12.|-- gerrit.wikimedia.org (208.80.154.81)                                 0.0%   100  154.7 157.7 149.9 176.4   6.3


$:~> sudo mtr  -c 100 --report-wide --show-ips -T -P 29418 gerrit.wikimedia.org
Start: Fri Mar 27 20:20:25 2015
HOST:                                                                     Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 192.168.1.1                                                          0.0%   100    3.4   2.6   0.8   3.4   0.1
  2.|-- 192.168.144.1                                                        0.0%   100   44.8  44.6  40.2  93.6   7.4
  3.|-- ???                                                                 100.0   100    0.0   0.0   0.0   0.0   0.0
  4.|-- 241.Red-80-58-75.staticIP.rima-tde.net (80.58.75.241)                0.0%   100   43.5  44.3  41.7  49.4   1.3
  5.|-- et3-0-0-400-GRTBCNTB1.red.telefonica-wholesale.net (94.142.103.177) 28.0%   100   42.5 808.7  40.8 7054. 1998.3
  6.|-- Xe5-0-4-0-grtpareq1.red.telefonica-wholesale.net (84.16.13.30)       0.0%   100  128.1  73.5  57.8 162.4  23.2
  7.|-- prs-b8-link.telia.net (80.239.192.73)                                0.0%   100   64.8  63.7  57.1  87.4   3.8
  8.|-- prs-bb2-link.telia.net (213.155.131.10)                             49.0%   100   65.6  70.1  57.9 166.6  20.2
  9.|-- ash-bb3-link.telia.net (80.91.251.243)                               0.0%   100  146.5 150.6 135.4 211.4  12.6
 10.|-- ash-b2-link.telia.net (80.91.252.93)                                 0.0%   100  155.5 146.0 135.7 156.4   5.4
 11.|-- wikimedia-ic-308845-ash-b2.c.telia.net (80.239.132.226)              0.0%   100  142.3 147.0 140.4 163.8   4.9
 12.|-- gerrit.wikimedia.org (208.80.154.81)                                 0.0%   100  167.0 187.8 166.9 1170. 141.3
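
When reading reports like the two above, a common rule of thumb is that loss shown at an intermediate hop but not at the destination often means that router is rate-limiting its replies to probes, not dropping forwarded traffic. The sketch below parses `mtr --report-wide` lines and lists hops whose loss exceeds the loss at the final hop. It is a heuristic only, the function names are hypothetical, and the sample lines are copied from the first report.

```python
import re

# Matches lines such as "  5.|-- host (ip)   45.0%   100 ..." and also the
# "???" rows, where mtr prints the loss column without a percent sign.
HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(.+?)\s+(\d+(?:\.\d+)?)%?\s+(\d+)\s")


def parse_report(text):
    """Return a list of (hop number, host, loss percent) tuples."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return hops


def loss_not_seen_at_destination(hops):
    """Hops reporting more loss than the final hop does."""
    if not hops:
        return []
    final_loss = hops[-1][2]
    return [h for h in hops[:-1] if h[2] > final_loss]


SAMPLE = """\
  4.|-- 241.Red-80-58-75.staticIP.rima-tde.net (80.58.75.241)                0.0%   100   42.4  47.3  39.3  66.9   6.7
  5.|-- et3-0-0-400-GRTBCNTB1.red.telefonica-wholesale.net (94.142.103.177) 45.0%   100   39.2  46.1  38.0  65.3   8.3
  6.|-- Xe1-0-5-0-grtpareq1.red.telefonica-wholesale.net (94.142.119.177)    0.0%   100   72.3  71.0  58.3 125.6  13.6
 12.|-- gerrit.wikimedia.org (208.80.154.81)                                 0.0%   100  154.7 157.7 149.9 176.4   6.3
"""

for hop, host, loss in loss_not_seen_at_destination(parse_report(SAMPLE)):
    print(f"hop {hop} ({host}): {loss}% loss, not seen at the destination")
```

This only looks at the loss column. It says nothing about throughput, and it cannot see the return path, so it does not rule out congestion elsewhere.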

@Ciencia_Al_Poder: I have no problems from Germany, but you're using a different ISP than I am.

I think your issue is not related to this one. Perhaps contact the support of your ISP, Telefónica?
Your reverse path:

$ mtr -c 1 --report-wide --report 80.58.75.241
HOST: bastion1                                         Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- eth4-1102.labnet1001.eqiad.wmnet                  0.0%     1    0.4   0.4   0.4   0.4   0.0
  2.|-- ae2-1118.cr2-eqiad.wikimedia.org                  0.0%     1    0.7   0.7   0.7   0.7   0.0
  3.|-- ash-b2-link.telia.net                             0.0%     1    0.6   0.6   0.6   0.6   0.0
  4.|-- ash-b1-link.se.telia.net                          0.0%     1    0.5   0.5   0.5   0.5   0.0
  5.|-- ash-b3-link.telia.net                             0.0%     1    1.4   1.4   1.4   1.4   0.0
  6.|-- telefonica-ic-126960-ash-bb1.c.telia.net          0.0%     1    2.0   2.0   2.0   2.0   0.0
  7.|-- Xe5-0-0-0-grtpartv1.red.telefonica-wholesale.net  0.0%     1   81.0  81.0  81.0  81.0   0.0
  8.|-- Xe6-0-1-0-grtmadad1.red.telefonica-wholesale.net  0.0%     1   99.1  99.1  99.1  99.1   0.0
  9.|-- ???                                              100.0     1    0.0   0.0   0.0   0.0   0.0