Network issues for users in the UK and Ireland
Closed, Resolved · Public · BUG REPORT

Event Timeline

Restricted Application added a subscriber: Aklapper.

I myself can't help you troubleshoot because I am not in these regions and do not have any connectivity issues. I'm just forwarding all these reports since it's clearly a problem for many.

Ok for me from one UK ISP and two in IE (also confirmed by browsing):

root@uk:~# mtr -z -b -w -c 10 text-lb.wikimedia.org
Start: 2023-12-28T00:33:56+0100
HOST: uk.rankinrez.net                                                         Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS46261  2a07:4580:b0d::1                                                  0.0%    10    1.2   1.4   1.1   3.0   0.6
  2. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
  3. AS20860  be6.3222.asr01.dc13.as20860.net (2001:1b40:f000:3222::2)          0.0%    10    1.1   1.1   0.9   1.3   0.1
  4. AS20860  be16.asr01.ld5.as20860.net (2001:1b40:f000:10a:202::1)           10.0%    10    6.7   6.8   6.7   7.0   0.1
  5. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
  6. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
  7. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
  8. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
  9. AS???    ae1-380.cr1-esams.wikimedia.org (2001:7f8:1::a501:4907:1)         0.0%    10   13.2  13.1  12.9  14.1   0.4
 10. AS14907  et-0-0-48.asw1-bw27-esams.wikimedia.org (2a02:ec80:300:fe04::2)   0.0%    10   13.3  13.6  13.1  16.2   0.9
 11. AS14907  text-lb.esams.wikimedia.org (2a02:ec80:300:ed1a::1)               0.0%    10   12.7  12.9  12.7  13.6   0.3
root@uk:~# mtr -4 -b -w -z -c 10 text-lb.wikimedia.org
Start: 2023-12-28T01:09:22+0100
HOST: uk.rankinrez.net                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS46261  91.132.85.1                                               0.0%    10    0.9   1.5   0.9   4.9   1.2
  2. AS20860  185.91.76.101                                            90.0%    10    0.6   0.6   0.6   0.6   0.0
  3. AS20860  be6.3222.asr01.dc13.as20860.net (130.180.203.223)         0.0%    10    7.2   7.1   7.1   7.2   0.1
  4. AS20860  be16.asr01.ld5.as20860.net (130.180.202.0)                0.0%    10    7.7   7.6   7.3   8.2   0.3
  5. AS20860  be98.asr01.thn.as20860.net (62.128.211.137)               0.0%    10    7.7   7.4   7.3   7.7   0.1
  6. AS???    linx-1.init7.net (195.66.224.175)                         0.0%    10    7.7   8.0   7.6   9.7   0.6
  7. AS???    r1ams1.core.init7.net (80.249.208.210)                    0.0%    10   12.6  12.8  12.6  13.0   0.1
  8. AS13030  r1ams2.core.init7.net (5.180.135.234)                     0.0%    10   12.4  12.5  12.3  13.0   0.2
  9. AS???    ???                                                      100.0    10    0.0   0.0   0.0   0.0   0.0
 10. AS14907  et-0-0-50.asw1-bw27-esams.wikimedia.org (185.15.59.159)   0.0%    10   14.4  16.6  12.8  25.0   3.5
 11. AS14907  text-lb.esams.wikimedia.org (185.15.59.224)               0.0%    10   13.2  13.5  13.2  15.8   0.8
cathal@officepc:~$ mtr -4 -z -b -w -c 10 text-lb.wikimedia.org
Start: 2023-12-27T23:35:20+0000
HOST: officepc                                                         Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    nbgw (192.168.240.1)                                      0.0%    10    0.4   0.4   0.3   0.5   0.1
  2. AS6830   176.61.34.1                                               0.0%    10   11.6  14.5  11.6  30.1   5.6
  3. AS6830   109.255.253.254                                           0.0%    10   12.9  13.9  10.7  19.1   2.6
  4. AS6830   ie-dub02a-rc1-ae-31-0.aorta.net (84.116.238.38)           0.0%    10   29.2  27.0  24.0  32.1   2.9
  5. AS6830   nl-ams02a-rc2-lag-1-0.aorta.net (84.116.130.33)          60.0%    10   32.7  30.8  29.4  32.7   1.6
  6. AS6830   cz-prg01a-ra4-ae-99-40.aorta.net (84.116.136.145)        70.0%    10   28.1  26.8  23.4  28.9   3.0
  7. AS33915  nl-ams04a-rb2-lo0-100.aorta.net (213.46.186.10)           0.0%    10   29.4  25.2  23.3  29.4   1.8
  8. AS14907  et-0-0-50.asw1-bw27-esams.wikimedia.org (185.15.59.159)   0.0%    10   31.5  32.5  26.4  44.3   5.2
  9. AS14907  text-lb.esams.wikimedia.org (185.15.59.224)               0.0%    10   24.4  26.4  24.1  33.4   2.7
cathal@officepc:~$ mtr -z -b -w -c 10 text-lb.wikimedia.org
Start: 2023-12-27T23:36:20+0000
HOST: officepc                                                                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS5466   2001:bb6:8b53:a800::1                                             0.0%    10    0.4   0.4   0.3   0.4   0.0
  2. AS5466   2001:bb0:6:a11d::1                                                0.0%    10    4.9   7.1   4.6  13.0   2.7
  3. AS5466   2001:bb0:6:a197::1                                                0.0%    10    4.8   4.8   4.4   5.2   0.2
  4. AS???    ???                                                              100.0    10    0.0   0.0   0.0   0.0   0.0
  5. AS6830   ie-dub02a-rc1-lo0-0.v6.aorta.net (2001:730:2c00::5474:80f7)       0.0%    10   20.6  20.6  20.1  21.1   0.4
  6. AS6830   2001:730:2200::5474:8082                                          0.0%    10   20.6  20.4  20.2  20.7   0.2
  7. AS6830   2001:730:2200::5474:803f                                          0.0%    10   20.5  20.8  20.5  21.5   0.3
  8. AS6830   2001:730:2209:1::d52e:ba0a                                        0.0%    10   20.0  20.1  19.8  20.6   0.3
  9. AS14907  et-0-0-50.asw1-bw27-esams.wikimedia.org (2a02:ec80:300:fe06::2)   0.0%    10   21.3  21.0  20.8  21.3   0.1
 10. AS14907  text-lb.esams.wikimedia.org (2a02:ec80:300:ed1a::1)               0.0%    10   20.0  20.1  19.9  20.3   0.1

I'll dig a little into the reports to see if there is any pattern. But given it's not a problem in esams, it's likely an issue with something (most likely an ISP) outside our control.
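
The mtr reports above show heavy loss at some intermediate hops but 0% at the final hop. Loss that does not carry through to the destination usually means a router is rate-limiting its own ICMP replies rather than dropping forwarded traffic. A minimal sketch of that check, assuming the `mtr -z -b -w` report layout shown above (the regex and the helper names `parse_hops`, `lossy_transit_hops` and `loss_reaches_destination` are illustrative, not part of mtr):

```python
import re

# One hop line of an `mtr -z -b -w` report. The loss column sometimes
# appears without a "%" sign (e.g. "100.0"), so the sign is optional.
HOP_RE = re.compile(
    r"^\s*(?P<hop>\d+)\.\s+(?P<asn>AS\S+)\s+(?P<host>.+?)\s+"
    r"(?P<loss>\d+(?:\.\d+)?)%?\s+(?P<sent>\d+)\s+"
)


def parse_hops(report):
    """Return one dict per hop line; header lines are skipped."""
    hops = []
    for line in report.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append({
                "hop": int(m["hop"]),
                "asn": m["asn"],
                "host": m["host"],
                "loss": float(m["loss"]),
            })
    return hops


def lossy_transit_hops(hops):
    """Hop numbers before the destination that report any loss."""
    return [h["hop"] for h in hops[:-1] if h["loss"] > 0]


def loss_reaches_destination(hops):
    """True only if the final hop itself reports loss."""
    return bool(hops) and hops[-1]["loss"] > 0


# Excerpt of the IPv4 report from officepc above (whitespace compacted).
SAMPLE = """\
HOST: officepc  Loss%  Snt  Last  Avg  Best  Wrst StDev
  4. AS6830   ie-dub02a-rc1-ae-31-0.aorta.net (84.116.238.38)  0.0%  10  29.2  27.0  24.0  32.1  2.9
  5. AS6830   nl-ams02a-rc2-lag-1-0.aorta.net (84.116.130.33)  60.0%  10  32.7  30.8  29.4  32.7  1.6
  6. AS6830   cz-prg01a-ra4-ae-99-40.aorta.net (84.116.136.145)  70.0%  10  28.1  26.8  23.4  28.9  3.0
  7. AS33915  nl-ams04a-rb2-lo0-100.aorta.net (213.46.186.10)  0.0%  10  29.4  25.2  23.3  29.4  1.8
  8. AS14907  et-0-0-50.asw1-bw27-esams.wikimedia.org (185.15.59.159)  0.0%  10  31.5  32.5  26.4  44.3  5.2
  9. AS14907  text-lb.esams.wikimedia.org (185.15.59.224)  0.0%  10  24.4  26.4  24.1  33.4  2.7
"""

hops = parse_hops(SAMPLE)
print(lossy_transit_hops(hops), loss_reaches_destination(hops))
```

For this excerpt, hops 5 and 6 are flagged but the loss does not reach the destination, which matches the "Ok for me" reading of those traces.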

Judging from the comments it's BT UK affected. NELs show TCP timeouts mostly, but also TCP resets starting from about the 22nd.

Taking an example IP from the NELs and tracing to it, the trace goes to AS5400 (BT Global Services) over our direct peering at esams, but seems to die quickly once it hits AS2856 (BT UK). For that reason I don't think shutting our peering to AS5400 would do any good:

cmooney@lvs3008:~$ sudo traceroute -s 185.15.59.224 -I -w 1 <redacted> 
traceroute to 86.185.185.193 (86.185.185.193), 30 hops max, 60 byte packets
 1  * * *
 2  et-0-0-0.cr2-esams.wikimedia.org (185.15.59.158)  0.141 ms  0.144 ms  0.142 ms
 3  ae0.cr1-esams.wikimedia.org (185.15.59.152)  0.320 ms  0.331 ms  0.434 ms
 4  ams-ix.bt.com (80.249.208.108)  0.917 ms  0.928 ms  0.937 ms
 5  t2c4-et-5-0-2-0.nl-ams2.gia.bt.net (166.49.164.131)  0.952 ms  0.962 ms  0.993 ms
 6  t2c4-et-8-0-5.uk-lon1.gia.bt.net (166.49.195.122)  7.380 ms  7.256 ms  7.262 ms
 7  166-49-214-169.gia.bt.net (166.49.214.169)  6.114 ms  6.059 ms  6.086 ms
 8  core1-hu0-7-0-3.guildford.ukcore.bt.net (62.172.103.148)  7.581 ms  7.615 ms  7.605 ms
 9  * * *
10  * * *

The same goes for IPs registered to Plusnet. They are behind AS2856 and traces to them stop at a similar point:

cmooney@cr2-esams> show route protocol bgp table inet.0 <redacted> terse 

inet.0: 933107 destinations, 3264623 routes (932617 active, 4 holddown, 539 hidden)
Restart Complete
+ = Active Route, - = Last Active, * = Both

A V Destination        P Prf   Metric 1   Metric 2  Next hop        AS path
* N 87.114.0.0/16      B 170        250                             5400 2856 6871 ?
  unknown                                          >185.15.59.152
  N                    B 170        100          1                  13030 2856 6871 ?
  unknown                                          >77.109.134.113
  N                    B 170        100                             6830 5400 2856 6871 ?
  unknown                                          >213.46.186.9
  N                    B 170        100                             1257 5400 2856 6871 ?
  unknown                                          >130.244.6.249
cmooney@lvs3008:~$ sudo traceroute -s 185.15.59.224 -I -w 1 <redacted>
traceroute to 87.114.22.104 (87.114.22.104), 30 hops max, 60 byte packets
 1  irb-321.asw1-bw27-esams.esams.wmnet (10.80.0.1)  4.520 ms  4.519 ms  4.520 ms
 2  et-0-0-0.cr2-esams.wikimedia.org (185.15.59.158)  0.360 ms  0.361 ms  0.409 ms
 3  ae0.cr1-esams.wikimedia.org (185.15.59.152)  0.334 ms  0.345 ms  0.377 ms
 4  ams-ix.bt.com (80.249.208.108)  0.655 ms  0.665 ms  0.666 ms
 5  t2c4-et-5-0-2-0.nl-ams2.gia.bt.net (166.49.164.131)  0.824 ms  0.843 ms  1.026 ms
 6  t2c4-et-9-1-3.uk-lon1.gia.bt.net (166.49.208.37)  7.204 ms  7.082 ms  7.073 ms
 7  166-49-214-169.gia.bt.net (166.49.214.169)  6.277 ms  6.276 ms  6.249 ms
 8  core1-hu-0-12-0-7.colindale.ukcore.bt.net (62.172.103.171)  7.628 ms  7.895 ms  7.923 ms
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
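
The reasoning about the AS5400 peering can be checked against the route table above: if every available path to the prefix crosses the AS where the trace dies, changing the first hop cannot route around it. A small sketch, using only the four AS paths shown for 87.114.0.0/16 (the dictionary layout and the helper names `common_ases` and `paths_avoiding` are assumptions made for illustration):

```python
# AS paths for 87.114.0.0/16, keyed by next hop, copied from the
# `show route` output above.
AS_PATHS = {
    "185.15.59.152": [5400, 2856, 6871],
    "77.109.134.113": [13030, 2856, 6871],
    "213.46.186.9": [6830, 5400, 2856, 6871],
    "130.244.6.249": [1257, 5400, 2856, 6871],
}


def common_ases(paths):
    """ASNs that appear in every one of the given AS paths."""
    sets = [set(p) for p in paths]
    return set.intersection(*sets) if sets else set()


def paths_avoiding(paths_by_next_hop, asn):
    """Subset of paths that do not traverse the given ASN."""
    return {nh: p for nh, p in paths_by_next_hop.items() if asn not in p}


print(common_ases(AS_PATHS.values()))
print(paths_avoiding(AS_PATHS, 5400))
print(paths_avoiding(AS_PATHS, 2856))
```

Only the path via 77.109.134.113 avoids AS5400, and no path avoids AS2856, so dropping the AS5400 session would still deliver traffic into AS2856.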

I'll prep a mail and send it to their NOC, but ultimately I don't think there is much we can do here.

Ok mail sent to make them aware. Not really much else we can do here I think.

Some good diagnostics from a user on an EE cellular connection in Northern Ireland from Dec 22nd below.

What is interesting is that a traceroute doesn't fail and the initial TCP connection gets a response, but then it doesn't complete. Also, it only occurred when trying esams directly, not our other POPs:

I'm not sure when the issue started, but it has been ongoing since I arrived here on 22 December 2023. This issue is affecting any device (Windows 11 laptop, Android phone, Android tablet) that I try to use to access a Wikimedia website. I'm currently in County Down, in Northern Ireland, the ISP is EE Business, and it's connecting over a 4G connection. The output of test-ipv6.com is that this is an IPv4 only connection. The current IP address for the connection is <removed>.

The output of curl -m 60 -v https://en.wikipedia.org/wiki/Main_Page command is:
*   Trying 185.15.59.224:80...
* Connected to en.wikipedia.org (185.15.59.224) port 80
> GET /wiki/Main_Page HTTP/1.1
> Host: en.wikipedia.org
> User-Agent: curl/8.4.0
> Accept: */*
>
* Operation timed out after 60009 milliseconds with 0 bytes received
* Closing connection
curl: (28) Operation timed out after 60009 milliseconds with 0 bytes received

The output of tracert is:
Tracing route to dyna.wikimedia.org [185.15.59.224]
over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  192.168.1.1
  2    26 ms    28 ms    37 ms  11.1.6.254
  3     *        *        *     Request timed out.
  4    30 ms    29 ms    27 ms  109.249.132.29
  5     *        *        *     Request timed out.
  6    44 ms    32 ms    31 ms  109.249.132.42
  7    72 ms    51 ms    46 ms  peer2-et3-1-6.slough.ukcore.bt.net [62.6.201.23]
  8    44 ms    34 ms    35 ms  linx-1.init7.net [195.66.224.175]
  9    41 ms    36 ms    40 ms  r2ams2.core.init7.net [5.180.135.240]
 10    78 ms    37 ms    40 ms  r1ams2.core.init7.net [5.180.135.234]
 11    53 ms    37 ms    46 ms  gw-wikimedia.init7.net [77.109.134.114]
 12    40 ms    40 ms    45 ms  et-0-0-50.asw1-bw27-esams.wikimedia.org [185.15.59.159]
 13     *       47 ms    41 ms  text-lb.esams.wikimedia.org [185.15.59.224]

If I ping -t the IP for esams for 60 seconds, I get approximately 40% packet loss. 

The output of the curl command to esams, with a 60 second timeout is:
*   Trying 185.15.59.224:443...
* Connected to text-lb.esams.wikimedia.org (185.15.59.224) port 443
* schannel: disabled automatic use of client certificate
* ALPN: curl offers http/1.1
* Connection timed out after 60017 milliseconds
* Closing connection
* schannel: shutting down SSL/TLS connection with text-lb.esams.wikimedia.org port 443
curl: (28) Connection timed out after 60017 milliseconds
* URL rejected: Bad hostname
* Closing connection
curl: (3) URL rejected: Bad hostname

The curl command to ulsfo is:
*   Trying 198.35.26.96:443...
* Connected to text-lb.ulsfo.wikimedia.org (198.35.26.96) port 443
* schannel: disabled automatic use of client certificate
* ALPN: curl offers http/1.1
* ALPN: server accepted http/1.1
* using HTTP/1.1
> GET / HTTP/1.1
> Host: text-lb.ulsfo.wikimedia.org
> User-Agent: curl/8.4.0
> Accept: */*
>
* schannel: remote party requests renegotiation
* schannel: renegotiating SSL/TLS connection
* schannel: SSL/TLS connection renegotiated
* schannel: remote party requests renegotiation
* schannel: renegotiating SSL/TLS connection
* schannel: SSL/TLS connection renegotiated
< HTTP/1.1 400
< date: Sun, 24 Dec 2023 00:49:09 GMT
< server: Varnish
< x-cache: cp4039 int
< x-cache-status: int-front
< server-timing: cache;desc="int-front", host;desc="cp4039"
<--- remainder cut --->
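
The difference between the esams and ulsfo attempts is the stage each transfer reached: both complete the TCP connect, but only ulsfo gets through TLS negotiation to an HTTP response. A rough sketch of that comparison, assuming the `curl -v` line prefixes seen in the outputs above (the stage names and the `furthest_stage` helper are illustrative, and the "ALPN: server accepted" line is used only as a heuristic sign that the TLS handshake progressed):

```python
# Stages in the order a transfer passes through them.
STAGES = ["no-connection", "tcp-connected", "tls-negotiated", "http-response"]

# Line prefixes from `curl -v` output and the stage each one implies.
MARKERS = [
    ("* Connected to", "tcp-connected"),
    ("* ALPN: server accepted", "tls-negotiated"),
    ("< HTTP/", "http-response"),
]


def furthest_stage(curl_verbose):
    """Return the furthest stage evidenced by a `curl -v` transcript."""
    best = 0
    for raw in curl_verbose.splitlines():
        line = raw.strip()
        for prefix, stage in MARKERS:
            if line.startswith(prefix):
                best = max(best, STAGES.index(stage))
    return STAGES[best]


# Excerpts of the two transcripts above.
ESAMS_LOG = """\
*   Trying 185.15.59.224:443...
* Connected to text-lb.esams.wikimedia.org (185.15.59.224) port 443
* ALPN: curl offers http/1.1
* Connection timed out after 60017 milliseconds
"""

ULSFO_LOG = """\
*   Trying 198.35.26.96:443...
* Connected to text-lb.ulsfo.wikimedia.org (198.35.26.96) port 443
* ALPN: curl offers http/1.1
* ALPN: server accepted http/1.1
< HTTP/1.1 400
"""

print(furthest_stage(ESAMS_LOG), furthest_stage(ULSFO_LOG))
```

A transcript that stops at the TCP stage while traceroute succeeds suggests the small handshake packets get through but the larger or later packets of the exchange do not.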

The same user could access the site through another cellular connection "on the same tower" they said. So the difference there would be APN and user profile on the mobile side (putting it squarely back on the carrier side). But still an unusual situation for sure.

One thing I note also is the "working" connection was on IPv6 not v4, and routed from BT UK AS2856 through BT Global AS5400 and came into us over AMS-IX. Whereas the failed one came in over Init7 transit.

Arzhel responded and asked if it's still happening so let's see if they can confirm it's still ongoing.

Hey.

I'm the user from Northern Ireland who sent the email cmooney copied above. I'm currently struck down with some sort of cold or sinus infection from my nephew, so I'm not able to do a full test right now as plugging in my laptop is more energy than I have to spare. I did a quick test trying to load enwiki on my phone, over the IPv4 only connection and it was loading consistently fine. If I feel better tomorrow my time zone I'll rerun the tests from my email above and post the results here for comparison.

Many thanks for the feedback @Sideswipe9th, especially given that you are not feeling so well :)

I'm glad to hear it sounds like the issue is no longer present. We get some user error reports via a side channel, and they show a large uptick in problems for users on BT UK / AS2856 that started on Dec 22nd and ended on Dec 30th. So hopefully it was just some transient problem the BT engineers were able to resolve.

Let me take this opportunity to thank you for the report, and wish you both a happy new year and a speedy recovery.

Sorry it's taken a few more days to reply @cmooney, this cold kicked me harder than I thought. Feeling better now though.

Everything appears to be working as expected now on both connections. Pinging esams gives 0% packet loss, and ping times are about what I'd expect for this connection. Results of the other tests mentioned above are in the codeblock below. If there are any other tests you want me to run, let me know :)

tracert en.wikipedia.org

Tracing route to dyna.wikimedia.org [185.15.59.224]
over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  192.168.1.1
  2    61 ms     *       36 ms  11.1.6.254
  3     *        *        *     Request timed out.
  4    54 ms    40 ms    27 ms  213.121.52.205
  5     *        *        *     Request timed out.
  6    36 ms    37 ms    34 ms  109.249.132.16
  7    76 ms     *       37 ms  62.6.204.207
  8   107 ms    56 ms    37 ms  linx-2.init7.net [195.66.236.175]
  9    54 ms    47 ms    46 ms  linx-1.init7.net [195.66.224.175]
 10    74 ms    62 ms    45 ms  r2ams2.core.init7.net [5.180.135.240]
 11    43 ms   182 ms    53 ms  r1ams2.core.init7.net [5.180.135.234]
 12    42 ms     *       48 ms  gw-wikimedia.init7.net [77.109.134.114]
 13    47 ms    48 ms    46 ms  et-0-0-50.asw1-bw27-esams.wikimedia.org [185.15.59.159]
 14    53 ms    75 ms    43 ms  text-lb.esams.wikimedia.org [185.15.59.224]

Trace complete.

tracert text-lb.ulsfo.wikimedia.org

Tracing route to text-lb.ulsfo.wikimedia.org [198.35.26.96]
over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  192.168.1.1
  2    39 ms    23 ms    28 ms  11.1.6.254
  3     *        *        *     Request timed out.
  4    27 ms    39 ms    26 ms  213.121.52.205
  5     *        *        *     Request timed out.
  6    41 ms    37 ms    45 ms  109.249.132.14
  7    45 ms    40 ms    34 ms  62.6.204.193
  8    42 ms    49 ms    40 ms  166-49-214-194.gia.bt.net [166.49.214.194]
  9    46 ms    37 ms     *     212.119.4.136
 10     *        *       43 ms  ae-0.r21.londen12.uk.bb.gin.ntt.net [129.250.3.214]
 11    43 ms    48 ms    39 ms  ae-1.r20.londen12.uk.bb.gin.ntt.net [129.250.2.182]
 12   126 ms   108 ms   107 ms  ae-7.r20.nwrknj03.us.bb.gin.ntt.net [129.250.6.147]
 13   126 ms   339 ms   111 ms  ae-0.r21.nwrknj03.us.bb.gin.ntt.net [129.250.6.17]
 14   140 ms   126 ms   137 ms  ae-3.r22.chcgil09.us.bb.gin.ntt.net [129.250.2.166]
 15   125 ms   127 ms   125 ms  ae-1.r23.chcgil09.us.bb.gin.ntt.net [129.250.2.27]
 16   190 ms   185 ms   178 ms  ae-1.r24.snjsca04.us.bb.gin.ntt.net [129.250.5.17]
 17   188 ms   175 ms   192 ms  ae-1.a00.snfcca07.us.bb.gin.ntt.net [129.250.4.49]
 18   407 ms   180 ms   171 ms  xe-1-1-5-1.a00.snfcca07.us.ce.gin.ntt.net [129.250.204.6]
 19   171 ms   174 ms   184 ms  text-lb.ulsfo.wikimedia.org [198.35.26.96]

Trace complete.

curl -m 60 -v https://en.wikipedia.org/wiki/Main_Page
*   Trying 185.15.59.224:443...
* Connected to en.wikipedia.org (185.15.59.224) port 443
* schannel: disabled automatic use of client certificate
* ALPN: curl offers http/1.1
* ALPN: server accepted http/1.1
* using HTTP/1.1
> GET /wiki/Main_Page HTTP/1.1
> Host: en.wikipedia.org
> User-Agent: curl/8.4.0
> Accept: */*
>
* schannel: remote party requests renegotiation
* schannel: renegotiating SSL/TLS connection
* schannel: SSL/TLS connection renegotiated
* schannel: remote party requests renegotiation
* schannel: renegotiating SSL/TLS connection
* schannel: SSL/TLS connection renegotiated
* schannel: failed to decrypt data, need more data
< HTTP/1.1 200 OK
< date: Sun, 07 Jan 2024 21:54:11 GMT
< server: mw1372.eqiad.wmnet
< x-content-type-options: nosniff
< content-language: en
< accept-ch:
< vary: Accept-Encoding,Cookie
< last-modified: Sun, 07 Jan 2024 21:54:09 GMT
< content-type: text/html; charset=UTF-8
< age: 5411
< x-cache: cp3072 miss, cp3072 hit/18662
< x-cache-status: hit-front
[snip]
</body>
</html>* Connection #0 to host en.wikipedia.org left intact

cmooney claimed this task.

Great @Sideswipe9th thanks for the feedback.

Definitely was a strange one, glad you could shed a bit more light on it for us. I'll close this off for now but feel free to reach out if you notice anything odd again in future!