2025-07-11 traffic overload
Closed, Resolved | Public | Bug Report

Description

Web crawler traffic pushed load averages on deployment-mediawiki14 over 40 as things were recovering from T399281: 2025-07-11 Ceph issues causing Toolforge and Cloud VPS failures.

Screenshot 2025-07-11 at 10.47.04.png (1×2 px, 254 KB)

Note: The spike is tailing off in the graph because I made this task after having performed most of the blocks.

Event Timeline

bd808 changed the task status from Open to In Progress. (Jul 11 2025, 4:48 PM)
bd808 triaged this task as High priority.
bd808 changed the subtype of this task from "Task" to "Bug Report".

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/f45bc3ef01fcc1afc18343694a188b1656814d6b%5E%21/#F0

diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml
index b5f3398..98bcfdb 100644
--- a/deployment-prep/_.yaml
+++ b/deployment-prep/_.yaml

@@ -55,6 +55,7 @@
     - 41.92.0.0/14
     - 41.96.0.0/11
     - 41.128.0.0/9
+    - 43.0.0.0/8
     - 45.0.0.0/9
     - 45.128.0.0/12
     - 45.144.0.0/14
@@ -108,10 +109,13 @@
     - 47.224.0.0/12
     - 47.240.0.0/12
     - 52.167.144.0/24
+    - 85.0.0.0/8
+    - 89.0.0.0/8
     - 91.211.90.0/24
     - 102.0.0.0/8
     - 103.0.0.0/8
     - 105.0.0.0/8
+    - 124.0.0.0/8
     - 128.241.0.0/16
     - 131.0.0.0/8
     - 138.0.0.0/8
@@ -131,7 +135,10 @@
     - 152.60.0.0/14
     - 152.64.0.0/10
     - 152.128.0.0/9
+    - 159.0.0.0/8
     - 160.0.0.0/8
+    - 162.0.0.0/8
+    - 166.0.0.0/8
     - 167.0.0.0/8
     - 168.0.0.0/8
     - 170.0.0.0/8

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/35f2519dbe1615a4ca25f2966816776abdb80aeb%5E%21/#F0

diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml
index 98bcfdb..b4a739f 100644
--- a/deployment-prep/_.yaml
+++ b/deployment-prep/_.yaml

@@ -112,9 +112,11 @@
     - 85.0.0.0/8
     - 89.0.0.0/8
     - 91.211.90.0/24
+    - 101.0.0.0/8
     - 102.0.0.0/8
     - 103.0.0.0/8
     - 105.0.0.0/8
+    - 111.0.0.0/8
     - 124.0.0.0/8
     - 128.241.0.0/16
     - 131.0.0.0/8

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/3d020ad39f4fd872b1d6118907049340df4593e0%5E%21/#F0

diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml
index b4a739f..3b533a1 100644
--- a/deployment-prep/_.yaml
+++ b/deployment-prep/_.yaml

@@ -112,11 +112,16 @@
     - 85.0.0.0/8
     - 89.0.0.0/8
     - 91.211.90.0/24
+    - 94.0.0.0/8
     - 101.0.0.0/8
     - 102.0.0.0/8
     - 103.0.0.0/8
     - 105.0.0.0/8
+    - 110.0.0.0/8
     - 111.0.0.0/8
+    - 113.0.0.0/8
+    - 119.0.0.0/8
+    - 122.0.0.0/8
     - 124.0.0.0/8
     - 128.241.0.0/16
     - 131.0.0.0/8
@@ -144,6 +149,7 @@
     - 167.0.0.0/8
     - 168.0.0.0/8
     - 170.0.0.0/8
+    - 172.0.0.0/8
     - 177.0.0.0/8
     - 179.0.0.0/9
     - 179.128.0.0/15

Mentioned in SAL (#wikimedia-releng) [2025-07-11T16:52:55Z] <bd808> Reboot deployment-mediawiki14 to clear all open connections (T399329)

Things look stable for now. I'm back to having blocked way, way too much of the internet to get here. :/

bd808 reopened this task as In Progress. (Jul 11 2025, 8:41 PM)

Well that period of 1-2 load average didn't last long. :/

Screenshot 2025-07-11 at 14.40.00.png (1×2 px, 367 KB)

root@deployment-mediawiki14:~# ./big-ban-hammer.sh
    - 5.0.0.0/8         # 1142 hits
    - 13.0.0.0/8        # 1724 hits
    - 17.0.0.0/8        # 2032 hits
    - 27.0.0.0/8        # 1124 hits
    - 40.0.0.0/8        # 1174 hits
    - 43.0.0.0/8        # 2727 hits
    - 66.0.0.0/8        # 1140 hits
    - 69.0.0.0/8        # 1010 hits
    - 85.0.0.0/8        # 5228 hits
    - 89.0.0.0/8        # 3100 hits
    - 91.0.0.0/8        # 1967 hits
    - 94.0.0.0/8        # 2296 hits
    - 101.0.0.0/8       # 4596 hits
    - 110.0.0.0/8       # 1647 hits
    - 111.0.0.0/8       # 3939 hits
    - 113.0.0.0/8       # 1370 hits
    - 119.0.0.0/8       # 2266 hits
    - 122.0.0.0/8       # 2461 hits
    - 123.0.0.0/8       # 2156 hits
    - 124.0.0.0/8       # 4284 hits
    - 150.0.0.0/8       # 1093 hits
    - 159.0.0.0/8       # 3569 hits
    - 162.0.0.0/8       # 1489 hits
    - 166.0.0.0/8       # 1757 hits
    - 202.0.0.0/8       # 1386 hits
    - 216.0.0.0/8       # 1022 hits

Some of these were blocked earlier today and just haven't fallen off the report yet.
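
The source of big-ban-hammer.sh is not shown in this task. As a rough illustration only, a per-/8 tally like the report above could be produced along these lines in Python, assuming the client IPs have already been extracted from an access log. The function names and the 1000-hit threshold are hypothetical, not taken from the real script.

```python
from collections import Counter
import ipaddress


def tally_slash8(client_ips, threshold=1000):
    """Count requests per IPv4 /8 and return (network, hits) pairs at or above threshold.

    Hypothetical helper: the real script's input format and cutoff are unknown.
    """
    counts = Counter()
    for raw in client_ips:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            continue  # skip malformed log fields
        if addr.version != 4:
            continue  # the block list in this task only holds IPv4 ranges
        # Mask off the low 24 bits to get the enclosing /8.
        net = ipaddress.ip_network((int(addr) & 0xFF000000, 8))
        counts[net] += 1
    return sorted(
        ((net, hits) for net, hits in counts.items() if hits >= threshold),
        key=lambda item: int(item[0].network_address),
    )


def format_report(rows):
    """Render rows as YAML list items with the hit count as a trailing comment."""
    return [f"    - {str(net):<18}# {hits} hits" for net, hits in rows]
```

Sorting by network address keeps the lines in the same order as the hiera list, which makes pasting them into the right place less error prone.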

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/4196570d51863215f793255d1f9397abf80bed7a%5E%21/#F0

diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml
index 3b533a1..0a2c377 100644
--- a/deployment-prep/_.yaml
+++ b/deployment-prep/_.yaml

@@ -1,6 +1,8 @@
 abuse_networks:
   blocked_nets:
     networks:
+    - 5.0.0.0/8
+    - 13.0.0.0/8
     - 14.0.0.0/9
     - 14.128.0.0/11
     - 14.160.0.0/12
@@ -13,6 +15,8 @@
     - 14.191.192.0/20
     - 14.191.224.0/19
     - 14.192.0.0/10
+    - 17.0.0.0/8
+    - 27.0.0.0/8
     - 37.0.0.0/9
     - 37.128.0.0/11
     - 37.160.0.0/12
@@ -38,7 +42,7 @@
     - 38.243.0.0/16
     - 38.244.0.0/14
     - 38.248.0.0/13
-    - 40.77.167.0/24
+    - 40.0.0.0/8
     - 41.0.0.0/10
     - 41.64.0.0/12
     - 41.80.0.0/13
@@ -109,9 +113,11 @@
     - 47.224.0.0/12
     - 47.240.0.0/12
     - 52.167.144.0/24
+    - 66.0.0.0/8
+    - 69.0.0.0/8
     - 85.0.0.0/8
     - 89.0.0.0/8
-    - 91.211.90.0/24
+    - 91.0.0.0/
     - 94.0.0.0/8
     - 101.0.0.0/8
     - 102.0.0.0/8
@@ -122,6 +128,7 @@
     - 113.0.0.0/8
     - 119.0.0.0/8
     - 122.0.0.0/8
+    - 123.0.0.0/8
     - 124.0.0.0/8
     - 128.241.0.0/16
     - 131.0.0.0/8
@@ -136,6 +143,7 @@
     - 143.64.0.0/10
     - 143.128.0.0/9
     - 146.174.160.0/19
+    - 150.0.0.0/8
     - 152.0.0.0/11
     - 152.32.0.0/12
     - 152.48.0.0/13
@@ -270,7 +278,8 @@
     - 197.240.0.0/12
     - 200.0.0.0/8
     - 201.0.0.0/8
-    - 202.76.160.0/20
+    - 202.0.0.0/8
+    - 216.0.0.0/8
     - 217.147.172.0/24
 acmechief_host: deployment-acme-chief05.deployment-prep.eqiad1.wikimedia.cloud
 apt::use_experimental: true

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/e31663ec8ee8a4db60a5f8088fca632c6d7f0c67%5E%21/#F0

diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml
index 0a2c377..91e5f5b 100644
--- a/deployment-prep/_.yaml
+++ b/deployment-prep/_.yaml

@@ -117,7 +117,7 @@
     - 69.0.0.0/8
     - 85.0.0.0/8
     - 89.0.0.0/8
-    - 91.0.0.0/
+    - 91.0.0.0/8
     - 94.0.0.0/8
     - 101.0.0.0/8
     - 102.0.0.0/8
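
The truncated `91.0.0.0/` entry fixed above is the kind of typo a quick pre-commit check can catch. A minimal sketch in Python using the standard `ipaddress` module follows; the function name is hypothetical and nothing here is part of the actual puppet repository tooling.

```python
import ipaddress


def invalid_networks(entries):
    """Return (entry, error message) pairs for entries that are not valid CIDR networks.

    Hypothetical helper for checking a list such as abuse_networks blocked_nets.
    """
    bad = []
    for entry in entries:
        try:
            # strict=True also rejects entries with host bits set, e.g. 91.1.0.0/8.
            ipaddress.ip_network(entry, strict=True)
        except ValueError as exc:
            bad.append((entry, str(exc)))
    return bad


if __name__ == "__main__":
    for entry, error in invalid_networks(["89.0.0.0/8", "91.0.0.0/", "94.0.0.0/8"]):
        print(f"invalid network {entry!r}: {error}")
```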

Mentioned in SAL (#wikimedia-releng) [2025-07-11T20:55:04Z] <bd808> blocked even more wide IP ranges in an attempt to get the load on deployment-mediawiki14 consistently below 3. (T399329)

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/ff29b596475e04e2647d6c9ed4468c704f30ff0d%5E%21/#F0

diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml
index 91e5f5b..be574e5 100644
--- a/deployment-prep/_.yaml
+++ b/deployment-prep/_.yaml

@@ -114,7 +114,6 @@
     - 47.240.0.0/12
     - 52.167.144.0/24
     - 66.0.0.0/8
-    - 69.0.0.0/8
     - 85.0.0.0/8
     - 89.0.0.0/8
     - 91.0.0.0/8

Unblocking myself :)
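
Before merging a batch of wide blocks, it can help to check whether an address you care about falls inside any of them. A minimal sketch in Python follows, assuming the block list is available as a list of CIDR strings. The function name is hypothetical and the address used is an arbitrary placeholder, not anyone's real IP.

```python
import ipaddress


def blocking_networks(address, blocked):
    """Return the entries of blocked (CIDR strings) that contain address."""
    addr = ipaddress.ip_address(address)
    return [net for net in blocked if addr in ipaddress.ip_network(net)]


if __name__ == "__main__":
    blocked = ["66.0.0.0/8", "69.0.0.0/8", "85.0.0.0/8"]
    # 69.1.2.3 is a placeholder address chosen only to fall inside 69.0.0.0/8.
    print(blocking_networks("69.1.2.3", blocked))
```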

Mentioned in SAL (#wikimedia-releng) [2025-07-11T20:59:32Z] <bd808> Rebooted deployment-mediawiki14 to clear active load (T399329)

In T399349 (Login broken by memcached ferm rules being bypassed by hiera configuration) we found that memcached was firewalled off from everything, which probably made all of this much, much worse than it otherwise would have been. MediaWiki really needs caching to work at scale.