[17:24] < MatmaRex> is beta cluster having a bad time, or is it just me?
[17:24] < MatmaRex> pages are taking forever to load
[17:24] < bd808> beta.wmflabs.org seems slow/stuck from my laptop
[17:26] < bd808> there are some tall spikes on the aggregated load graph for the Cloud VPS project in total
[17:27] < bd808> I finally got a timeout from deployment-cache-text08 trying to get data from the backing MediaWiki.
[17:30] < bd808> load on deployment-mediawiki14 is ~6. https://grafana.wmcloud.org/d/0g9N-7pVz/cloud-vps-project-board?orgId=1&var-project=deployment-prep&var-instance=All&from=now-2d&to=now&viewPanel=902
[17:30] < bd808> !log `shutdown -r now` on deployment-mediawiki14. Load has been growing for ~2 days.
[17:30] < stashbot> Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:31] < bd808> MatmaRex: maybe better for a bit. ¯\_(ツ)_/¯
[17:32] < bd808> its popping right back up though. I need to find hashar's notes on looking at the inbound traffic.
[17:32] < MatmaRex> huh, interesting
[17:34] < bd808> yeah, it's right back up to 6 again with 9 parallel php processes at the top of the %CPU
[17:35] < MatmaRex> so, scraping, surely?
[17:36] < bd808> zuul is very quiet, so yeah I would guess some bots being agressive
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | None | | T393487 2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers) |
| Resolved | BUG REPORT | bd808 | T392003 High load on deployment-mediawiki14 and slow responses |
Event Timeline
{T389181} was a prior round of overload (sorry, private task because of lots of IPv4 addresses in it)
The top 10 clients per `grep -oP '"X-Client-IP": "\d+\.\d+\.\d+\.\d+' /var/log/apache2/other_vhosts_access-json.log | sort | uniq -c | sort -nr | head -n10` on deployment-mediawiki14 are coming from large IPv4 allocations registered to Microsoft. I assume these are Azure addresses. I am going to block the CIDRs connected to these clients at our varnish layer via Horizon managed hiera.
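The effect of this kind of deny list can be sketched with Python's stdlib `ipaddress` module. This is an illustration only, not code used in the incident; the CIDRs are a subset copied from the hiera diffs, and the `is_blocked` helper name is mine:

```python
import ipaddress

# Subset of the CIDR blocks from the Horizon hiera changes (Azure ranges).
blocked_nets = [ipaddress.ip_network(c) for c in (
    "52.160.0.0/11",
    "40.96.0.0/12",
    "13.64.0.0/11",
)]

def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked CIDR."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in blocked_nets)

print(is_blocked("52.167.1.1"))    # True: 52.160.0.0/11 spans 52.160.0.0-52.191.255.255
print(is_blocked("198.51.100.7"))  # False: not inside any listed range
```

Varnish does the equivalent membership test against `X-Client-IP` at the cache layer, before requests reach the MediaWiki backends.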
```diff
diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml
index f96f58f..acc36ea 100644
--- a/deployment-prep/_.yaml
+++ b/deployment-prep/_.yaml
@@ -10,6 +10,19 @@
   - 47.80.0.0/13
   - 47.74.0.0/15
   - 47.76.0.0/14
+  - 52.152.0.0/13
+  - 52.160.0.0/11
+  - 52.145.0.0/16
+  - 52.148.0.0/14
+  - 52.146.0.0/15
+  - 40.112.0.0/13
+  - 40.76.0.0/14
+  - 40.120.0.0/14
+  - 40.125.0.0/17
+  - 40.124.0.0/16
+  - 40.74.0.0/15
+  - 40.96.0.0/12
+  - 40.80.0.0/12
 acmechief_host: deployment-acme-chief05.deployment-prep.eqiad1.wikimedia.cloud
 apt::use_experimental: true
 aptly::group: wikidev
```
```diff
diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml
index acc36ea..212ff33 100644
--- a/deployment-prep/_.yaml
+++ b/deployment-prep/_.yaml
@@ -23,6 +23,10 @@
   - 40.74.0.0/15
   - 40.96.0.0/12
   - 40.80.0.0/12
+  - 13.64.0.0/11
+  - 13.104.0.0/14
+  - 13.96.0.0/13
+  - 23.96.0.0/13
 acmechief_host: deployment-acme-chief05.deployment-prep.eqiad1.wikimedia.cloud
 apt::use_experimental: true
 aptly::group: wikidev
```
Mentioned in SAL (#wikimedia-releng) [2025-04-15T18:06:15Z] <bd808> sudo puppet agent -tv on deployment-cache-text08 to update varnish deny list (T392003)
Mentioned in SAL (#wikimedia-releng) [2025-04-15T18:11:26Z] <bd808> bd808@deployment-cache-text08:~$ sudo service varnish-frontend restart (T392003)
```diff
diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml
index 212ff33..8a6f878 100644
--- a/deployment-prep/_.yaml
+++ b/deployment-prep/_.yaml
@@ -27,6 +27,17 @@
   - 13.104.0.0/14
   - 13.96.0.0/13
   - 23.96.0.0/13
+  - 45.89.148.0/23
+  - 45.93.184.0/23
+  - 91.124.117.0/24
+  - 140.228.23.0/24
+  - 146.104.0.0/14
+  - 146.110.0.0/16
+  - 146.100.0.0/14
+  - 146.108.0.0/15
+  - 96.62.0.0/16
+  - 154.16.246.0/24
+  - 102.129.130.0/24
 acmechief_host: deployment-acme-chief05.deployment-prep.eqiad1.wikimedia.cloud
 apt::use_experimental: true
 aptly::group: wikidev
```
The intertubes are full of bots and they all want that juicy betawiki content I guess. :/
Mentioned in SAL (#wikimedia-releng) [2025-04-15T19:40:10Z] <bd808> Forced puppet run and restarted varnish on deployment-cache-text08 to pick up new blocks (T392003)
The last hiera change I made for this is https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/8697c499f4c8e91428329ea98b4e519de03e3507%5E%21/#F0. I was just sorting the blocked_nets list with `:sort n` in vim.
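vim's `:sort n` keys on the first decimal number in each line, so the list only groups by first octet (13.104.0.0/14 lands ahead of 13.64.0.0/11 in the resulting diff). A fully numeric CIDR sort is a one-liner with the stdlib `ipaddress` module; this is a hedged sketch, not something run during the incident:

```python
import ipaddress

cidrs = ["52.160.0.0/11", "13.104.0.0/14", "40.96.0.0/12", "13.64.0.0/11"]

# ip_network objects compare by network address (then prefix length),
# so 13.64.0.0/11 correctly precedes 13.104.0.0/14.
cidrs.sort(key=ipaddress.ip_network)
print(cidrs)  # ['13.64.0.0/11', '13.104.0.0/14', '40.96.0.0/12', '52.160.0.0/11']
```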
```diff
diff --git a/deployment-prep/_.yaml b/deployment-prep/_.yaml
index 8a6f878..8c432a1 100644
--- a/deployment-prep/_.yaml
+++ b/deployment-prep/_.yaml
@@ -2,42 +2,42 @@
 blocked_nets:
   networks:
   - 8.208.0.0/12
-  - 47.240.0.0/14
-  - 47.244.0.0/15
-  - 47.236.0.0/14
-  - 47.246.0.0/16
-  - 47.235.0.0/16
-  - 47.80.0.0/13
-  - 47.74.0.0/15
-  - 47.76.0.0/14
-  - 52.152.0.0/13
-  - 52.160.0.0/11
-  - 52.145.0.0/16
-  - 52.148.0.0/14
-  - 52.146.0.0/15
-  - 40.112.0.0/13
-  - 40.76.0.0/14
-  - 40.120.0.0/14
-  - 40.125.0.0/17
-  - 40.124.0.0/16
-  - 40.74.0.0/15
-  - 40.96.0.0/12
-  - 40.80.0.0/12
-  - 13.64.0.0/11
   - 13.104.0.0/14
+  - 13.64.0.0/11
   - 13.96.0.0/13
   - 23.96.0.0/13
+  - 40.112.0.0/13
+  - 40.120.0.0/14
+  - 40.124.0.0/16
+  - 40.125.0.0/17
+  - 40.74.0.0/15
+  - 40.76.0.0/14
+  - 40.80.0.0/12
+  - 40.96.0.0/12
   - 45.89.148.0/23
   - 45.93.184.0/23
+  - 47.235.0.0/16
+  - 47.236.0.0/14
+  - 47.240.0.0/14
+  - 47.244.0.0/15
+  - 47.246.0.0/16
+  - 47.74.0.0/15
+  - 47.76.0.0/14
+  - 47.80.0.0/13
+  - 52.145.0.0/16
+  - 52.146.0.0/15
+  - 52.148.0.0/14
+  - 52.152.0.0/13
+  - 52.160.0.0/11
   - 91.124.117.0/24
-  - 140.228.23.0/24
-  - 146.104.0.0/14
-  - 146.110.0.0/16
-  - 146.100.0.0/14
-  - 146.108.0.0/15
   - 96.62.0.0/16
-  - 154.16.246.0/24
   - 102.129.130.0/24
+  - 140.228.23.0/24
+  - 146.100.0.0/14
+  - 146.104.0.0/14
+  - 146.108.0.0/15
+  - 146.110.0.0/16
+  - 154.16.246.0/24
 acmechief_host: deployment-acme-chief05.deployment-prep.eqiad1.wikimedia.cloud
 apt::use_experimental: true
 aptly::group: wikidev
```
I told folks about all this blocking: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/2RFFHXSI6INQDJ2AQ7U3IQ2HTHT5J4VL/