2025 tracking task for Beta Cluster (deployment-prep) traffic overload protection (blocking unwanted crawlers)
Open, Medium, Public

Description

Beta Cluster is having many of the same issues as the rest of the content-producing internet: a variety of content harvesting systems discover it via either organic link following or explicit seeding. In response, we are using IP range blocking as a crude shield while looking for ways to reduce toil for maintainers and frustration for trusted users. The workstream doesn't seem big enough quite yet to warrant a milestone or subproject, but having a longer-lived tracking bug like this one to connect things does feel useful.

See also:

Related Objects

Status    Subtype     Assigned
Open      -           None
Resolved  BUG REPORT  bd808
Open      Feature     None
Resolved  Feature     bd808
Open      Feature     None
Resolved  -           taavi
Resolved  BUG REPORT  bd808
Resolved  BUG REPORT  dancy
Open      -           ssingh
Resolved  BUG REPORT  bd808
Declined  BUG REPORT  None
Resolved  BUG REPORT  bd808
Open      -           None
Resolved  BUG REPORT  Xqt
Open      Feature     None
Resolved  -           bd808
Open      Feature     None

Event Timeline

Blocking by IP is a reasonable measure for a few days while exposed to a DDoS or the like, but it is a very bad idea to keep people locked out for months and years.

  • I have linked a list of user agents.
  • As long as crawlers identify themselves reasonably, both toolforge.org and wmcloud.org should block all known crawlers. Unlike a Wikipedia, there is nothing here for search engines or archives to explore.

IP-based blocking reflects a 1999 mindset, when the attacker was a nasty kid in an attic room on a 56k dial-up modem: block that one connection and the attack is over.

  • Modern attacks use botnets and the access networks of unwitting, hijacked end users.
  • Crawlers are not expected to keep a static IP for months and years. They may connect from ordinary networks and cannot be distinguished from regular humans by IP alone.
  • Well-behaved ("white") crawlers, however, do identify themselves by user agent, as demonstrated above. They are blocked out now, and it works again.
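
In haproxy terms, the user-agent approach argued for above could look something like the following. This is a hypothetical sketch only: the ACL name and the crawler list are invented, and nothing of this form is confirmed to be deployed anywhere in Beta Cluster.

```
# Hypothetical rule: deny requests from crawlers that identify themselves
# via User-Agent. hdr_sub does a case-insensitive (-i) substring match
# against each listed pattern.
acl is_known_crawler hdr_sub(User-Agent) -i googlebot bingbot gptbot ccbot
http-request deny deny_status 403 if is_known_crawler
```

As the next comment notes, though, this only helps against crawlers that self-identify honestly.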

@PerfektesChaos I understand your arguments and your frustration. We do not have the people or the technology in Beta Cluster at the moment to provide active management of high-touch blocking methods. We are barely keeping up with this in the production wikis, where many more people are involved and more comprehensive data collection and reporting tooling helps find patterns like "all the bad traffic is using this strange user-agent". We are not talking about well-behaved bots here; these wikis all have robots.txt files that tell all crawlers to ignore the sites. The wide IPv4 blocks are annoying for everyone, but they have been largely effective in keeping the Beta Cluster wikis functional in the face of distributed web crawlers that seem not to care whether they knock the sites they are crawling offline under load.
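
For context, a robots.txt that tells all crawlers to ignore a site amounts to a blanket disallow. A minimal sketch of such a file (the exact contents deployed on the Beta Cluster wikis may differ):

```
# Ask all well-behaved crawlers to stay out of the entire site.
User-agent: *
Disallow: /
```

The crawlers causing the load described here ignore this, which is why the IP blocks exist at all.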

bd808 closed subtask Restricted Task as Resolved.Jul 23 2025, 5:53 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Jul 23 2025, 7:44 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Jul 24 2025, 5:26 PM
bd808 closed subtask Restricted Task as Resolved.Jul 24 2025, 7:41 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Jul 28 2025, 4:55 PM
bd808 closed subtask Restricted Task as Resolved.Jul 28 2025, 5:51 PM
bd808 closed subtask Restricted Task as Resolved.
bd808 changed the status of subtask Restricted Task from Open to In Progress.Jul 31 2025, 4:50 PM
bd808 closed subtask Restricted Task as Resolved.Jul 31 2025, 5:08 PM
bd808 closed subtask Restricted Task as Resolved.Jul 31 2025, 6:10 PM
Aklapper closed subtask Restricted Task as Declined.Aug 3 2025, 4:13 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Aug 7 2025, 4:13 PM
bd808 closed subtask Restricted Task as Resolved.Aug 7 2025, 4:21 PM
bd808 reopened subtask Restricted Task as Open.Aug 7 2025, 4:34 PM
bd808 closed subtask Restricted Task as Resolved.Aug 7 2025, 4:40 PM
bd808 closed subtask Restricted Task as Resolved.Aug 12 2025, 3:41 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Aug 14 2025, 3:27 PM
bd808 closed subtask Restricted Task as Resolved.Aug 14 2025, 3:37 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Aug 15 2025, 4:43 PM
bd808 closed subtask Restricted Task as Resolved.Aug 15 2025, 4:53 PM

https://gerrit.wikimedia.org/r/c/operations/puppet/+/1175991 was merged as part of T396621: Requestctl should use x-provenance header. This Puppet change by @Joe has removed the blocking method that we have been using to manage unwanted traffic in Beta Cluster. I'm not sure what we can do now without help from SREs to deploy and configure their preferred blocking technology.

Looks like these all moved via: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1175989 and now all abuse_networks are in /etc/haproxy/ipblocks.d/all.map not that I quite grok what that means. AFAICT, it's not actively blocking anything in beta at the moment.

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/4b0764b08663ba53152234f1bc5cc8f7b83db91d%5E%21/#F0

diff --git a/deployment-prep/deployment-cache.yaml b/deployment-prep/deployment-cache.yaml
index 0a1994b..d856af7 100644
--- a/deployment-prep/deployment-cache.yaml
+++ b/deployment-prep/deployment-cache.yaml

@@ -136,6 +136,7 @@
   keep_alive: 3
   server: 0
   tunnel: 0
+profile::cache::haproxy::set_x_provenance: true
 profile::cache::haproxy::timeout:
   client: 120
   client_fin: 120
$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-cache-text08.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(39b7a663af) gitpuppet - varnish: Implement new direct routing for mobile views'
Notice: /Stage[main]/Prometheus::Varnishkafka_exporter/Service[prometheus-varnishkafka-exporter]/ensure: ensure changed 'stopped' to 'running' (corrective)
Info: /Stage[main]/Prometheus::Varnishkafka_exporter/Service[prometheus-varnishkafka-exporter]: Unscheduling refresh on Service[prometheus-varnishkafka-exporter]
Notice: /Stage[main]/Profile::Cache::Haproxy/Package[lua5.3-maxminddb]/ensure: created
Notice: /Stage[main]/Haproxy/File[/etc/haproxy/haproxy.cfg]/content:
--- /etc/haproxy/haproxy.cfg    2025-08-26 14:53:32.376260599 +0000
+++ /tmp/puppet-file20250826-2210500-18xjwaw    2025-08-26 22:49:18.635209927 +0000
@@ -10,6 +10,7 @@
     nbthread 4
     cpu-map 1/1- 0 1 2 3

+    lua-load-per-thread /etc/haproxy/lua/maxmind-lookup.lua

     ssl-default-bind-options ssl-min-ver TLSv1.2 ssl-max-ver TLSv1.3
     ssl-default-bind-ciphers -ALL:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-RSA-AES128-GCM-SHA256

Notice: /Stage[main]/Haproxy/File[/etc/haproxy/haproxy.cfg]/content: content changed '{sha256}661341f7682593551490bb9b215436ce82edb12b5a61e11a93940d60f15497fd' to '{sha256}130d203201a917ded8781cd381a3fcf61844984ec34bac284d41ed58396bd750'
Info: /Stage[main]/Haproxy/File[/etc/haproxy/haproxy.cfg]: Scheduling refresh of Service[haproxy]
Notice: /Stage[main]/Profile::Cache::Haproxy/File[/etc/haproxy/lua/maxmind-lookup.lua]/ensure: defined content as '{sha256}b48a7074c612bf70b4840b80d9c098ffb8ff62dd7c344c016f7993ff02c1686d'
Info: /Stage[main]/Profile::Cache::Haproxy/File[/etc/haproxy/lua/maxmind-lookup.lua]: Scheduling refresh of Service[haproxy]
Notice: /Stage[main]/Profile::Cache::Haproxy/Haproxy::Site[tls]/File[/etc/haproxy/conf.d/tls.cfg]/content:
--- /etc/haproxy/conf.d/tls.cfg 2025-08-26 14:53:32.492261151 +0000
+++ /tmp/puppet-file20250826-2210500-17noyf6    2025-08-26 22:49:18.755210641 +0000
@@ -123,8 +123,41 @@
     # Allow OPTIONS method only with Origin header
     http-request deny deny_status 405 hdr X-Cache %[var(txn.x_cache)] if { method OPTIONS } !{ hdr(Origin) -m found }

-    # fallback to validate wikimedia_trust if we are not using X-Provenance
-    http-request set-var(req.trusted_request) str(A) if wikimedia_trust
+    # Set x-provenance
+    # First, we check if the request comes from our ring of trust, or is internal. We set:
+    # * net=wikimedia-trust if the request is coming from the wikimedia-trust subnet
+    # * net=internal if the request is coming from a private network
+    # If neither condition is met, the value will be looked up a map containing all
+    # requestctl-defined ipblocks, resulting in:
+    # * abuse=<value> if the request is coming from a known abuser
+    # * client=<value> if the request is coming from a known client ipblock
+    # * cloud=<value> if the request is coming from a known cloud
+    # If all of the above fails, we look up the ip in maxmind to fetch an isp value
+    # * isp=<value> if the IP matches the isp lookup in maxmind
+    # If this also fails, we have no information on the request and we'll set
+    # * net=unknown
+    # We only set the variable once
+    # Set X-Trusted-Request
+    # Provide a trust score from A to F, currently solely based on the source of the request
+    # A for net=wikimedia_trust|internal
+    # F for abuse=
+    # E otherwise
+    http-request set-var(req.provenance,ifnotexists) str('net=wikimedia-trust') if wikimedia_trust
+    acl is_private_network src 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 127.0.0.0/8 ::1
+    http-request set-var(req.provenance,ifnotexists) str('net=internal') if is_private_network
+    # AWS Elastic IPs used by the Wikimedia Enterprise project reported in the following tasks over time:
+    # T255524 T294798 T370294
+    acl is_wme_client src 3.23.12.83/32 3.211.48.168/32 44.206.140.241/32 35.168.168.219/32 35.172.30.169/32 3.222.74.115/32
+    http-request set-var(req.provenance,ifnotexists) str('net=wme') if is_wme_client
+    http-request set-var(req.trusted_request) str(A) if { var(req.provenance) -m found }
+    # check if the IP is included in one of our ipblocks
+    http-request set-var(req.provenance,ifnotexists,ifnotempty) src,map_ip(/etc/haproxy/ipblocks.d/all.map)
+    http-request set-var(req.trusted_request,ifnotexists) str(F) if { var(req.provenance) -m beg "abuse=" }
+    # If everything else failed, find an isp in maxmind
+    http-request set-var(req.provenance,ifnotexists,ifnotempty) lua.fetch_isp,lower,bytes(0,64)
+    # lookup failed
+    http-request set-var(req.provenance,ifnotexists) str('net=unknown')
+    http-request set-header X-Provenance %[var(req.provenance)]
     acl is_trusted_request var(req.trusted_request) -m str A
     http-request set-var(req.bearer) http_auth_bearer
     http-request set-var(req.jwt_alg) var(req.bearer),jwt_header_query('$.alg')

Notice: /Stage[main]/Profile::Cache::Haproxy/Haproxy::Site[tls]/File[/etc/haproxy/conf.d/tls.cfg]/content: content changed '{sha256}1b7c18517e375d245bec427a53d77635990f71a1cbdd42161184189cf7ada07b' to '{sha256}241e40d0c7972feceb71d580ee7a8e6d5b68cc07b01f70fc7fe3d45d3f0ebe66'
Info: /Stage[main]/Profile::Cache::Haproxy/Haproxy::Site[tls]/File[/etc/haproxy/conf.d/tls.cfg]: Scheduling refresh of Service[haproxy]
Error: /Stage[main]/Haproxy/Systemd::Service[haproxy]/Service[haproxy]: Failed to call refresh: Systemd restart for haproxy failed!
journalctl log for haproxy:
-- Journal begins at Tue 2025-08-19 14:48:25 UTC, ends at Tue 2025-08-26 22:49:20 UTC. --
Aug 26 22:49:20 deployment-cache-text08 systemd[1]: Reloading HAProxy Load Balancer.
Aug 26 22:49:20 deployment-cache-text08 systemd[1]: haproxy.service: Control process exited, code=exited, status=1/FAILURE
Aug 26 22:49:20 deployment-cache-text08 systemd[1]: Reload failed for HAProxy Load Balancer.

Error: /Stage[main]/Haproxy/Systemd::Service[haproxy]/Service[haproxy]: Systemd restart for haproxy failed!
journalctl log for haproxy:
-- Journal begins at Tue 2025-08-19 14:48:25 UTC, ends at Tue 2025-08-26 22:49:20 UTC. --
Aug 26 22:49:20 deployment-cache-text08 systemd[1]: Reloading HAProxy Load Balancer.
Aug 26 22:49:20 deployment-cache-text08 systemd[1]: haproxy.service: Control process exited, code=exited, status=1/FAILURE
Aug 26 22:49:20 deployment-cache-text08 systemd[1]: Reload failed for HAProxy Load Balancer.

Info: Class[Haproxy]: Unscheduling all events on Class[Haproxy]
Info: Stage[main]: Unscheduling all events on Stage[main]
Notice: Applied catalog in 21.46 seconds
$ sudo -i puppet agent -tv
Info: Using environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for deployment-cache-text08.deployment-prep.eqiad1.wikimedia.cloud
Info: Applying configuration version '(39b7a663af) gitpuppet - varnish: Implement new direct routing for mobile views'
Notice: /Stage[main]/Prometheus::Varnishkafka_exporter/Service[prometheus-varnishkafka-exporter]/ensure: ensure changed 'stopped' to 'running' (corrective)
Info: /Stage[main]/Prometheus::Varnishkafka_exporter/Service[prometheus-varnishkafka-exporter]: Unscheduling refresh on Service[prometheus-varnishkafka-exporter]
Notice: /Stage[main]/Profile::Cache::Haproxy/File[/etc/haproxy/ipblocks.d/all.map]/content:
--- /etc/haproxy/ipblocks.d/all.map     2025-08-26 22:52:15.820244254 +0000
+++ /tmp/puppet-file20250826-2214700-1t1r8dh    2025-08-26 22:55:24.269308480 +0000
@@ -396,7 +396,6 @@
 202.0.0.0/8      abuse=blocked_nets
 216.0.0.0/8      abuse=blocked_nets
 217.147.172.0/24      abuse=blocked_nets
-45.33.19.43/32      abuse=blocked_nets
 192.0.2.0/24      abuse=text_abuse_nets
 192.241.194.113/32      abuse=bot_blocked_nets
 192.241.194.113/32      abuse=bot_posts_blocked_nets

Notice: /Stage[main]/Profile::Cache::Haproxy/File[/etc/haproxy/ipblocks.d/all.map]/content: content changed '{sha256}6796fd78738249eac1a89572cf050fc94991562962a764f805e92a18962ed000' to '{sha256}b3586e5dddcf1329666cd2fce85ebcebe84e1d3c669bd7878d19a49e2aa9a0ad'
Notice: Applied catalog in 19.72 seconds

HAProxy is failing to start because /usr/share/GeoIP/GeoIP2-ISP.mmdb is missing on the Beta Cluster cache nodes. That is a paid MaxMind database. I think we will need to be able to disable that usage.

@Joe could you help us fix ^ that in beta? For now, I've ugly-hacked beta into working, but if we ever need to rebuild any of these beta servers, that will fail.


Here's the terrible thing I did on deployment-cache-text08 and deployment-cache-upload08 to get haproxy started in the interim:

  • deployment-cache-text08 had /usr/share/GeoIP/GeoIP2-Country.mmdb (unclear where that came from)
  • I copied deployment-cache-text08:/usr/share/GeoIP/GeoIP2-Country.mmdb to /usr/share/GeoIP/GeoIP2-ISP.mmdb on both deployment-cache-text08 and deployment-cache-upload08

After that, I restarted haproxy.

I can confirm blocking is working again, so Beta is working again.

I'm sure the haproxy Lua is returning an error response whenever we look up an ISP, but for now, I guess, that's fine.

bd808 closed subtask Restricted Task as Resolved.Aug 27 2025, 5:40 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Aug 27 2025, 6:24 PM
bd808 closed subtask Restricted Task as Resolved.Aug 27 2025, 6:31 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.
bd808 closed subtask Restricted Task as Resolved.Aug 27 2025, 6:38 PM
Etonkovidova reopened subtask Restricted Task as Open.Aug 27 2025, 11:44 PM
Etonkovidova closed subtask Restricted Task as Resolved.Aug 28 2025, 1:41 AM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Aug 29 2025, 3:48 PM
bd808 closed subtask Restricted Task as Resolved.Aug 29 2025, 4:04 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Sep 2 2025, 4:51 PM
bd808 closed subtask Restricted Task as Resolved.Sep 2 2025, 4:57 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Sep 2 2025, 11:53 PM
bd808 closed subtask Restricted Task as Resolved.Sep 2 2025, 11:57 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Sep 8 2025, 8:10 PM
bd808 closed subtask Restricted Task as Resolved.Sep 8 2025, 8:56 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Sep 8 2025, 9:22 PM
bd808 closed subtask Restricted Task as Resolved.Sep 8 2025, 9:27 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Oct 6 2025, 10:42 PM
bd808 closed subtask Restricted Task as Resolved.Oct 6 2025, 10:48 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Oct 6 2025, 10:57 PM
bd808 closed subtask Restricted Task as Resolved.Oct 6 2025, 11:02 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Oct 6 2025, 11:09 PM
bd808 closed subtask Restricted Task as Resolved.Oct 6 2025, 11:13 PM
itsmoon reopened subtask Restricted Task as Open.Oct 7 2025, 4:50 PM
bd808 closed subtask Restricted Task as Resolved.Oct 7 2025, 6:24 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Oct 7 2025, 7:56 PM
bd808 closed subtask Restricted Task as Resolved.Oct 7 2025, 8:00 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Oct 14 2025, 5:35 PM
bd808 closed subtask Restricted Task as Resolved.Oct 14 2025, 5:38 PM
Aklapper changed the status of subtask Restricted Task from Open to Stalled.Oct 15 2025, 5:35 PM
JosefAnthony changed the status of subtask Restricted Task from Stalled to Open.Oct 16 2025, 8:16 AM
bd808 closed subtask Restricted Task as Invalid.Oct 16 2025, 10:21 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Oct 20 2025, 3:56 PM
bd808 closed subtask Restricted Task as Resolved.Oct 20 2025, 4:00 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Oct 22 2025, 7:51 PM
bd808 closed subtask Restricted Task as Resolved.Oct 22 2025, 7:54 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Nov 3 2025, 4:06 PM
bd808 closed subtask Restricted Task as Resolved.Nov 3 2025, 4:10 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Nov 6 2025, 1:30 AM
bd808 closed subtask Restricted Task as Resolved.Nov 6 2025, 1:33 AM
bd808 reopened subtask Restricted Task as Open.Nov 10 2025, 5:01 PM
bd808 closed subtask Restricted Task as Resolved.Nov 10 2025, 5:10 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Nov 11 2025, 1:15 AM
bd808 closed subtask Restricted Task as Resolved.Nov 11 2025, 1:18 AM
Arian_Bozorg closed subtask Restricted Task as Declined.Nov 11 2025, 5:55 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Wed, Nov 12, 9:38 PM
bd808 closed subtask Restricted Task as Resolved.Wed, Nov 12, 9:52 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Mon, Nov 17, 5:51 PM
bd808 closed subtask Restricted Task as Resolved.Mon, Nov 17, 6:22 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Thu, Nov 20, 7:09 PM
bd808 closed subtask Restricted Task as Resolved.Thu, Nov 20, 7:12 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Mon, Nov 24, 3:58 PM
bd808 closed subtask Restricted Task as Resolved.Mon, Nov 24, 4:12 PM
bd808 closed subtask Restricted Task as Resolved.Mon, Nov 24, 9:31 PM
bd808 changed the status of subtask Restricted Task from Open to In Progress.Wed, Dec 10, 9:47 PM
bd808 closed subtask Restricted Task as Resolved.Wed, Dec 10, 9:54 PM