lists.wikimedia.org was unavailable
Closed, Resolved / Public / Security

Description

lists.wikimedia.org was unavailable for a period of time around 14:45 UTC on May 16.

There was an increase in CPU and memory usage starting around 11:30.

CPU:

image.png (264×654 px, 32 KB)

Memory:

image.png (259×653 px, 35 KB)

As a follow-up, we should check why no page, IRC message, or Phab task was generated.

Event Timeline

LSobanski set Security to Software security bug. (May 16 2025, 3:22 PM)
LSobanski added projects: Security, Security-Team.
LSobanski changed the visibility from "Public (No Login Required)" to "Custom Policy".
LSobanski changed the subtype of this task from "Task" to "Security Issue".

Looks like a scraper

I've manually dropped the abuser:

    sudo nft insert rule inet base input ip saddr 93.123.109.83 drop

It was a top hitter in the logs, but there might be more.
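For anyone repeating this kind of check, here is a minimal sketch of how a "top hitter" can be found by counting requests per client address. It assumes an access log where the client IP is the first whitespace-separated field; the function name `top_hitters` and the sample lines are made up for illustration and are not taken from the lists servers.

```python
from collections import Counter


def top_hitters(log_lines, n=5):
    """Return the n most frequent client IPs as (ip, count) pairs.

    Assumption: the client IP is the first whitespace-separated field
    of each log line, as in the common Apache/nginx access log formats.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)


# Hypothetical sample lines using documentation-only addresses.
sample = [
    '203.0.113.5 - - "GET /a HTTP/1.1" 200',
    '203.0.113.5 - - "GET /b HTTP/1.1" 200',
    '192.0.2.10 - - "GET /a HTTP/1.1" 200',
    '203.0.113.5 - - "GET /c HTTP/1.1" 200',
]

for ip, hits in top_hitters(sample, n=2):
    print(ip, hits)
```

A single address dominating the counts is only a hint; a scraper spread over many addresses would need grouping by network instead.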

Puppet was disabled on lists1004 to prevent rolling back the nftables change.

sbassett changed the task status from Open to In Progress. (May 19 2025, 4:25 PM)
sbassett triaged this task as Medium priority.
sbassett edited projects, added SecTeam-Processed, Vuln-DoS; removed Security-Team.
sbassett changed Author Affiliation from N/A to WMF Technology.
sbassett changed Risk Rating from N/A to Medium.

I created and then merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1148432

This added generic throttling (not based on allow/deny lists) on lists servers.

The default values apply: the port is 443, and 32 parallel connections per client are allowed. A client that exceeds this gets banned for 300 seconds.
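To make the policy concrete, here is a rough model of it in Python. This is only a sketch of the behaviour described above (32 parallel connections, 300 second ban), not the nftables rules that the Puppet change deploys; the class `ConnectionThrottle` and its methods are hypothetical names.

```python
import time


class ConnectionThrottle:
    """Toy model: ban a client that exceeds max_parallel open connections."""

    def __init__(self, max_parallel=32, ban_seconds=300, clock=time.monotonic):
        self.max_parallel = max_parallel
        self.ban_seconds = ban_seconds
        self.clock = clock        # injectable so the model can be tested
        self.active = {}          # ip -> number of currently open connections
        self.banned_until = {}    # ip -> time at which the ban expires

    def open(self, ip):
        """Return True if a new connection from ip is accepted."""
        now = self.clock()
        if ip in self.banned_until and now < self.banned_until[ip]:
            return False  # still banned
        if self.active.get(ip, 0) >= self.max_parallel:
            # Over the limit: start a ban and refuse the connection.
            self.banned_until[ip] = now + self.ban_seconds
            return False
        self.active[ip] = self.active.get(ip, 0) + 1
        return True

    def close(self, ip):
        """Record that one connection from ip has ended."""
        if self.active.get(ip, 0) > 0:
            self.active[ip] -= 1


throttle = ConnectionThrottle()
accepted = sum(throttle.open("192.0.2.1") for _ in range(40))
print("accepted connections:", accepted)
```

The limit is per client address, so a well-behaved client is unaffected while another one is banned.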

This is now active on lists2001 (the fail-over machine) but not yet on lists1004 (the active machine), because Puppet is still disabled there with @ABran-WMF's message.

If there are no concerns we can enable puppet again and it will also be applied on lists1004.

Regarding additional throttling with actual lists of bad IPs/networks, there are 2 more changes pending / to be discussed:

https://gerrit.wikimedia.org/r/c/operations/puppet/+/1148826 by Arnaudb, which makes throttling more generic, and https://gerrit.wikimedia.org/r/c/operations/puppet/+/1148433/1 by myself, which copied code over from Gerrit without making it generic.

Thanks for enabling throttling on the lists machine! The default policy is accept, so no throttling is taking place:

ip saddr @DENYLIST accept
ip6 saddr @DENYLIST_V6 accept

We will enable generic throttling and add the single IP to the abusers list for the lists host. Arnaud is preparing a patch for that.

Puppet has been re-enabled, the patch has been applied on lists1004, and everything seems to be running OK.

sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)". (May 27 2025, 3:37 PM)

Change #1148433 abandoned by Dzahn:

[operations/puppet@production] lists: add parameter and code to block abusers using nftables

Reason:

Another solution by Arnaudb, which uses the same list of abusers for multiple services, has been implemented.

https://gerrit.wikimedia.org/r/1148433