
Add new Tool Labs IPs to Varnish rate limit whitelist
Closed, Duplicate · Public

Description

Varnish is returning HTTP 429 errors to certain tools that contact the MediaWiki API for Wikimedia projects. At least one English Wikipedia bot ended up misbehaving as a result and was blocked: https://en.wikipedia.org/?oldid=879480043#Cyberbot

https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/varnish/templates/text-frontend.inc.vcl.erb$262

16.54 < Krenair> modules/varnish/templates/vcl/wikimedia-common.inc.vcl.erb does acl wikimedia_nets {
16.54 < Krenair> <% scope.lookupvar('::network::constants::aggregate_networks').each do |entry|
16.54 < Krenair> which comes from modules/network/data/data.yaml
16.55 < Krenair> network::aggregate_networks for production does not include 172.16.0.0/12
16.56 < Krenair> just the old 10/8 range
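To illustrate the point made in the chat above, here is a minimal Python sketch of the membership check (it is not the actual VCL). The network list is an assumption that holds only the 10/8 range mentioned above; the real `network::aggregate_networks` list contains other entries, and the client addresses are hypothetical.

```python
import ipaddress

# Assumption: stand-in for network::aggregate_networks, reduced to the
# old 10/8 range named in the chat. The real list has more entries.
AGGREGATE_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]

# The range the chat says is missing from the list.
NEW_CLOUD_RANGE = ipaddress.ip_network("172.16.0.0/12")


def in_acl(addr, networks=AGGREGATE_NETWORKS):
    """Return True if addr falls inside any network in the list."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in networks)


# Hypothetical client addresses, for illustration only.
old_range_client = "10.68.16.1"
new_range_client = "172.16.1.10"

print(in_acl(old_range_client))   # matched by 10/8
print(in_acl(new_range_client))   # not matched by the current list
print(in_acl(new_range_client, AGGREGATE_NETWORKS + [NEW_CLOUD_RANGE]))
```

Under these assumptions, a client in 172.16.0.0/12 matches only once that range is added to the list, which would be consistent with such clients being rate limited.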

Event Timeline

Nemo_bis created this task. Jan 21 2019, 4:15 PM
Restricted Application added a project: Operations. Jan 21 2019, 4:15 PM
Restricted Application added a subscriber: Aklapper.

I created this more specific task for Tools as requested, but there is a (more general?) Labs task at T213475.

Nemo_bis triaged this task as High priority. Jan 21 2019, 4:26 PM

Tools cannot be done separately: it does not have an IP space of its own, and Tools instances are scattered around the same network as instances from other projects.

faidon added a subscriber: faidon. Jan 21 2019, 8:07 PM

Per our earlier conversations (T208986, T174596, T209011), I think we should just use the WMCS public IP space to make these kinds of exceptions (which could also be dedicated to Toolforge), and not make rate-limit exceptions on the 172.16.0.0/12 space.

IMHO, with all those "short-term" fixes we're just getting deeper into tech debt for everyone, and we should cut our losses and do the right thing here :)

> Per our earlier conversations (T208986, T174596, T209011), I think we should just use the WMCS public IP space to make these kind of exceptions (which also could be dedicated for Toolforge), and not make rate-limit exceptions on 172.16.0.0/12 space.
> IMHO between all those "short-term" fixes we're just getting deeper into tech debt for everyone, and we should just cut our losses short and just do the right thing here :)

What is the right thing? As it stands right now, Cyberbot is a standard bot making a standard number of requests, as always, but it is getting blocked 60% of the time. It is nowhere near the rate limit, yet it is being hit by a front-end limit, which is likely affecting other Cloud-based tools and bots as well.