Wikimedia's User-Agent policy specifically forbids using generic values for the User-Agent request header.
Apply stricter rate limiting to requests violating the policy.
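For reference, a compliant request identifies the tool and a way to reach its operator. A minimal sketch of what that could look like from a bot or script, assuming Node 18+ with global fetch; the tool name, homepage, and contact address are placeholders, not real identifiers:

```typescript
// Minimal sketch: a script identifying itself per the User-Agent policy.
// The tool name, homepage, and contact address are placeholder values.
const USER_AGENT =
  "ExampleWikiTool/1.0 (https://example.org/example-wiki-tool; tools@example.org)";

async function fetchSummary(title: string): Promise<unknown> {
  const url =
    `https://en.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(title)}`;
  const res = await fetch(url, {
    headers: { "User-Agent": USER_AGENT },
  });
  if (!res.ok) {
    // Requests with a generic or missing UA may see 403s or stricter rate limits.
    throw new Error(`Request failed: ${res.status}`);
  }
  return res.json();
}

fetchSummary("Wikipedia").then((data) => console.log(data));
```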
Change 514017 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache_upload: return HTTP 403 to requests violating UA policy
Change 514017 merged by Ema:
[operations/puppet@production] cache_upload: return HTTP 403 to requests violating UA policy
For Tech News: Bots and other scripts that do not set an identifiable User-Agent may find their requests blocked until they identify themselves properly.
Not sure if it applies here, but please remember that we allow Api-User-Agent as an alternative to User-Agent for JavaScript solutions.
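Since browsers do not let client-side JavaScript override the real User-Agent header, a rough sketch of a browser-side call using Api-User-Agent instead might look like this (the identifier is a placeholder, and the endpoint is just the English Wikipedia action API as an example):

```typescript
// Rough sketch: browser-side request identifying itself via Api-User-Agent,
// since client-side JavaScript cannot set the real User-Agent header.
// The identifier below is a placeholder value.
const API_USER_AGENT =
  "ExampleGadget/0.1 (https://example.org/gadget; gadget-maintainers@example.org)";

async function getSiteInfo(): Promise<unknown> {
  const params = new URLSearchParams({
    action: "query",
    meta: "siteinfo",
    format: "json",
    origin: "*", // required for anonymous cross-origin requests to the API
  });
  const res = await fetch(`https://en.wikipedia.org/w/api.php?${params}`, {
    headers: { "Api-User-Agent": API_USER_AGENT },
  });
  return res.json();
}
```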
We (Traffic) have decided to continue allowing requests violating the UA policy. Instead of blocking them, we will apply stricter rate limiting to those.
Change 513596 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] varnish: cache_upload rate limit
Change 513596 merged by Ema:
[operations/puppet@production] varnish: cache_upload miss/pass rate limit
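The actual change lives in the cache_upload VCL, but conceptually the behaviour is along these lines. This is a hand-wavy TypeScript sketch, not the production configuration; the thresholds and the "generic UA" pattern are invented for illustration only:

```typescript
// Conceptual sketch only: differential rate limiting keyed on whether the
// User-Agent looks generic. The regex and limits are illustrative, not the
// values used in the production VCL.
function looksGeneric(ua: string | undefined): boolean {
  if (!ua) return true;
  return /^(python-requests|python-urllib|curl|wget|okhttp|java)\b/i.test(ua);
}

interface Bucket { count: number; windowStart: number; }
const buckets = new Map<string, Bucket>();

// Fixed-window counter: deny once a key exceeds `limit` hits per `periodMs`.
function isDenied(key: string, limit: number, periodMs: number): boolean {
  const now = Date.now();
  const b = buckets.get(key);
  if (!b || now - b.windowStart > periodMs) {
    buckets.set(key, { count: 1, windowStart: now });
    return false;
  }
  b.count += 1;
  return b.count > limit;
}

function handleRequest(clientIp: string, userAgent?: string): number {
  const generic = looksGeneric(userAgent);
  // Stricter limit for requests violating the UA policy, looser otherwise.
  const limit = generic ? 10 : 1000; // requests per 60s window, illustrative
  if (isDenied(`${clientIp}:${generic}`, limit, 60_000)) {
    return 429; // Too Many Requests
  }
  return 200;
}
```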
TechNews: I've added it to the upcoming edition with this edit, which will be frozen for translation in about 18 hours. Please amend it before then if needed. (And thank you @Legoktm for writing the initial version!) Cheers!
Even with the current rate limiting, some crawlers are still regularly causing issues, wasting precious SRE time.
I'd like to revisit this task to be stricter about user agents, perhaps progressively ramping up how we enforce our policy. For example:
A variant could be to apply the above only on the upload cluster, but the fewer exceptions the better.
Agreed with all of that, though I would not exempt WMCS: WMCS can generate significant amounts of traffic much faster by virtue of already being in the cluster, and people using WMCS are generally Wikimedians who should be more familiar with our policies than someone who just wants to scrape wiki pages.
I would also add that after a DoS ~2 months ago I spent a while working on advertising the UA policy and our general API usage guidelines: [1], [2].