There has been a huge uptick in 415 responses from the Wikimedia appservers between 22:00 and midnight on Thursday:
https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&from=1607636985827&to=1607646220225
(Grafana panel: HTTP 415 responses vs. total requests)
That's around 1500 requests per second, about 15-20% of total appserver traffic. The requests had the user agent IABot/2.0, had URLs like https://www.wikidata.org/w/index.php?action=raw&title=Q14094323, and contained an Authorization header.
The 415 errors come from Wikibase, which does not implement action=raw. Triggering 415 errors is not problematic in itself; the volume is.
There are several things wrong with that:
- 1500/sec is a huge request volume, well beyond the level where a bot operator should at least notify server operators about what the bot is doing. This was noticed because the requests resulted in HTTP 415, but the same applies to successful requests (e.g. action=raw requests to wikitext pages) when done at this volume; it would just be harder for ops to pinpoint what's happening in that case.
- The requests have an Authorization header, which prevents all caching and causes extra work for the servers (they need to verify the OAuth signature), even though there is zero need for authentication for action=raw requests (i.e. fetching the page source).
- Even if there were no Authorization headers, those requests are still uncacheable. These requests should be made to a cacheable endpoint, although TBH I'm not sure what that would be. There is the MediaWiki page source REST API, which seems more appropriate, but it's also relatively new, so we should probably double-check that.
- The bot should not create extra work by sending requests to pages with a content type it has no hope of processing. It should probably have a namespace filter and special handling for Q-items on Wikidata.
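On the volume point, even a simple client-side token bucket would keep the bot's request rate at an agreed level. A minimal sketch (the actual limit and burst size should be agreed with the server operators, not picked unilaterally as here):

```python
import time


class TokenBucket:
    """Minimal client-side request throttle (illustrative only)."""

    def __init__(self, rate, burst):
        self.rate = float(rate)       # tokens refilled per second
        self.capacity = float(burst)  # maximum stored tokens
        self.tokens = float(burst)
        self.last = time.monotonic()

    def acquire(self):
        """Take one token if available; return False if the caller
        should back off and retry later."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bot would call `acquire()` before each request and sleep briefly whenever it returns False, capping sustained throughput at `rate` requests per second.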
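As a sketch of the last three points, a bot-side helper could route source fetches to an appropriate endpoint and drop the needless Authorization header. This is hypothetical routing logic, not IABot's actual code: the Special:EntityData URL shape for Wikidata items and the core REST API page endpoint are the documented forms, but their cacheability at the edge is exactly what would need double-checking.

```python
import re
from urllib.parse import quote

# Wikidata main-namespace items look like Q14094323.
ITEM_RE = re.compile(r"^Q\d+$")


def fetch_plan(domain, title):
    """Pick a URL and headers for fetching a page's source.

    * Q-items on Wikidata don't support action=raw, so fetch the
      entity via Special:EntityData instead.
    * Everything else goes to the REST API page endpoint (whose JSON
      response includes the wikitext source) rather than the
      uncacheable index.php?action=raw.
    * No Authorization header: anonymous reads can be answered by
      the edge caches and spare the appservers OAuth verification.
    """
    headers = {"User-Agent": "IABot/2.0 (example contact URL)"}
    if domain == "www.wikidata.org" and ITEM_RE.match(title):
        url = f"https://{domain}/wiki/Special:EntityData/{title}.json"
    else:
        url = f"https://{domain}/w/rest.php/v1/page/{quote(title, safe='')}"
    return url, headers
```

With a namespace filter layered on top, the bot would skip entity pages entirely unless it actually knows how to process them.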