
IAbot sending a huge volume of action=raw requests (HTTP 415 errors)
Closed, ResolvedPublic

Assigned To
Authored By
Tgr
Dec 11 2020, 5:57 AM
Referenced Files
F34617913: image.png
Aug 23 2021, 2:29 PM
F34156056: Screenshot_2021-03-12 Discover - Elastic.png
Mar 12 2021, 8:12 PM
F34148488: image.png
Mar 9 2021, 10:36 PM
F34147710: image.png
Mar 9 2021, 12:16 PM
F34147716: image.png
Mar 9 2021, 12:16 PM
F34138598: image.png
Mar 5 2021, 5:46 PM
F34138600: image.png
Mar 5 2021, 5:46 PM
F33942449: iabot-415.png
Dec 11 2020, 5:57 AM

Description

There has been a huge uptick in 415 responses from the Wikimedia appservers between 22:00 and midnight on Thursday:
https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&from=1607636985827&to=1607646220225

iabot-415.png (369×913 px, 62 KB): HTTP 415 responses
iabot-total.png (392×922 px, 37 KB): total requests

That's 1500 requests per second; about 15-20% of the total appserver traffic. The requests had the user agent IABot/2.0 and had URLs like https://www.wikidata.org/w/index.php?action=raw&title=Q14094323 and contained an Authorization header.

The 415 errors are from Wikibase which doesn't implement action=raw. Triggering 415 errors is not problematic in itself; the volume is.

There are several things wrong with that:

  • 1500/sec is a huge request volume, well beyond the level at which a bot operator should at least notify server operators about what the bot is doing. This was noticed because the requests resulted in HTTP 415, but the same applies to successful requests (e.g. action=raw requests to wikitext pages) at this volume; it would just be harder for ops to pinpoint what's happening in that case.
  • The requests have an Authorization header, which prevents all caching and causes extra work for the servers (they need to verify the OAuth signature), even though there is zero need for authentication for action=raw requests (i.e. fetching the page source).
  • Even without the Authorization headers, these requests would still be uncacheable. They should be made to a cacheable endpoint, although TBH I'm not sure what that would be. The MediaWiki page source REST API seems more appropriate, but it's also relatively new, so we should probably double-check that.
  • The bot should not create extra work by sending requests to pages with a content type it has no hope of processing. It should probably have a namespace filter and special handling for Q-items on Wikidata.
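The last point can be sketched as a small pre-filter run before any action=raw request is issued. This is a hypothetical helper, not IABot's actual code; the entity-title pattern and function name are assumptions for illustration:

```python
import re

# Assumed pattern for Wikibase entity titles (Q-items, P-properties, L-lexemes);
# a real deployment would also consult the wiki's configured namespaces.
ENTITY_TITLE = re.compile(r"^[QPL]\d+$")

def should_fetch_raw(wiki_domain: str, title: str) -> bool:
    """Return False when action=raw can never succeed for this page."""
    if wiki_domain == "www.wikidata.org" and ENTITY_TITLE.match(title):
        return False  # Wikibase entities answer action=raw with HTTP 415
    return True
```

With a check like this, `should_fetch_raw("www.wikidata.org", "Q14094323")` returns False and the doomed request is never sent.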

Related Objects

Event Timeline

Restricted Application added a subscriber: Cyberpower678.

Tagging Platform Engineering to get feedback about the optimal way of getting the page source.

jbond triaged this task as Medium priority.Dec 11 2020, 2:23 PM

Thursday, as in yesterday? I’m not aware of anything that should have been running to create that massive level of requests.

Possibly a bug in some error handling code causing the bot to instantly retry?
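If the spike was an instant-retry loop, the conventional fix on the client side is bounded exponential backoff. A generic sketch (none of these names come from IABot's codebase, and 4xx responses such as the 415s here should not be retried at all):

```python
import time

def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0):
    """Retry `fetch` with exponential backoff instead of hammering the server.

    `fetch` is any zero-argument callable that raises on failure.
    Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))
```

The key property is that a persistently failing endpoint produces a handful of requests per task, not 1500/sec.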

We could sleep for a second before sending the 415...

We saw a bunch of these requests again today. The main problem is that making requests like https://www.wikidata.org/w/index.php?action=raw&title=Q14094323 will *always* return an HTTP 415 error, so making them is just useless. IABot should see that the page it's trying to get text for is a Wikidata item and then not use action=raw in that case.

This bug and T274090: IABot 301 POST requests are both semi-innocuous in that neither is currently causing IABot or the wikis to break, but both issues cause toil for the Wikimedia Foundation's SRE teams by placing anomalous data in our monitoring systems which then needs to be investigated. This is disruptive under the best of circumstances, and could lead to more impactful issues if such investigations are done during an otherwise unrelated production wiki traffic event. In that case it is possible that critical investigation time could be wasted rediscovering that this particular anomalous traffic is a known and benign issue.


The code that emits the action=raw request seems to be https://github.com/internetarchive/internetarchivebot/blob/4de87f31006375d119b34fc46887844c9b437710/app/src/Core/APII.php#L182-L218

2021-03-05T17:41:06	120602	10.64.0.60	proxy:unix:/run/php/fpm-www.sock|fcgi://localhost/415	230	GET	http://www.wikidata.org/w/index.php?action=raw&title=Q10350930	-	text/html	-172.16.0.242, 10.64.0.130	IABot/2.0 (+https://en.wikipedia.org/wiki/User:InternetArchiveBot) (InternetArchiveBot operated by Cyberpower678.  Please contact Cyberpower678 at https://en.wikipedia.org/wiki/User:Cyberpower678 for questions and concerns.)	-	-	-	-	10.64.0.60	17e910b9-5490-40ca-aa8e-51ca62c73c52	172.16.0.242

This is happening again, @Cyberpower678 is there anything we can do about this?

image.png (544×1 px, 148 KB)

image.png (620×1 px, 124 KB)

Hmmm...it’s interesting. Not sure what is happening. Is there a timeframe when this happens? Is it continuous?


The timeframe is in the graph that was posted above.

And today as well, so we can say that this happens every day.

image.png (1×3 px, 334 KB)

@Cyberpower678 Please disable the bot until this issue is fixed

jijiki renamed this task from IAbot sending a huge volume of action=raw requests to IAbot sending a huge volume of action=raw requests (HTTP 415 errors).Mar 11 2021, 5:49 PM

Is this still happening? I noticed that a task that was commented out in my crontab file was not commented out in the actual crontab.


When did you make this change? We last saw it at 19:31 UTC.

Screenshot_2021-03-12 Discover - Elastic.png (1×3 px, 499 KB)

[20:33]  <    CP678> legoktm: Okay, my bad it wasn't commented it was starred.  (*)  I commented it out (again).  Let's see if this fixes it, because it's a broken task at current and not vital.  I thought it was disabled long ago.
Legoktm lowered the priority of this task from Medium to Low.Jun 25 2021, 6:46 PM

@Cyberpower678 it looks much better now, so thanks for that, however I still see a few requests where it is making action=raw requests for Flow boards, e.g.:

Can your code check the content model is wikitext (or not flow-board) before trying to get the text?
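The content model is available from the Action API's prop=info query, which reports `contentmodel` values such as `wikitext`, `wikibase-item`, or `flow-board`. A sketch of building that query (the helper name is hypothetical; the parameters are standard api.php ones):

```python
from urllib.parse import urlencode

def content_model_query_url(api_base: str, title: str) -> str:
    """Build an api.php query whose response includes the page's content
    model, so a bot can skip non-wikitext pages before using action=raw."""
    params = {
        "action": "query",
        "prop": "info",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return api_base + "?" + urlencode(params)
```

One such query per page (or a batched query for up to 50 titles) is far cheaper than a guaranteed-to-fail action=raw request.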

A month has passed. If this continues to happen, I suggest blocking the bot.

The offending task should have long been disabled. Is this still even happening?

image.png (92×1 px, 33 KB)

This is the crontab snippet. books.php was the offending program and is clearly disabled.

@Cyberpower678 it looks much better now, so thanks for that, however I still see a few requests where it is making action=raw requests for Flow boards, e.g.:

Can your code check the content model is wikitext (or not flow-board) before trying to get the text?

IABot inherently checks page content by using action=raw for most of its requests, but not at a high volume. An update in the near future will migrate to action=parse on the API.
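For reference, an action=parse request with prop=wikitext fetches the page source through api.php, and for non-wikitext pages it returns a structured API error rather than an HTTP 415. A sketch of such a request URL (the helper name is hypothetical; the exact shape of IABot's planned migration is not known from this task):

```python
from urllib.parse import urlencode

def parse_wikitext_url(api_base: str, title: str) -> str:
    """Build an api.php action=parse request that returns the page's
    wikitext, the endpoint the comment above says IABot will move to."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
    }
    return api_base + "?" + urlencode(params)
```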

I don't see any incidents of this in the last 30 days, happy to call this resolved.