While investigating T179156: 503 spikes and resulting API slowness starting 18:45 October 26, I was checking Hadoop webrequest responses with status 500 and found https://search.wikimedia.org, which amazingly returns 500 on the main page as well as on most requests. This needs to be cleaned up so it doesn't spam the webrequest logs.
This is apparently https://wikitech.wikimedia.org/wiki/Search.wikimedia.org and we still need to maintain it.
I took a sample of 10k failures (out of ~25k in last 30 days):
No http query string provided: 1.3%
No search term provided: 23.5%
Query too long (we limit to 255 characters): 51.6%
This leaves 22.4% which might have real errors. Not sure of the source yet, we don't have much of anything for logging on this endpoint. Will have to try and correlate these queries to the internal api requests it issues and any logs they might have issued.
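A minimal sketch of how the buckets above could be reproduced from sampled request URLs. It is not the code used for the sample: the `search` parameter name, the bucket labels, and the `classify_failure` helper are assumptions for illustration. Only the 255-character limit comes from the breakdown above.

```python
from urllib.parse import urlsplit, parse_qs

# Limit quoted in the breakdown above.
MAX_QUERY_LENGTH = 255


def classify_failure(url: str, term_param: str = "search") -> str:
    """Return a coarse failure bucket for one request URL.

    `term_param` is a hypothetical name for the search-term parameter.
    """
    query = urlsplit(url).query
    if not query:
        return "no_query_string"
    # keep_blank_values so that "search=" is seen as present but empty
    params = parse_qs(query, keep_blank_values=True)
    term = params.get(term_param, [""])[0]
    if not term.strip():
        return "no_search_term"
    if len(term) > MAX_QUERY_LENGTH:
        return "query_too_long"
    # Nothing obviously wrong with the request itself
    return "unexplained"
```

Running this over the sampled URLs and counting the returned labels (for example with `collections.Counter`) would give the same kind of percentage split, with `unexplained` being the remainder that needs correlating against backend logs.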
It's for Apple's dictionary bridge. As best I know, no *recent* Apple products still use it. I'd be curious to see a breakdown by user agent...
Sadly yes we do :( T81982 has some more history here too.
I've long wondered if we should actually return a failure code here rather than a 200 with no content. We should sync up on this--for some bizarre reason I know way too much about the history here...
Pulled some info on overall usage and HTTP response codes from the webrequest logs. This covers Oct 9 through Nov 9, for all requests with host search.wikimedia.org.
http code | # requests |
500 | 25,147 |
503 | 107 |
- | 27 |
404 | 2,520 |
405 | 1 |
414 | 2,962 |
301 | 1,600,319 |
200 | 1,500,086 |
Do we also have breakdowns by site param? Right now we allow wikipedia, wiktionary, wikinews and wikisource. Do all 4 projects get results? Are these all exclusively for en projects too? I'm just thinking of ways we can start narrowing the API as much as possible.
Change 390347 had a related patch set uploaded (by Chad; owner: Chad):
[operations/mediawiki-config@master] search.wikimedia.org: simplify limit handling
If needed I can pull a full month, but it will take longer. This is for Nov 8 (UTC), and is limited to requests that returned a 200 response code.
site | lang | count |
wikipedia | en | 39289 |
wikipedia | ja | 5940 |
wikipedia | fr | 1887 |
wikipedia | it | 1861 |
wikipedia | es | 1063 |
wikipedia | de | 823 |
wikipedia | ru | 507 |
wikipedia | pt | 322 |
wikipedia | nl | 294 |
wikipedia | zh | 199 |
wikipedia | no | 181 |
wikipedia | sv | 80 |
wikipedia | la | 55 |
wikipedia | da | 55 |
wikipedia | pl | 48 |
wikipedia | ko | 35 |
wikipedia | tr | 33 |
wikipedia | th | 32 |
wikipedia | ro | 24 |
wikipedia | fi | 24 |
wikipedia | el | 13 |
wikipedia | cs | 11 |
wikipedia | ar | 10 |
wikipedia | hu | 9 |
wikipedia | uk | 8 |
wikipedia | he | 8 |
wikipedia | sk | 7 |
wikipedia | hr | 7 |
wikipedia | ms | 6 |
wikipedia | is | 4 |
wikipedia | vi | 2 |
wikipedia | lt | 2 |
wikipedia | ca | 2 |
Hmmm, nobody but wikipedia? I'm really wondering if we can drop the other backends from here.
Change 390347 merged by jenkins-bot:
[operations/mediawiki-config@master] search.wikimedia.org: simplify limit handling
Reopening as it's still occurring. This time it is because we return 500 for whatever non-200 code we receive from the backend.
See https://github.com/wikimedia/operations-mediawiki-config/blob/master/docroot/search.wikimedia.org/index.php#L62-L64
This code should be smarter, e.g. it should not return 500 when the backend returns a 400.
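A rough sketch of the kind of mapping meant here, written in Python for illustration rather than as the actual index.php logic. The function name `client_status` and the handling of anything that is neither a success nor an error code are assumptions.

```python
def client_status(backend_status: int) -> int:
    """Pick the status to send to the client for a given backend status."""
    if 200 <= backend_status < 300:
        # Backend succeeded: answer normally.
        return 200
    if 400 <= backend_status < 600:
        # Forward client and server errors unchanged, so a bad
        # request shows up as a 4xx instead of a 500.
        return backend_status
    # Assumption: anything else (e.g. an unexpected redirect) is
    # still treated as an internal error.
    return 500
```

With this, a backend 400 reaches the client as 400, and only genuinely unexpected backend responses are counted as 500s in the webrequest logs.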
Removing myself as I unfortunately won't be able to help. Whoever is free next will take it.
Change 430502 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] Forward response codes >= 400 on search.wikimedia.org
Change 430502 merged by jenkins-bot:
[operations/mediawiki-config@master] Forward response codes >= 400 on search.wikimedia.org