
search.wikimedia.org is source of lots of 500s
Closed, Resolved (Public)

Description

While investigating T179156 (503 spikes and resulting API slowness starting 18:45 October 26), I checked the Hadoop webrequest data for 500 responses and found https://search.wikimedia.org, which amazingly returns a 500 on the main page as well as on most requests. This needs to be cleaned up so it doesn't spam web requests.

Event Timeline

debt triaged this task as Low priority. Nov 9 2017, 6:13 PM
debt moved this task from needs triage to Up Next on the Discovery-Search board.
debt added a subscriber: debt.

Let's take a look and fix this, then figure out who uses it and why.

This is apparently https://wikitech.wikimedia.org/wiki/Search.wikimedia.org and we still need to maintain it.

I took a sample of 10k failures (out of ~25k in last 30 days):

No http query string provided: 1.3%
No search term provided: 23.5%
Query too long (we limit to 255 characters): 51.6%

This leaves 22.4% which might have real errors. Not sure of the source yet, we don't have much of anything for logging on this endpoint. Will have to try and correlate these queries to the internal api requests it issues and any logs they might have issued.
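
The three failure categories above can be sketched as a small classifier. This is only an illustration, not the endpoint's actual code: the function name `classify_request` and the parameter name `search` are hypothetical, and the only value taken from this task is the 255-character limit.

```python
from typing import Optional
from urllib.parse import parse_qs

# Limit quoted in the comment above ("we limit to 255 characters").
MAX_QUERY_LENGTH = 255


def classify_request(query_string: str, term_param: str = "search") -> Optional[str]:
    """Return a failure category, or None if the request passes these checks.

    `term_param` is a hypothetical parameter name used for illustration;
    the real endpoint may name its search parameter differently.
    """
    if not query_string:
        return "no query string"
    # keep_blank_values so that "search=" is seen as an empty term.
    params = parse_qs(query_string, keep_blank_values=True)
    term = params.get(term_param, [""])[0]
    if not term.strip():
        return "no search term"
    if len(term) > MAX_QUERY_LENGTH:
        return "query too long"
    return None
```

Requests for which this returns None would fall into the remaining bucket that still needs investigation.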

> Let's take a look and fix this, then figure out who uses it and why.

It's for Apple's dictionary bridge. As best I know, no *recent* Apple products still use it. I'd be curious to see a breakdown by user agent...

> This is apparently https://wikitech.wikimedia.org/wiki/Search.wikimedia.org and we still need to maintain it.

Sadly yes we do :( T81982 has some more history here too.

> I took a sample of 10k failures (out of ~25k in last 30 days):
>
> No http query string provided: 1.3%
> No search term provided: 23.5%
> Query too long (we limit to 255 characters): 51.6%
>
> This leaves 22.4% which might have real errors. Not sure of the source yet, we don't have much of anything for logging on this endpoint. Will have to try and correlate these queries to the internal api requests it issues and any logs they might have issued.

I've long wondered if we should actually return a failure code here rather than a 200 with no content. We should sync up on this--for some bizarre reason I know way too much about the history here...

Pulled some info on overall usage and HTTP response codes from webrequest logs. This is for Oct 9 through Nov 9, for all requests with host search.wikimedia.org.

HTTP code   # requests
500         25,147
503         107
-           27
404         2,520
405         1
414         2,962
301         1,600,319
200         1,500,086
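
To put the counts above in proportion, here is a minimal sketch that computes each response code's share of all requests. The numbers are copied from the table; the names `counts` and `share` are just for illustration.

```python
# Request counts per HTTP response code, copied from the table above.
# The "-" row is kept as-is; the source does not say what it represents.
counts = {
    "500": 25147,
    "503": 107,
    "-": 27,
    "404": 2520,
    "405": 1,
    "414": 2962,
    "301": 1600319,
    "200": 1500086,
}

total = sum(counts.values())


def share(code: str) -> float:
    """Fraction of all requests that returned the given response code."""
    return counts[code] / total


if __name__ == "__main__":
    # Print codes from most to least frequent.
    for code, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{code:>4}  {n:>9,}  {share(code):7.3%}")
```

By this count the 500s are roughly 0.8% of all traffic to the host, while 301 and 200 together make up nearly all of the rest.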

Do we also have breakdowns by site param? Right now we allow wikipedia, wiktionary, wikinews and wikisource. Do all 4 projects get results? Are these all exclusively for en projects too? I'm just thinking of ways we can start narrowing the API as much as possible.

Change 390347 had a related patch set uploaded (by Chad; owner: Chad):
[operations/mediawiki-config@master] search.wikimedia.org: simplify limit handling

https://gerrit.wikimedia.org/r/390347

If needed I can pull a full month, but it will take longer. This is for Nov 8th (UTC), and is limited to requests that returned a 200 response code.

site        lang  count
wikipedia   en    39289
wikipedia   ja    5940
wikipedia   fr    1887
wikipedia   it    1861
wikipedia   es    1063
wikipedia   de    823
wikipedia   ru    507
wikipedia   pt    322
wikipedia   nl    294
wikipedia   zh    199
wikipedia   no    181
wikipedia   sv    80
wikipedia   la    55
wikipedia   da    55
wikipedia   pl    48
wikipedia   ko    35
wikipedia   tr    33
wikipedia   th    32
wikipedia   ro    24
wikipedia   fi    24
wikipedia   el    13
wikipedia   cs    11
wikipedia   ar    10
wikipedia   hu    9
wikipedia   uk    8
wikipedia   he    8
wikipedia   sk    7
wikipedia   hr    7
wikipedia   ms    6
wikipedia   is    4
wikipedia   vi    2
wikipedia   lt    2
wikipedia   ca    2

Hmmm, nobody but wikipedia? I'm really wondering if we can drop the other backends from here.

Change 390347 merged by jenkins-bot:
[operations/mediawiki-config@master] search.wikimedia.org: simplify limit handling

https://gerrit.wikimedia.org/r/390347

debt claimed this task.
dcausse moved this task from Needs Reporting to Incoming on the Discovery-Search (Current work) board.
dcausse added a subscriber: dcausse.

Reopening as it's still occurring; this time it is because we return 500 for whatever non-200 code we receive from the backend.
See https://github.com/wikimedia/operations-mediawiki-config/blob/master/docroot/search.wikimedia.org/index.php#L62-L64
This code should be smarter, e.g. it should not return 500 when the backend returns a 400.
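
The behaviour asked for here can be sketched as a small status-mapping function. This is an illustration in Python, not the PHP in index.php: the function names are hypothetical, and the handling of non-200 codes below 400 is an assumption, since the task only discusses forwarding codes of 400 and above.

```python
def status_before(backend_status: int) -> int:
    """Behaviour described above: any non-200 backend code becomes a 500."""
    return 200 if backend_status == 200 else 500


def status_forwarding(backend_status: int) -> int:
    """Forward backend error codes instead of masking them as 500."""
    if backend_status == 200:
        return 200
    if backend_status >= 400:
        # Client errors (e.g. 400) and server errors pass through unchanged.
        return backend_status
    # Assumption: any other unexpected code is still reported as a 500.
    return 500
```

With this mapping a backend 400 reaches the client as a 400, so bad requests no longer show up as server errors in the webrequest data.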

debt removed debt as the assignee of this task. May 1 2018, 5:31 PM

Removing myself as I unfortunately won't be able to help. Whoever is free next will take it.

Gehel added subscribers: fgiunchedi, Gehel.

Change 430502 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] Forward response codes >= 400 on search.wikimedia.org

https://gerrit.wikimedia.org/r/430502

Change 430502 merged by jenkins-bot:
[operations/mediawiki-config@master] Forward response codes >= 400 on search.wikimedia.org

https://gerrit.wikimedia.org/r/430502