
wikidata api - wbsearchentities randomly not returning search results
Closed, Resolved | Public | 5 Estimated Story Points

Description

description
This relates to https://www.wikidata.org/w/api.php?action=help&modules=wbsearchentities. wbsearchentities intermittently returns no search results for the keyword "XV wiek".

steps to reproduce
Run this request multiple times (the URL is quoted so that the shell does not treat "&" as a command separator):

curl -i 'https://www.wikidata.org/w/api.php?action=wbsearchentities&format=json&language=pl&search=XV%20wiek'

expected result
This should always be returned:

{"searchinfo":{"search":"XV wiek"},"search":[{"id":"Q7018","title":"Q7018","pageid":8152,"repository":"wikidata","url":"//www.wikidata.org/wiki/Q7018","concepturi":"http://www.wikidata.org/entity/Q7018","label":"15th century","description":"century","match":{"type":"label","language":"pl","text":"XV wiek"},"aliases":["XV wiek"]},{"id":"Q178696","title":"Q178696","pageid":178087,"repository":"wikidata","url":"//www.wikidata.org/wiki/Q178696","concepturi":"http://www.wikidata.org/entity/Q178696","label":"15th century BC","description":"century","match":{"type":"label","language":"pl","text":"XV wiek p.n.e."},"aliases":["XV wiek p.n.e."]},{"id":"Q2711964","title":"Q2711964","pageid":2607583,"repository":"wikidata","url":"//www.wikidata.org/wiki/Q2711964","concepturi":"http://www.wikidata.org/entity/Q2711964","label":"15th century in literature","description":"literature-related events during the 15th century","match":{"type":"label","language":"pl","text":"XV wiek - literatura"},"aliases":["XV wiek - literatura"]}],"success":1}

actual result
Intermittently, this is returned instead:

{"searchinfo":{"search":"XV wiek"},"search":[],"success":1}

additional info
Example headers follow. They are probably not useful, but I've noticed that the "no results response" does not have an x-search-id header.

http headers of "no results response"

HTTP/2 200 
date: Mon, 07 Jun 2021 17:08:55 GMT
server: mw1388.eqiad.wmnet
x-content-type-options: nosniff
p3p: CP="See https://www.wikidata.org/wiki/Special:CentralAutoLogin/P3P for more info."
x-frame-options: DENY
content-disposition: inline; filename=api-result.json
vary: Accept-Encoding,Treat-as-Untrusted,X-Forwarded-Proto,Cookie,Authorization
cache-control: private, must-revalidate, max-age=0
content-type: application/json; charset=utf-8
age: 0
x-cache: cp3060 miss, cp3058 pass
x-cache-status: pass
server-timing: cache;desc="pass", host;desc="cp3058"
strict-transport-security: max-age=106384710; includeSubDomains; preload
report-to: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
permissions-policy: interest-cohort=()
set-cookie: WMF-Last-Access=07-Jun-2021;Path=/;HttpOnly;secure;Expires=Fri, 09 Jul 2021 12:00:00 GMT
set-cookie: WMF-Last-Access-Global=07-Jun-2021;Path=/;Domain=.wikidata.org;HttpOnly;secure;Expires=Fri, 09 Jul 2021 12:00:00 GMT
x-client-ip: <REDACTED>
set-cookie: GeoIP=PL:30:Poznan:52.40:16.90:v4; Path=/; secure; Domain=.wikidata.org
accept-ranges: bytes
content-length: 59

http headers of "results response"

HTTP/2 200 
date: Mon, 07 Jun 2021 17:10:07 GMT
server: mw1287.eqiad.wmnet
x-content-type-options: nosniff
p3p: CP="See https://www.wikidata.org/wiki/Special:CentralAutoLogin/P3P for more info."
x-search-id: e1ixdzulsy1hvlx5g7jg0zfep
x-frame-options: DENY
content-disposition: inline; filename=api-result.json
vary: Accept-Encoding,Treat-as-Untrusted,X-Forwarded-Proto,Cookie,Authorization
cache-control: private, must-revalidate, max-age=0
content-type: application/json; charset=utf-8
age: 2
x-cache: cp3064 miss, cp3058 pass
x-cache-status: pass
server-timing: cache;desc="pass", host;desc="cp3058"
strict-transport-security: max-age=106384710; includeSubDomains; preload
report-to: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
nel: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
permissions-policy: interest-cohort=()
set-cookie: WMF-Last-Access=07-Jun-2021;Path=/;HttpOnly;secure;Expires=Fri, 09 Jul 2021 12:00:00 GMT
set-cookie: WMF-Last-Access-Global=07-Jun-2021;Path=/;Domain=.wikidata.org;HttpOnly;secure;Expires=Fri, 09 Jul 2021 12:00:00 GMT
x-client-ip: <REDACTED>
set-cookie: GeoIP=PL:30:Poznan:52.40:16.90:v4; Path=/; secure; Domain=.wikidata.org
accept-ranges: bytes
content-length: 1040
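
The difference between the two header dumps can be checked mechanically. Below is a minimal Python sketch that uses only the standard library. The helper names `classify` and `probe` and the User-Agent string are illustrative assumptions, not part of any Wikimedia tooling. It repeats the request from the steps to reproduce and tallies responses that lack `x-search-id` or carry an empty `search` list.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

API = "https://www.wikidata.org/w/api.php"


def classify(header_names, body):
    """Label one response by the two symptoms reported in this task."""
    has_id = any(name.lower() == "x-search-id" for name in header_names)
    has_results = bool(json.loads(body).get("search"))
    if has_id and has_results:
        return "ok"
    if not has_id and not has_results:
        return "empty-no-search-id"
    return "empty-with-search-id" if has_id else "results-no-search-id"


def probe(n=20, search="XV wiek", language="pl"):
    """Send the same query n times and count each outcome."""
    query = urllib.parse.urlencode({
        "action": "wbsearchentities",
        "format": "json",
        "language": language,
        "search": search,
    })
    request = urllib.request.Request(
        f"{API}?{query}",
        # Placeholder User-Agent; replace with one that identifies you.
        headers={"User-Agent": "wbsearchentities-repro-sketch/0.1"},
    )
    tally = Counter()
    for _ in range(n):
        with urllib.request.urlopen(request) as response:
            tally[classify(response.headers.keys(), response.read())] += 1
    return tally
```

Calling `probe(100)` returns a `Counter` keyed by outcome label. A non-zero count for `empty-no-search-id` would match the "no results response" shown above.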

Event Timeline

I think it used to happen occasionally, at times when I thought I had saturated mwapi through the query service, except that today it seems to happen constantly.

@Addshore is this on our side or search platform?

If I had to guess I would say this would be something relating to the search team

MPhamWMF set the point value for this task to 5. Jun 9 2021, 3:44 PM

X-Search-Id is attached to responses whenever CirrusSearch has made a request to elasticsearch. That suggests these requests are somehow getting a response without ever hitting the backend. For the moment I've left a script looping on mwmaint2002 fetching this endpoint to see if it will reproduce. It has done about 10k queries so far without reproduction, but I can probably let it run for a while.

It seems I forgot this was running. It made ~8M requests over 120 hours and never got a response that was missing X-Search-Id or the result list. Something triggers this, but I'm not sure what it could be.

Looking at the codebase, I don't understand how this could happen without entering CirrusSearch (unless the APIAfterExecute hook is not called on wbsearchentities, or there is a bug in the cirrus request logger).
Assuming that cirrus was hit, I think the problem comes from https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/WikibaseCirrusSearch/+/refs/heads/master/src/EntitySearchElastic.php#319, where an error from the backend is silently ignored and an empty array is returned. I can't rule out that something else triggers this problem, but what EntitySearchElastic does can cause it on backend failures (elasticsearch failures, poolcounter rejections...). See T260276.
Fixing this would require changing the contract that EntitySearchHelper offers (or adding a new interface). The current contract assumes that the backend never fails, so it gives the caller no way to learn the status of the response.
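
The failure mode described above can be illustrated outside PHP. The following Python sketch is not the Wikibase code: `BackendError`, `search_swallowing`, and the two fake backends are hypothetical names chosen for illustration. It shows why a caller cannot tell "no matches" from "backend failed" once errors are turned into an empty list.

```python
class BackendError(Exception):
    """Stands in for an elasticsearch failure or a poolcounter rejection."""


def search_swallowing(backend, text):
    """Mimics the pattern under discussion: errors become an empty result."""
    try:
        return backend(text)
    except BackendError:
        return []  # the caller never learns that something went wrong


def healthy_no_match(text):
    """A working backend that simply finds nothing."""
    return []


def failing(text):
    """A backend that rejects the request."""
    raise BackendError("rejected")


# Both calls yield the same value, so an API layer built on top of
# search_swallowing would render an empty "search" list in either case.
indistinguishable = (
    search_swallowing(healthy_no_match, "XV wiek")
    == search_swallowing(failing, "XV wiek")
)
```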

I see two options to improve the situation:

1/ Fail with an exception: this would let API clients know that something is wrong, and varnish would not cache the output. Unfortunately it is hard to know how other internal clients would react to this new exception.
2/ Change the EntitySearchHelper contract so that it can pass back a Status object instead of the raw TermSearchResult[] array. This is possibly a long refactoring that I'm not sure is really worth the effort, especially if the failures are rare.
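
The two options can be compared side by side in a small sketch. This is Python rather than PHP and does not reflect the actual EntitySearchHelper interface: `BackendError`, `EntitySearchFailure`, `SearchStatus`, `search_raising`, and `search_with_status` are all hypothetical names used only to show the shape of each contract.

```python
from dataclasses import dataclass, field


class BackendError(Exception):
    """Stands in for an elasticsearch failure or a poolcounter rejection."""


class EntitySearchFailure(Exception):
    """Hypothetical error surfaced to callers under option 1."""


@dataclass
class SearchStatus:
    """Hypothetical wrapper returned to callers under option 2."""
    ok: bool
    results: list = field(default_factory=list)
    error: str = ""


def search_raising(backend, text):
    """Option 1: the return type stays a plain list; failures raise."""
    try:
        return backend(text)
    except BackendError as exc:
        raise EntitySearchFailure(f"search backend failed: {exc}") from exc


def search_with_status(backend, text):
    """Option 2: every caller receives a status and must inspect it."""
    try:
        return SearchStatus(ok=True, results=backend(text))
    except BackendError as exc:
        return SearchStatus(ok=False, error=str(exc))
```

In this sketch option 1 leaves existing callers unchanged on the success path but exposes them to a new exception, while option 2 forces every caller to be updated to unwrap the status. That matches the trade-off described in the two options above.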

Change 730408 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/Wikibase@master] [WIP] Add EntitySearchException

https://gerrit.wikimedia.org/r/730408

Change 730409 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/WikibaseCirrusSearch@master] Throw exception on backend failure

https://gerrit.wikimedia.org/r/730409

Change 730410 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/WikibaseLexemeCirrusSearch@master] Throw exception on backend failure

https://gerrit.wikimedia.org/r/730410

Change 732774 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/PropertySuggester@master] Fail on search backend errors when using wbsgetsuggestions

https://gerrit.wikimedia.org/r/732774

Change 730408 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] Add EntitySearchException

https://gerrit.wikimedia.org/r/730408

Change 732774 merged by jenkins-bot:

[mediawiki/extensions/PropertySuggester@master] Fail on search backend errors when using wbsgetsuggestions

https://gerrit.wikimedia.org/r/732774

Change 730410 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexemeCirrusSearch@master] Throw exception on backend failure

https://gerrit.wikimedia.org/r/730410

Change 730409 merged by jenkins-bot:

[mediawiki/extensions/WikibaseCirrusSearch@master] Throw exception on backend failure

https://gerrit.wikimedia.org/r/730409

Gehel claimed this task.