Intermittent json parse failures in comp suggest
Closed, Resolved | Public | PRODUCTION ERROR

Description

Comp suggest appears to be hitting some sort of JSON encoding problem; perhaps PHP is structuring something in an unexpected way. The failures are not limited to a particular language: I see them for ja.wikipedia.org and de.wikipedia.org. They are not all directly reproducible, though. I tried pulling the URLs from logstash and running them, and they went through fine, so this looks like an intermittent error.

https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2018.10.29/mediawiki/?id=AWbAn3P0Of1_EDXpLeqj

Search backend error during comp_suggest search for '削除された 悪ふざけ' after 3: parsing_exception: Unknown key for a START_ARRAY in [suggest].

Event Timeline

The solution at this point is to add additional logging to the CirrusSearch request pipeline. Any request that results in a parsing error should have its request body logged so we can track down what the invalid part was.
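
A minimal sketch of that logging idea, in Python rather than the PHP that CirrusSearch is written in. It assumes the request and response bodies are available as strings; the logger name and both function names are hypothetical and not part of CirrusSearch. It only inspects a top-level `error` object like the one in the log message above, not per-item errors inside an msearch `responses` array.

```python
import json
import logging

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("search.requests")


def is_parse_error(response_body: str) -> bool:
    """Return True if the response is a JSON object whose top-level
    error has type "parsing_exception"."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    error = payload.get("error")
    return isinstance(error, dict) and error.get("type") == "parsing_exception"


def log_request_on_parse_error(request_body: str, response_body: str) -> bool:
    """Log the full request body when the backend rejected it as unparseable.

    Returns True if something was logged, False otherwise.
    """
    if not is_parse_error(response_body):
        return False
    logger.error("Search backend parse error; request body: %s", request_body)
    return True
```

Logging only on failure keeps the volume low while still capturing the exact bytes that the backend rejected, which is what an intermittent error like this one needs.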

EBjune triaged this task as Medium priority. Nov 1 2018, 5:04 PM
EBjune moved this task from needs triage to Up Next on the Discovery-Search board.
Krinkle added a subscriber: Krinkle.

(Initial triage shows matches since at least wmf.1; feel free to move elsewhere as appropriate.)

Mentioned in SAL (#wikimedia-operations) [2018-11-28T19:30:58Z] <ebernhardson> start goreplay logging of port 9200 across eqiad elastic cluster to track down T208248

Rather than track this at the MediaWiki level, I recorded all the HTTP traffic coming into the eqiad search cluster, waited for some error messages to come in (about 30 minutes), and then dug the parse exceptions out of the request/response pairs.

One example:

POST /_msearch HTTP/1.1
Connection: close
Host: search.svc.eqiad.wmnet
X-Client-IP: 10.64.32.52
X-Forwarded-For: 10.64.32.52
X-Forwarded-Proto: https
X-Connection-Properties: H2=0; SSR=0; SSL=TLSv1.2; C=ECDHE-ECDSA-AES256-GCM-SHA384; EC=prime256v1;
Content-Length: 111
Accept: */*
Accept-Encoding: deflate, gzip
Content-Type: application/x-ndjson

{"index":["enwiki_titlesuggest"]}
{"query":{"match_none":{}},"size":0,"suggest":[],"_source":["target_title"]}
{"error":{"root_cause":[{"type":"parsing_exception","reason":"Unknown key for a START_ARRAY in [suggest].","line":1,"col":47}],"type":"parsing_exception","reason":"Unknown key for a START_ARRAY in [suggest].","line":1,"col":47},"status":400}

Essentially, the empty suggest array is being encoded as [] when it should either be {} or not be present at all. I'm not sure why this array would be empty, though; it should contain at least one completion suggester profile (or more likely, many).
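
For context, PHP's json_encode serializes an empty PHP array as [], because an empty array gives it no way to tell a list from a map. The sketch below illustrates the defensive side of that observation in Python: build the msearch body and leave out `suggest` entirely when there are no suggesters. The function name and parameters are hypothetical, and this is not how CirrusSearch builds its requests.

```python
import json


def build_msearch_body(index, suggesters, source_fields):
    """Build a two-line ndjson msearch body (header line, then query line).

    `suggesters` maps a suggester name to its definition. When it is empty,
    the "suggest" key is omitted instead of being sent as an empty value.
    """
    header = {"index": [index]}
    query = {"query": {"match_none": {}}, "size": 0, "_source": list(source_fields)}
    if suggesters:
        # Always a JSON object, never an array.
        query["suggest"] = dict(suggesters)
    compact = (",", ":")
    return (
        json.dumps(header, separators=compact)
        + "\n"
        + json.dumps(query, separators=compact)
        + "\n"
    )


# No suggesters: the query line carries no "suggest" key at all.
empty_body = build_msearch_body("enwiki_titlesuggest", {}, ["target_title"])

# One suggester (name and field are made up for illustration).
body = build_msearch_body(
    "enwiki_titlesuggest",
    {"plain": {"prefix": "foo", "completion": {"field": "suggest"}}},
    ["target_title"],
)
```

Omitting the key only avoids the parse error. It does not explain why the list of suggesters was empty in the first place, which is the question that remains open at this point in the thread.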

@dcausse, do you have an idea of how the completion suggester could end up with no requests?

Change 476526 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Properly test the "classic" profile name

https://gerrit.wikimedia.org/r/476526

Thanks for the investigations!
See the attached patch for the cause.

To clarify, the bug affected all users who had set "classic" in their search preferences.

Change 476526 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Properly test the "classic" profile name

https://gerrit.wikimedia.org/r/476526

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:08 PM