Fix: "Warning: Search backend error during .. took .." (tracking)
Open, Lowest   Public   PRODUCTION ERROR

Description

https://logstash.wikimedia.org/#/dashboard/elasticsearch/mediawiki-error (search for "Search backend error during"). As of July 2018, there were about 11,000 recorded instances of this error per month. For the most part, these errors mean that a user submitted an invalid search string.

Normalised:

Search backend error during {queryType} search for '{query}' after {tookMs}: {error_message}

Sample (1.32.0.wmf-14)
Search backend error during comp_suggest search for 'audit' after 1: : 

elasticTookMs: "0"
error_message: ":"
hitsOffset: "0"
hitsReturned: "0"
hitsTotal: "-"

level:WARNING
channel:CirrusSearch
url: /w/api.php?action=opensearch&search=audit&limit=15
reqId: W2CfuApAEMIAADlGDK8AAAAK

Sample (1.25wmf23)
Warning: Search backend error during degraded_full_text search
 for '"nicht antretbar*" OR "nicht antreten*" OR "nicht erstattbar*" OR "nicht erstatten*" OR "nicht erstattet*" OR "nicht stornierbar*" OR "nicht stornieren*" OR "nicht storniert*"'
 after 62. 
  Parse error on ' or   OR   OR   OR   OR   OR   or ': Encountered " <OR> "OR "" at line 1, column 11.
[Called from CirrusSearch\ElasticsearchIntermediary::failure in /srv/mediawiki/php-1.25wmf23/extensions/CirrusSearch/includes/ElasticsearchIntermediary.php at line 98] in /srv/mediawiki/php-1.25wmf23/includes/debug/MWDebug.php on line 300
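
For anyone analysing exported log lines, the normalised template above can be turned into a small parser. This is only a sketch in Python, not part of CirrusSearch: the function name `parse_backend_error` and the regular expression are assumptions made for illustration, and it only handles the newer "after {tookMs}: {error_message}" form (the 1.25wmf23 sample uses a slightly different layout and would not match).

```python
import re

# Pattern mirroring the normalised template:
# Search backend error during {queryType} search for '{query}' after {tookMs}: {error_message}
# The query group is greedy so that quotes inside the query do not end it early.
TEMPLATE_RE = re.compile(
    r"Search backend error during (?P<queryType>\w+) search for "
    r"'(?P<query>.*)' after (?P<tookMs>\d+): (?P<error_message>.*)",
    re.DOTALL,
)


def parse_backend_error(line):
    """Return the template fields as a dict, or None if the line does not match."""
    match = TEMPLATE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["tookMs"] = int(fields["tookMs"])
    fields["error_message"] = fields["error_message"].strip()
    return fields


# The 1.32.0.wmf-14 sample from the description.
sample = "Search backend error during comp_suggest search for 'audit' after 1: : "
print(parse_backend_error(sample))
```

For the 1.32.0.wmf-14 sample this yields queryType "comp_suggest", query "audit", tookMs 1 and error_message ":", which agrees with the structured fields logged alongside it.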

Related Objects

Status     Subtype           Assigned
Open       PRODUCTION ERROR  None
Duplicate  PRODUCTION ERROR  None
Resolved   PRODUCTION ERROR  Manybubbles
Resolved   PRODUCTION ERROR  EBernhardson
Resolved   -                 EBernhardson
Duplicate  PRODUCTION ERROR  None
Duplicate  -                 None
Declined   -                 None
Open       -                 None
Resolved   -                 dcausse
Resolved   -                 dcausse
Resolved   -                 dcausse
Resolved   -                 dcausse
Resolved   -                 dcausse
Resolved   -                 dcausse
Resolved   -                 dcausse
Resolved   -                 dcausse
Resolved   -                 dcausse
Open       -                 None
Resolved   -                 dcausse
Resolved   -                 CommunityTechBot
Resolved   -                 dcausse
Resolved   -                 dcausse
Resolved   -                 dcausse
Duplicate  -                 dcausse
Resolved   -                 dcausse
Resolved   -                 dcausse
Resolved   -                 Johan
Open       -                 None

Event Timeline

Krinkle raised the priority of this task from to Needs Triage.
Krinkle updated the task description.
Krinkle subscribed.

3 tasks here:

  1. So the spacing one we're pretty aware of (and have tried to fix already). I could've sworn I had a task for this but I guess not.
  2. I was just about to file an issue for the OR parse error. That'll be its own task.
  3. The highlighting one is bad and its own thing as well.

Raising priority too because these are really spammy (and not easily grouped because of the different queries, so they don't show up as one item in fatalmonitor).
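
To illustrate the grouping problem: a log viewer could collapse the variable parts of each message into placeholders so that all of these warnings count as one item. The sketch below is a rough Python illustration, not how fatalmonitor or logstash actually group entries; the `fingerprint` helper is hypothetical, and the second and third messages are made-up examples.

```python
import re
from collections import Counter


def fingerprint(message):
    """Collapse the variable parts of a message so similar warnings share one key."""
    # Replace the quoted query (which may itself contain quotes) with a placeholder.
    key = re.sub(
        r"search\s+for '.*' after",
        "search for '{query}' after",
        message,
        flags=re.DOTALL,
    )
    # Replace the elapsed time with a placeholder.
    key = re.sub(r"after \d+", "after {tookMs}", key)
    return key


messages = [
    # Sample from the task description.
    "Search backend error during comp_suggest search for 'audit' after 1: : ",
    # Hypothetical messages, invented for this example.
    "Search backend error during comp_suggest search for 'audit log' after 3: : ",
    "Search backend error during full_text search for 'a OR' after 12: Parse error",
]

counts = Counter(fingerprint(m) for m in messages)
for key, count in counts.items():
    print(count, key)
```

The two comp_suggest messages end up under one key despite their different queries. An error message that echoes the query back (as the parse error does) would still need extra handling.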

This is caused by @Jdouglas's change to pull out phrase prefixes. This query was never going to work properly anyway - but now it blows up!

I can make it stop blowing up, but I can't get it to actually do what the user wants without rewriting the query parser, which we want to do anyway but just can't do yet.
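
A rough Python sketch of what seems to be happening, assuming the phrase prefixes are pulled out of the query string before the rest goes to the parser. This is not the CirrusSearch code: the regular expression, the helper names, and the operator list are all assumptions, and the query is a shortened version of the 1.25wmf23 sample.

```python
import re

# Matches a quoted phrase ending in a wildcard, in the form seen in the sample,
# e.g. "nicht antretbar*". (Assumed syntax, taken from the logged query.)
PHRASE_PREFIX_RE = re.compile(r'"[^"]*\*"')
OPERATORS = {"AND", "OR", "NOT"}


def extract_phrase_prefixes(query):
    """Split a query into its phrase-prefix terms and whatever text is left over."""
    phrases = PHRASE_PREFIX_RE.findall(query)
    remainder = PHRASE_PREFIX_RE.sub(" ", query)
    return phrases, remainder


def is_only_operators(remainder):
    """True when the leftover text is non-empty and holds nothing but boolean operators."""
    tokens = remainder.split()
    return bool(tokens) and all(t.upper() in OPERATORS for t in tokens)


# Shortened version of the query from the 1.25wmf23 sample.
query = '"nicht antretbar*" OR "nicht antreten*" OR "nicht erstattbar*"'
phrases, remainder = extract_phrase_prefixes(query)

print(phrases)
print(repr(remainder))
print(is_only_operators(remainder))
```

Once the phrases are removed, only the OR operators remain, which looks like the string the parser rejects in the sample. A check like `is_only_operators` could skip the parser in that case and so avoid the error, but it would not make the OR between the phrases do what the user meant.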

Seems the last task to be completed is T95020 :-}

I'm pretty sure we got this, yeah. Is it still coming up?

I'm also slowly slowly working through a replacement for query_string which should fix this better.

I have no idea whether it still occurs. I am not sure how to trigger the error nor how to search for it in logstash.wikimedia.org :-/

The task description had the search instructions for logstash: open https://logstash.wikimedia.org/#/dashboard/elasticsearch/hhvm and search for "ElasticsearchIntermediary".

This was the tracking task for the 3 separate subtasks. Please leave this one open until all 3 are closed. The one with "OR" isn't done.

hashar renamed this task from Fix: "Warning: Search backend error during .. took .." to Fix: "Warning: Search backend error during .. took .." (tracking). Jun 30 2015, 9:38 PM
hashar added a project: Tracking-Neverending.
greg subscribed.

Unassigning from Nik, assuming that is correct :)

@EBernhardson does not think we get too many of these in the logs these days, so I'm lowering priority based on that.

Interesting, I wonder why there are fewer instances of it... maybe people have given up?

Deskana lowered the priority of this task from High to Low. Nov 24 2015, 6:19 PM

Lowering priority to reflect the reality of the team's prioritisation.

Actually, they still exist; they are just shunted off into the CirrusSearch logs instead of spamming up hhvm.log. It's still something to fix, but it won't be annoying other teams looking at general logs by spamming those up.

debt lowered the priority of this task from Low to Lowest. Mar 9 2017, 11:16 PM
debt moved this task from needs triage to search-icebox on the Discovery-Search board.
debt subscribed.

Issue still exists in our query parsing...but not a huge deal. Moving to be dealt with later.

As pointed out by Manybubbles, this will be completely fixed once we fully get rid of the Elasticsearch query_string query. This work is being tracked in T185108.

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:12 PM

Untagging as prod error since this is not a fatal error, runtime PHP warning, HTTP timeout, or HTTP server error.

It no longer appears on the errors dashboard, but can still be found on the "mediawiki" dashboard and/or on search-specific dashboards. It continues to be important and has impact on people searching, but seems well-contained to not be an operational risk and/or has fallbacks in place.

MPhamWMF subscribed.

Closing out low/est priority tasks over 6 months old with no activity within last 6 months in order to clean out the backlog of tickets we will not be addressing in the near term. Please feel free to reopen if you think a ticket is important, but bear in mind that given current priorities and resourcing, it is unlikely for the Search team to pick up these tasks for the indefinite future. We hope that the requested changes have either been addressed by or made irrelevant by work the team has done or is doing -- e.g. upgrading Elasticsearch to a newer version will solve various ES-related problems -- or will be subsumed by future work in a more generalized way.

RhinosF1 removed a project: Discovery-Search.
RhinosF1 subscribed.

Re-opening tasks and removing from team workboard per IRC feedback given yesterday and discussion with MPham.