Should return error message (rather than "zero results") when search fails
Closed, ResolvedPublic

Description

Per bug 42423 comment #17, it would be much better if we received an error when the search index is broken rather than "0 results", which is what currently happens. This assumes that Lucene actually returns a useful error message that gets discarded, rather than handing MediaWiki 0 results in this case.


Version: unspecified
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=43869
https://bugzilla.wikimedia.org/show_bug.cgi?id=42423

bzimport set Reference to bz43544.
RobLa-WMF created this task. Via Legacy, Dec 31 2012, 9:38 PM
Aklapper added a comment. Via Conduit, Jan 1 2013, 11:51 AM
  • Bug 43553 has been marked as a duplicate of this bug.
tstarling added a comment. Via Conduit, Jan 7 2013, 1:35 AM

For RMI errors, of the kind which were the cause of "zero results" being returned last time I debugged a Lucene problem, there are error handling issues in multiple parts of the stack:

  • The SearchEngine interface in the core has no way to report errors to SpecialSearch (short of throwing an exception).
  • MWSearch makes no attempt to extract an error message from the body of HTTP 500 errors; it just returns null.
  • In lucene-search-2, some RMIMessengerClient methods respond to network errors by returning an empty result set, instead of returning an error status as they should.
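
All three points come down to keeping "the search failed" distinct from "the search matched nothing". A minimal sketch of that distinction follows, written in Python for brevity rather than in the PHP and Java of the actual stack. `SearchStatus`, `search`, `render`, and the `backend` callable are hypothetical names for illustration, not part of MediaWiki, MWSearch, or lucene-search-2.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SearchStatus:
    """Outcome of a search: a list of hits, or an error message."""
    hits: List[str] = field(default_factory=list)
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def search(query: str, backend: Callable[[str], List[str]]) -> SearchStatus:
    """Call the (hypothetical) backend and keep failures distinct from empty results."""
    try:
        return SearchStatus(hits=backend(query))
    except (ConnectionError, TimeoutError) as exc:
        # Report the failure instead of pretending that nothing matched.
        return SearchStatus(error=f"search backend failed: {exc}")


def render(status: SearchStatus) -> str:
    """What a results page might show for each of the three outcomes."""
    if not status.ok:
        return f"Error: {status.error}. Please try again."
    if not status.hits:
        return "0 results"
    return f"{len(status.hits)} result(s)"
```

With a status object like this, the caller can only show "0 results" when the backend really answered with an empty list.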
tstarling added a comment. Via Conduit, Jan 10 2013, 1:38 AM

We should at least log errors to UDP.

bzimport added a comment. Via Conduit, Jan 25 2013, 12:59 AM

ram wrote:

Ruby script to reproduce failure

Sends the same search query 100 times with a 10s delay between iterations. I ran
it a few times and was able to reproduce the failure every time, sometimes very
quickly, sometimes after around 70 iterations.

Attached: mwsearch.rb
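
The attached script itself is not reproduced in this thread. A loop of the kind described might look roughly like the sketch below, written in Python rather than Ruby. The `fetch` callable, which is assumed to return the number of hits for a query, and the `sleep` parameter are hypothetical and exist only so the loop can be exercised without a live search backend.

```python
import time
from typing import Callable, List


def repro(fetch: Callable[[str], int], query: str, iterations: int = 100,
          delay: float = 10.0,
          sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Send the same query repeatedly; return the iterations that saw zero results."""
    failures = []
    for i in range(1, iterations + 1):
        if fetch(query) == 0:
            # A query known to have hits came back empty: record the iteration.
            failures.append(i)
        if i < iterations:
            sleep(delay)  # pause between iterations, 10s by default
    return failures
```

Recording the iteration numbers rather than a bare count makes it easier to see whether failures come early or only after many requests.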

Nemo_bis added a comment. Via Conduit, Feb 4 2013, 1:22 PM

I disagree on this being an enhancement: it's an actual bug because it's highly misleading.

bzimport added a comment. Via Conduit, Feb 4 2013, 3:28 PM

ForoaW wrote:

Sorry, but this is as critical as bug 42423. A system that fails to report failed searches (and instead reports nothing found) is plain misleading and makes users lose confidence in the system, especially at the rate it fails, sometimes 100% for minutes at a time. At the least, it should say to try again.

bzimport added a comment. Via Conduit, Mar 25 2013, 5:38 PM

ram wrote:

Over the weekend my script was again able to reproduce the "zero results"
error, so the issue is still with us. Some log analysis indicates that failure
of the 'highlight' call due to socket timeouts may be the problem, so returning
a failure in this case is easy to do. The broader issue of why we are getting
socket timeouts is more difficult.
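
Returning a failure when the highlight step times out could look roughly like the following Python sketch. It is not the lucene-search-2 code; `search_with_snippets`, `find`, and `highlight` are hypothetical names, and the assumption is only that the highlight step is a separate network call that can raise a socket timeout after the hits were found.

```python
import socket
from typing import Callable, List


def search_with_snippets(query: str,
                         find: Callable[[str], List[str]],
                         highlight: Callable[[str, List[str]], List[str]]) -> dict:
    """Find hits, then fetch highlighted snippets; surface a highlight timeout as an error."""
    hits = find(query)
    try:
        snippets = highlight(query, hits)
    except socket.timeout as exc:
        # Do not fall back to an empty result set: say that the request failed.
        return {"ok": False, "error": f"highlight timed out: {exc}", "results": []}
    return {"ok": True, "error": None, "results": list(zip(hits, snippets))}
```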

bzimport added a comment. Via Conduit, Mar 26 2013, 2:30 AM

ram wrote:

https://gerrit.wikimedia.org/r/#/c/55841/
Instruments code to dump entire GlobalConfiguration singleton to a file to aid
debugging.

bzimport added a comment. Via Conduit, Mar 28 2013, 1:39 AM

ram wrote:

https://gerrit.wikimedia.org/r/#/c/56354/
Adds better error handling to return an error status instead of hiding internal errors and falsely reporting "zero results". Similar changes to the PHP side of things mentioned by TimS in comment 3 above are still being worked on.

bzimport added a comment. Via Conduit, Apr 3 2013, 8:38 PM

ram wrote:

https://gerrit.wikimedia.org/r/#/c/57350/
https://gerrit.wikimedia.org/r/#/c/57368/

I've pushed fixes to the PHP side in the above 2 commits. Apparently Chad has
also been working on this and his slightly different fixes are here:

https://gerrit.wikimedia.org/r/#/c/57337/
https://gerrit.wikimedia.org/r/#/c/57336/

demon added a comment. Via Conduit, Apr 4 2013, 8:15 PM

I amended mine based on our discussion on IRC/e-mail; it now combines both approaches.

gerritbot added a comment. Via Conduit, Apr 8 2013, 9:18 PM

https://gerrit.wikimedia.org/r/57350 (Gerrit Change Idb42d64987164ba099228b154729c9c86af7407f) | change ABANDONED [by Ram]

gerritbot added a comment. Via Conduit, Apr 8 2013, 9:19 PM

https://gerrit.wikimedia.org/r/57368 (Gerrit Change Ic07ce8f32be8358fbb2f5a60f3c8c324cb27694c) | change ABANDONED [by Ram]

Aklapper added a comment. Via Conduit, Apr 18 2013, 9:42 AM

Chad's patches in https://gerrit.wikimedia.org/r/#/c/57336/ and https://gerrit.wikimedia.org/r/#/c/57337/ got merged, but that broke ApiQuerySearch (see bug 47353).

gerritbot added a comment. Via Conduit, Apr 22 2013, 1:23 AM

https://gerrit.wikimedia.org/r/56354 (Gerrit Change Ibeef63f45a3276e870afbcadbd08c7bd2967b9e6) | change APPROVED and MERGED [by Tim Starling]

Aklapper added a comment. Via Conduit, Apr 22 2013, 3:27 PM

All three patches (that I'm aware of) got merged. Can this be closed as FIXED, or is more needed?

bzimport added a comment. Via Conduit, Apr 22 2013, 9:53 PM

ram wrote:

Let's wait for it to be deployed (in a few days, hopefully) before closing.

gerritbot added a comment. Via Conduit, Apr 29 2013, 1:02 AM

https://gerrit.wikimedia.org/r/55841 (Gerrit Change I178fba54a42173bce0b941f143bbc5ecf2bac15d) | change ABANDONED [by Tim Starling]

bzimport added a comment. Via Conduit, Apr 29 2013, 6:39 AM

ForoaW wrote:

Search failure has been seen a couple of times at Commons in the last few days.

Nemo_bis added a comment. Via Conduit, Apr 29 2013, 7:12 AM

(In reply to comment #20)

Search failure has been seen a couple of times at Commons in the last few days.

Yes, I saw it too yesterday. Sometimes it's nice to see errors. ;)

bzimport added a comment. Via Conduit, Apr 30 2013, 12:16 AM

ram wrote:

Closing this since we are now seeing proper errors instead of spurious "zero results".

Nemo_bis added a comment. Via Conduit, Sep 30 2013, 6:55 AM

(In reply to comment #4)

We should at least log errors to UDP.

Tim, should this be filed as separate bug report? The log was (re)enabled and then disabled as too spammy on April 24: a1c62a08.
Currently we have:

MWSearch_body.php
500: wfDebugLog( 'mwsearch', "Search timeout requesting $searchUrl" );
508: wfDebugLog( 'mwsearch', 'Search backend error: ' . $m[1] );

Maybe the second log could be removed/renamed so that we can at least have some way to count the first.
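
The idea of giving the two messages separate channels, so that timeouts can be counted even while the noisier channel is switched off, can be illustrated with the following Python sketch. It uses Python's standard `logging` module rather than MediaWiki's `wfDebugLog`, and the channel names `mwsearch-timeout` and `mwsearch-error` are hypothetical.

```python
import logging


class CountingHandler(logging.Handler):
    """Counts the records it receives instead of writing them anywhere."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def emit(self, record: logging.LogRecord) -> None:
        self.count += 1


# Hypothetical channel names: one per kind of failure.
timeout_log = logging.getLogger("mwsearch-timeout")
error_log = logging.getLogger("mwsearch-error")

timeout_counter = CountingHandler()
timeout_log.addHandler(timeout_counter)
timeout_log.setLevel(logging.INFO)
timeout_log.propagate = False

# The spammy channel stays silent without affecting the timeout channel.
error_log.addHandler(logging.NullHandler())
error_log.propagate = False


def log_timeout(search_url: str) -> None:
    timeout_log.info("Search timeout requesting %s", search_url)


def log_backend_error(message: str) -> None:
    error_log.info("Search backend error: %s", message)
```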

Nemo_bis added a comment. Via Conduit, Oct 2 2013, 6:45 AM

(In reply to comment #23)

(In reply to comment #4)

We should at least log errors to UDP.

Tim, should this be filed as separate bug report? The log was (re)enabled and then disabled as too spammy on April 24: a1c62a08.
Currently we have:

MWSearch_body.php
500: wfDebugLog( 'mwsearch', "Search timeout requesting $searchUrl" );
508: wfDebugLog( 'mwsearch', 'Search backend error: ' . $m[1] );

Maybe the second log could be removed/renamed so that we can at least have some way to count the first.

That was filed as bug 54865.
