Page MenuHomePhabricator

[bug] cirrussearch fatal in 1.28.0-wmf.23 -
Closed, ResolvedPublic

Description

link to reproduce:

https://commons.wikimedia.org/w/index.php?title=Special:Search&limit=500&offset=31501&profile=default&search=google+art+project++

logline:

{"id":"WBEoLwpAIDgAAFMENwUAAAAY","type":"BadMethodCallException","file":"/srv/mediawiki/php-1.28.0-wmf.23/includes/specials/SpecialSearch.php","line":577,"message":"Call to a member function searchContainedSyntax() on a non-object (boolean)","code":0,"url

Event Timeline

Having a hard time seeing how this is related to changes on wmf.23 as the relevant code hasn't changed recently.

Changes that touched SpecialSearch.php: history | blame

It appears that a boolean is returned at line 297 by the SearchEngine implementation class. I didn't dig any further as I'm told that @Smalyshev is looking into it.

Looks like it happens also on mediawiki.org but not on en.wiki or ru.wiki. Also does not happen with smaller offsets. I suspect huge offset triggers some bug when page with this title does not exist.

I think we only support offsets up to 10000, so not sure how offset of 31501 came into being, but anyway shouldn't break on that.

In addition to the above, the exception only happens when offset is over 10000.

The bug is in Searcher::searchOne:

		if ( $result->isOK() ) {
			// Convert array of responses to single value
			$value = $result->getValue();
			$result->setResult( true, reset( $value ) );
		}

If $value is EmptyResultSet (which happens when limit is negative) then it tries to apply reset to EmptyResultSet, which in HHVM produces notice and false. Interestingly enough, PHP produces null in that case. So testing it on PHP would not produce the failure.

Smalyshev added a project: CirrusSearch.
Smalyshev added subscribers: EBernhardson, dcausse.

This is not a new bug, but looks like it now happens on all wikis.
It was only visible on it wiki projects where interwiki is enabled: https://it.wikivoyage.org/w/index.php?title=Speciale:Ricerca&limit=20&offset=30000&ns0=1&ns100=1&ns104=1&search=test

Change 318263 had a related patch set uploaded (by DCausse):
Fix errors when searching out of the allowed limits

https://gerrit.wikimedia.org/r/318263

mmodell lowered the priority of this task from High to Medium.Oct 27 2016, 3:10 PM

Seems somewhat obscure, should it still be blocking the train?

Seems somewhat obscure, should it still be blocking the train?

@Smalyshev notes above that this bug can only occur if the offsets is larger than the maximum number of search results that can be returned. Users are therefore incredibly unlikely to ever discover this bug. More, if they do discover it, by definition we'd not have given them any results anyway since they're over the practical maximum offset.

Given the above, I don't think fixing this blocks anything.

Agree, I put priority as high because I suspected it may not be related to offset but now I'm pretty sure it is, and it's pretty unlikely real user will hit it.

Change 318505 had a related patch set uploaded (by Hashar):
Fix errors when searching out of the allowed limits

https://gerrit.wikimedia.org/r/318505

Change 318505 merged by jenkins-bot:
Fix errors when searching out of the allowed limits

https://gerrit.wikimedia.org/r/318505

Mentioned in SAL (#wikimedia-operations) [2016-10-28T09:20:31Z] <hashar> Pulling CirrusSearch patch https://gerrit.wikimedia.org/r/#/c/318505/ on mw1099 for T149254 (fix log spam/fatal/warnings)

Mentioned in SAL (#wikimedia-operations) [2016-10-28T09:24:08Z] <hashar@tin> Synchronized /srv/mediawiki-staging/php-1.28.0-wmf.23/extensions/CirrusSearch: https://gerrit.wikimedia.org/r/#/c/318505/ for T149254 (fix log spam/fatal/warnings) (duration: 00m 56s)

There is a warninig error triggered as well (dupe T149403) that ends up triggering an alarms because the level of warnings is above some threshold:

https://grafana.wikimedia.org/dashboard/db/production-logging?panelId=19&fullscreen

mw-errors-fatals-rates.png (342×702 px, 27 KB)

Hence we agreed to deploy the pending patch for CirrusSearch https://gerrit.wikimedia.org/r/#/c/318263 . Confirmed on beta that it got rid of both the fatal and the warning. Pushed to production using https://gerrit.wikimedia.org/r/#/c/318505/ which clears out the issue on mw1099 and is now deployed on the whole fleet.

[9:29:30 UTC] <icinga-wm> RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]

Deskana assigned this task to dcausse.

Great! Thanks. :-)

Change 318263 merged by jenkins-bot:
Fix errors when searching out of the allowed limits

https://gerrit.wikimedia.org/r/318263