
CirrusSearch: Do something about queries that run scripts across a bajillion documents
Closed, ResolvedPublic

Description

Do something about queries that run scripts across a bajillion documents. They cause load spikes and take forever. Unfortunately, they include prefix searches for very few letters. One idea is to move all the script scoring into the rescore.


Version: unspecified
Severity: normal
Whiteboard: Elasticsearch_1.1
See Also:
https://github.com/elasticsearch/elasticsearch/issues/4748

Details

Reference
bz60151

Event Timeline

bzimport raised the priority of this task to High. (Nov 22 2014, 3:00 AM)
bzimport added a project: CirrusSearch.
bzimport set Reference to bz60151.

I don't think we can push all script scoring into the rescore until we get multiple rescores from Elasticsearch. BoostingQuery doesn't work for combining the phrase match because it only multiplies the scores of documents that match the negative query rather than adding to them. Sad.
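To make the multiply-versus-add point concrete, here is a toy sketch in plain Python. It is not Elasticsearch code: the function names and all the numbers are made up for illustration, and it only models the arithmetic of the two ways of combining scores.

```python
def boosting_style_score(base_score, matches_negative, negative_boost):
    # Boosting-style combination: a document that matches the "negative"
    # query has its score multiplied by negative_boost; others are unchanged.
    if matches_negative:
        return base_score * negative_boost
    return base_score


def additive_rescore(base_score, rescore_score,
                     query_weight=1.0, rescore_query_weight=1.0):
    # Rescore-style combination with score_mode "total": a weighted sum
    # of the original score and the rescore query's score.
    return query_weight * base_score + rescore_query_weight * rescore_score


# Made-up numbers: two documents that both contain the phrase.
strong_doc, weak_doc = 2.0, 0.5
phrase_score = 3.0

multiplied = [boosting_style_score(s, True, 1.5) for s in (strong_doc, weak_doc)]
added = [additive_rescore(s, phrase_score) for s in (strong_doc, weak_doc)]
print(multiplied)  # [3.0, 0.75]
print(added)       # [5.0, 3.5]
```

With multiplication, the weakly scored document barely moves even though it contains the phrase; with addition, both documents get the same fixed bump, which is what we want for the phrase match.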

*** Bug 57113 has been marked as a duplicate of this bug. ***

https://gerrit.wikimedia.org/r/#/c/112695/ is going to help a ton by speeding up those queries that run scripts across a ton of documents. We've also made a bunch of changes so that the rescore runs across a ton of documents less frequently. This should help. I still want Elasticsearch 1.0's multiple rescores, but this might just make them less important.

It looks like this wasn't merged into 1.0, as the upstream bug is marked, but only into 1.1+. So we wait....

Going to implement this now.

Change 124994 had a related patch set uploaded by Manybubbles:
WIP: Use multiple rescores to apply script scoring

https://gerrit.wikimedia.org/r/124994

Change 124994 merged by jenkins-bot:
Use multiple rescores to apply script scoring

https://gerrit.wikimedia.org/r/124994
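For readers unfamiliar with multiple rescores, below is a rough sketch of what such a request body could look like, built as a Python dict. It is not taken from the merged change: the helper name `build_search_body`, the field name `text`, the window sizes, and the example script are all assumptions for illustration. The overall shape (a list under `rescore`, each entry with a `window_size` and a `rescore_query`) follows the Elasticsearch 1.1+ rescore API.

```python
import json


def build_search_body(query_text, script, phrase_window=512, script_window=8192):
    # Hypothetical helper: the field name "text" and the default window
    # sizes are placeholders, not values used by CirrusSearch.
    return {
        # Cheap main query that runs across all matching documents.
        "query": {"match": {"text": query_text}},
        "rescore": [
            {
                # First rescore: add a phrase-match score to the top hits.
                "window_size": phrase_window,
                "query": {
                    "score_mode": "total",
                    "rescore_query": {"match_phrase": {"text": query_text}},
                },
            },
            {
                # Second rescore: the script runs only inside this window
                # instead of across every matching document.
                "window_size": script_window,
                "query": {
                    "score_mode": "multiply",
                    "rescore_query": {
                        "function_score": {"script_score": {"script": script}}
                    },
                },
            },
        ],
    }


# The script text is a made-up example; the field name is an assumption.
body = build_search_body("main page", "log10(doc['incoming_links'].value + 2)")
print(json.dumps(body, indent=2))
```

The point of the design is that the expensive script is bounded by `window_size` rather than by the number of documents the main query matches, which is what very short prefix searches blow up.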