When the insource query is not sufficiently optimized SourceRegexQuery will simply stop collecting results when max_inspect is reached.
This is confusing for the user as the result pages will contain partial results without any warnings.
We should use the builtin timeout feature from elastic and display a warning to the user when we failed to scan the index entirely. The user could then try to optimize its regex.
Description
Details
Status | Subtype | Assigned | Task | ||
---|---|---|---|---|---|
Resolved | debt | T131828 results of regEx cirrus search not stable | |||
Resolved | dcausse | T106685 Regexes in search queries can sometimes return fewer search results than they should | |||
Resolved | EBernhardson | T134157 Use builtin timeout support instead of max_inspect for insource queries | |||
Resolved | EBernhardson | T149142 Create messages for users when their search using advanced syntax doesn't return stuff | |||
Resolved | • Deskana | T103289 CirrusSearch: More search results when narrowing down search term |
Event Timeline
Change 320517 had a related patch set uploaded (by EBernhardson):
Accept 'OK' status results from search engine
Change 320520 had a related patch set uploaded (by EBernhardson):
Accept 'OK' status results from search engine
Change 320551 had a related patch set uploaded (by EBernhardson):
Accept 'OK' status results from search engine
While working out the code parts of this, I also started to wonder about the timeouts. In production we currently allow for a 40s timeout on regex searches. This is tampered by the fact that we limit the number of documents checked to 10k.
The breakdown for nov 1-8th looks like:
gt_35s gt_30s gt_25s gt_20s gt_15s gt_10s gt_5s all 3 6 6 10 24 77 531 25616
The volume overall is pretty low. Will need to look at both the grafana dashboards, and these numbers after rollout to get an idea of the impact. I would expect there are a variety of searches that are hitting the 10k document limit long before the timeout. Depending on what it ends up looking like we might want to adjust down the 40s timeout, or maybe we leave it as is ... not sure yet.
Change 320554 had a related patch set uploaded (by EBernhardson):
Only return Status objects from CirrusSearch::searchText
Change 320700 had a related patch set uploaded (by EBernhardson):
Use timeouts to limit regex searches rather than max_inspect
It's unclear whether these patches are complete or not. @dcausse will take a look at these patches, and see. He may also try to implement Brad Jorsch's idea.
The patches should be complete, I've now updated it for anomie's feedback. The master patch needs to be merged first, then the Cirrus patch any time after.
Change 320554 merged by jenkins-bot:
Only return Status objects from CirrusSearch::searchText
Change 320700 merged by jenkins-bot:
Use timeouts to limit regex searches rather than max_inspect