Page MenuHomePhabricator

Use builtin timeout support instead of max_inspect for insource queries
Closed, ResolvedPublic

Description

When the insource query is not sufficiently optimized SourceRegexQuery will simply stop collecting results when max_inspect is reached.
This is confusing for the user as the result pages will contain partial results without any warnings.
We should use the builtin timeout feature from elastic and display a warning to the user when we failed to scan the index entirely. The user could then try to optimize its regex.

Event Timeline

dcausse created this task.May 2 2016, 9:14 AM
Restricted Application added a project: Discovery. · View Herald TranscriptMay 2 2016, 9:14 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Deskana triaged this task as Low priority.May 5 2016, 10:18 PM
Deskana added a subscriber: Deskana.

This doesn't block the upgrade, so it can't be prioritised right now.

debt added subscribers: eranroz, debt.

Moving this into the sprint to start testing on it.

Restricted Application added a subscriber: TerraCodes. · View Herald TranscriptSep 12 2016, 3:38 PM

Change 320517 had a related patch set uploaded (by EBernhardson):
Accept 'OK' status results from search engine

https://gerrit.wikimedia.org/r/320517

Change 320520 had a related patch set uploaded (by EBernhardson):
Accept 'OK' status results from search engine

https://gerrit.wikimedia.org/r/320520

Change 320551 had a related patch set uploaded (by EBernhardson):
Accept 'OK' status results from search engine

https://gerrit.wikimedia.org/r/320551

While working out the code parts of this, I also started to wonder about the timeouts. In production we currently allow for a 40s timeout on regex searches. This is tampered by the fact that we limit the number of documents checked to 10k.

The breakdown for nov 1-8th looks like:

gt_35s  gt_30s  gt_25s  gt_20s  gt_15s  gt_10s  gt_5s   all
3       6       6       10      24      77      531     25616

The volume overall is pretty low. Will need to look at both the grafana dashboards, and these numbers after rollout to get an idea of the impact. I would expect there are a variety of searches that are hitting the 10k document limit long before the timeout. Depending on what it ends up looking like we might want to adjust down the 40s timeout, or maybe we leave it as is ... not sure yet.

Change 320554 had a related patch set uploaded (by EBernhardson):
Only return Status objects from CirrusSearch::searchText

https://gerrit.wikimedia.org/r/320554

Change 320520 merged by jenkins-bot:
Accept 'OK' status results from search engine

https://gerrit.wikimedia.org/r/320520

Change 320517 merged by jenkins-bot:
Accept 'OK' status results from search engine

https://gerrit.wikimedia.org/r/320517

Change 320700 had a related patch set uploaded (by EBernhardson):
Use timeouts to limit regex searches rather than max_inspect

https://gerrit.wikimedia.org/r/320700

It's unclear whether these patches are complete or not. @dcausse will take a look at these patches, and see. He may also try to implement Brad Jorsch's idea.

The patches should be complete, I've now updated it for anomie's feedback. The master patch needs to be merged first, then the Cirrus patch any time after.

Change 320551 merged by jenkins-bot:
Accept 'OK' status results from search engine

https://gerrit.wikimedia.org/r/320551

Change 320554 merged by jenkins-bot:
Only return Status objects from CirrusSearch::searchText

https://gerrit.wikimedia.org/r/320554

eranroz removed a subscriber: eranroz.Dec 1 2016, 6:21 PM

Change 320700 merged by jenkins-bot:
Use timeouts to limit regex searches rather than max_inspect

https://gerrit.wikimedia.org/r/320700

Deskana closed this task as Resolved.Feb 3 2017, 4:56 PM