Page MenuHomePhabricator

WMFTimeoutException in Special:Search
Closed, ResolvedPublic

Description

When I search insource:/\|Link\=/ on zhwiki.
Actual Results: [XHCVoQpAADkAAEG2miQAAABI] 2019-02-23 00:37:50: Fatal exception of type "WMFTimeoutException"
Expected Results: [[MediaWiki:Cirrussearch-regex-timed-out]] (The regex search timed out...) or other messages.
Note: Not always get this error. Sometimes it works.

Details

Related Gerrit Patches:

Event Timeline

Restricted Application added a project: Discovery-Search. · View Herald TranscriptFeb 23 2019, 12:46 AM
Restricted Application added subscribers: Cosine02, Aklapper. · View Herald Transcript
EBernhardson added a subscriber: EBernhardson.EditedFeb 23 2019, 1:44 AM

Will have to wait on looking into the timeout, we are rolling a search cluster upgrade in the next two weeks that will change how regex timeouts are handled.

In general though, a regex of this sort is going to be very inefficient. The documentation suggests including additional filters that match the same content to reduce the number of pages that we have to run the regex on. I would try insource:link insource:/\|Link=/ This isn't likely to be that much better, insource:link is pretty generic and matches about 20% of zhwiki content, but it's a better starting point than the whole thing. I hope that can help get you results without the timeouts.

A2093064 updated the task description. (Show Details)Feb 23 2019, 5:41 AM

@EBernhardson Thanks for your explanation. I know how to optimize the regex. I expect to get messages like [[MediaWiki:Cirrussearch-regex-timed-out]] instead of exception message. Description updated.

dcausse added a subscriber: dcausse.EditedFeb 25 2019, 9:27 AM

The regex timeout we allow for the request to elastic does not match the timeout allowed by wmf-config/set-time-limit.php (enabled only for PHP7 if I assume that excimer is loaded only for PHP7).
So I think multiple things come into play:

  • the recent update to elastic 5.6.14 (but the source_regex handling did not change yet, perhaps elastic have changed how they do it)
  • PHP7 seems to be used because WMFTimeoutException is thrown (PHP7 beta feature enabled?)
  • ExcimerTimer reacts differently than HHVM set_time_limit?
  • We are already out of allowed limits: our curl read timeout is set to 80s for regexes while we allow 60s for GET requests (both HHVM set_time_limit & ExcimerTimer)
  • assuming timeouts elastic side are accurate it means that 20s (shard timeout) is not aggressive enough to return a response to the client in 60s.

ExcimerTimer reacts differently than HHVM set_time_limit?

As I understand it, there is a notable difference between PHP 7 timeouts and HHVM timeouts (if I recall correctly, PHP 7 measures CPU-time only, whereas HHVM counts wall-clock time). @tstarling added use of Excimer timeouts to approximate the HHVM-like behaviour under PHP 7.

Joe added a subscriber: Joe.Feb 26 2019, 7:12 AM

ExcimerTimer reacts differently than HHVM set_time_limit?

As I understand it, there is a notable difference between PHP 7 timeouts and HHVM timeouts (if I recall correctly, PHP 7 measures CPU-time only, whereas HHVM counts wall-clock time). @tstarling added use of Excimer timeouts to approximate the HHVM-like behaviour under PHP 7.

To be more precise, say you call set_time_limit(N):

  • HHVM counts wall clock only. So if N seconds passed from when the thread started processing a request, a timeout is sent.
  • The Zend Engine only counts CPU seconds consumed by itself. This means that if you spend N-1 seconds doing a query to mysql, which is mediated by a C library, the Zend engine will only see 1 second of elapsed time
  • Excimer is able to count the wall clock time and to timeout according to it, with a caveat - it cannot interrupt the execution of a C function. So it will effectively timeout after N seconds unless we're blocked in a long-term C library call (like - say - curl). In those cases, php-fpm has a hard limit at N+something seconds (set in ini) that will kill the request anyways.
dcausse claimed this task.Feb 26 2019, 10:09 AM

Thanks for the information, it means that I can exclude PHP7 vs HHVM here as the intent is to make these timeouts very similar.

With HHVM the curl request will be timed out on set_time_limit which allows Cirrus to raise its 'Search backend error' but with ExcimerTimer we send WMFTimeout because the application code does not have a chance to treat the curl request timeout.
I will adjust our request timeouts accordingly to have more chances to hit the request timeouts while ExcimerTimer still gives us some time to print our error to the user instead of WMFTimeout.

As for the expected result in this bug (partial results and MediaWiki:Cirrussearch-regex-timed-out) something probably changed elastic side that prevents us to display partial results, I'll focus on this.

dcausse triaged this task as High priority.Feb 26 2019, 10:10 AM

Raising to high as it may leave long running threads elasticsearch side.

Change 492989 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/mediawiki-config@master] [cirrus] decrease regex timeouts by 25%

https://gerrit.wikimedia.org/r/492989

Change 492989 merged by jenkins-bot:
[operations/mediawiki-config@master] [cirrus] decrease regex timeouts by 25% and drop timeout hack

https://gerrit.wikimedia.org/r/492989

moving to done as it seems to be fixed by decreasing timeouts and removing our timeout hack for regexes.

debt closed this task as Resolved.Mar 8 2019, 5:54 PM
Liuxinyu970226 moved this task from Backlog to Closed on the Chinese-Sites board.Mar 10 2019, 3:19 AM
mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:07 PM