Page MenuHomePhabricator

Maniphest advanced search broken when keyword field is provided.
Closed, ResolvedPublic

Description

https://phabricator.wikimedia.org/maniphest/query/uQ35179D8vC0/#R ("workboard", open, phabricator OR phabricator.org).

Request aborted by debug time limit after 30 seconds.

STACK TRACE
PhabricatorSearchEngineElastic.php:383 PhabricatorStartup->onDebugTick()
PhabricatorSearchEngineElastic.php:216 PhabricatorSearchEngineElastic->executeRequest()
ManiphestTaskQuery.php:511 PhabricatorSearchEngineElastic->executeSearch()
ManiphestTaskQuery.php:216 ManiphestTaskQuery->buildFullTextWhereClause()
PhabricatorPolicyAwareQuery.php:227 ManiphestTaskQuery->loadPage()
PhabricatorCursorPagedPolicyAwareQuery.php:166 PhabricatorPolicyAwareQuery->execute()
PhabricatorApplicationSearchEngine.php:622 PhabricatorCursorPagedPolicyAwareQuery->executeWithCursorPager()
PhabricatorApplicationSearchController.php:218 PhabricatorApplicationSearchEngine->executeQuery()
PhabricatorApplicationSearchController.php:80 PhabricatorApplicationSearchController->processSearchRequest()
AphrontController.php:33 PhabricatorApplicationSearchController->processRequest()
AphrontController.php:69 AphrontController->handleRequest()
ManiphestTaskListController.php:24 AphrontController->delegateToController()
AphrontController.php:33 ManiphestTaskListController->processRequest()
AphrontApplicationConfiguration.php:196 AphrontController->handleRequest()
AphrontApplicationConfiguration.php:121 AphrontApplicationConfiguration->processRequest()
index.php:19 AphrontApplicationConfiguration->runHTTPRequest()

Try with any keyword in the field, you get the same results as no key (if it manages to complete the request before timeout.)

Query debug trace shows that the keyword isn't passed to elasticsearch at all:

Screenshot_from_2015-02-11_10:20:37.png (387×1 px, 33 KB)

Event Timeline

Nemo_bis raised the priority of this task from to Needs Triage.
Nemo_bis updated the task description. (Show Details)
Nemo_bis added a project: Phabricator.
Nemo_bis subscribed.

It loads for me (tried twice) but took ~22 seconds.

Aklapper renamed this task from Fatal upon maniphest search in a component to Fatal error (30 seconds timeout) upon certain maniphest search in a component.Jan 28 2015, 7:38 PM
Aklapper set Security to None.

Confirming the problem (not logged in).

Aklapper renamed this task from Fatal error (30 seconds timeout) upon certain maniphest search in a component to Fatal error (30 seconds timeout) upon certain maniphest search in a component when not logged in.Jan 29 2015, 5:39 PM

I get the error while being logged in (Chrome).

Best person to ask is @mmodell as he has had luck breaking down their queries before.

I get this a lot, logged-in. The top of the stack is sometimes slightly different:

STACK TRACE
utils.php:235 PhabricatorStartup->onDebugTick()
PhabricatorSearchEngineElastic.php:232 ipull()
ManiphestTaskQuery.php:511 PhabricatorSearchEngineElastic->executeSearch()
ManiphestTaskQuery.php:216 ManiphestTaskQuery->buildFullTextWhereClause()
PhabricatorPolicyAwareQuery.php:227 ManiphestTaskQuery->loadPage()
PhabricatorCursorPagedPolicyAwareQuery.php:166 PhabricatorPolicyAwareQuery->execute()
PhabricatorApplicationSearchEngine.php:622 PhabricatorCursorPagedPolicyAwareQuery->executeWithCursorPager()
PhabricatorApplicationSearchController.php:218 PhabricatorApplicationSearchEngine->executeQuery()
PhabricatorApplicationSearchController.php:80 PhabricatorApplicationSearchController->processSearchRequest()
AphrontController.php:33 PhabricatorApplicationSearchController->processRequest()
AphrontController.php:69 AphrontController->handleRequest()
ManiphestTaskListController.php:24 AphrontController->delegateToController()
AphrontController.php:33 ManiphestTaskListController->processRequest()
AphrontApplicationConfiguration.php:196 AphrontController->handleRequest()
AphrontApplicationConfiguration.php:121 AphrontApplicationConfiguration->processRequest()
index.php:19 AphrontApplicationConfiguration->runHTTPRequest()

I also get nginx's HTTP 504 error page and the standard WMF error page ("Our servers are currently experiencing a technical problem.") sometimes.

Dies on this line. Apparently Elasticsearch returns a huge response and Phabricator runs over the PHP time limit while trying to JSON-decode it.

Tgr raised the priority of this task from Low to High.Jan 30 2015, 8:02 AM

This happens with an annoyingly high frequency.

Also, details of the search don't seem to matter much. It happens both on searches which are very generic and ones which should have few results (because they are limited to a small project). Other times searches work but are very slow (a search with 5 results just took me ~20 seconds).

@mmodell: Could you take a look at this?
Is this just about increasing some timeout setting somewhere, or is this more complicated?

I think this is currently the main blocker for T75854: adjusting priority accordingly. Nik, I think this might be the self-contained issue you were looking for, worth spending your time on.

Nemo_bis renamed this task from Fatal error (30 seconds timeout) upon certain maniphest search in a component when not logged in to Fatal error (30 seconds timeout) upon part of maniphest searches.Feb 6 2015, 6:48 AM
Nemo_bis raised the priority of this task from High to Unbreak Now!.
Tgr renamed this task from Fatal error (30 seconds timeout) upon part of maniphest searches to Fatal error (30 seconds timeout) for some maniphest search queries.Feb 6 2015, 6:50 AM
Aklapper lowered the priority of this task from Unbreak Now! to High.Feb 6 2015, 1:00 PM

This does not seem to affect normal search, so the workaround is to just use that.

Here's my analysis of what's happening with this particular search query:

  1. Because of some sillyness in phabricator's elasticsearch indexing logic, almost every task in elasticsearch is associated with the term workboard. That is, when you move a task to a new workboard column, it gets a transaction entry for the column change and this is apparently indexed in such a way that it matches a textual search for "workboard".
  2. This results in a huge elasticsearch result set, because apparently phabricator doesn't do pagination on the elasticsearch side, it pares the results down after parsing the entire huge result from elasticsearch.
  3. To add stupid on top of idiotic, phabrcator seems to be hitting mysql with a join query that consists of a huge list of PHIDs. I think this is needed so that it can sort the results from elasticsearch according to your advanced search criteria (it gets the status and priority from mysql and merges this with the elasticsearch results, essentially.)
  4. Only after all of the above can the results be trimmed down to size and output in nice little page-sized chunks. Unfortunately this isn't always possible as it takes ~20 seconds for each of (2.) and (3.) above, this exceeds the 30 second runtime limit and the query fails.

Conclusion: maniphest advanced search doesn't scale when the elasticsearch result is large. There isn't usable equivalent to the SQL 'limit' clause produce strictly correct results for all conceivable queries.

If similar thing is happening on a query that is not due to a large elasticsearch result, then that's a different bug. This one seems rather difficult to solve.

Possible Workarounds:

  1. Don't search for workboard ;)
  2. Patch phabricator so that it doesn't index anything for transactions like "x moved this task to y on the z workboard".

Neither of these help for other slow queries which are likely to exist. Thankfully, I don't think
there will be many examples which are as severe as this one.

Thanks for the input man. I think this may be another indicator that without a concerted effort on our behalf to support ES we are better off with the stock mysql search.

Searching for Phabricator OR #phabricator.org, it returns 7 results in 500ms. If I add "workboard" to the freetext field, it takes 20s to return the same 7 hits. Does that trigger a different search mode? Or does Phabricator apply project rules in the SQL step?

Also, is there a theoretical problem with doing pagination and sorting in ES, is that just something that's not implmeneted in the ES adapter in Phabricator? It sounds exactly like the kind of task ES is good at.

ok on further investigation, I think that keyword search is simply broken. The search term isn't being passed on to elasticsearch so es is returning everything it's got. Try it with other more unique keywords and you'll find they are all rather slow... and produce the same search results.

Still doesn't explain why adding a keyword search actually makes the query slower, though. Does it trigger something expensive on the SQL side? Is the ES query different in some subtle way in the presence of a keyword?

Thanks a lot Mukunda for analyzing what's happening!

Here's my analysis of what's happening with this particular search query:

Are 1. and 3. tasks to upstream?

Aklapper renamed this task from Fatal error (30 seconds timeout) for some maniphest search queries to Fatal error (30 seconds timeout) for some maniphest search queries (with ElasticSearch backend).Feb 11 2015, 12:24 PM
chasemp claimed this task.

takes a minute but finishes successfully now for me logged in and out

takes a minute but finishes successfully now for me logged in and out

The search may complete some of the time but it doesn't return proper results.

In T87739#1030509, @Tgr wrote:

Still doesn't explain why adding a keyword search actually makes the query slower, though. Does it trigger something expensive on the SQL side? Is the ES query different in some subtle way in the presence of a keyword?

When the keyword isn't present it skips the ES query entirely.

Furthermore, it looks like it's not even passing the keyword to elasticsearch, so every single search (from advanced search, with keyword field non-empty) is returning a huge result for no good reason.

Do I need to file a new task for this because the scope is much larger from what was described in this task?

takes a minute but finishes successfully now for me logged in and out

The search may complete some of the time but it doesn't return proper results.

Really?

I get 2 high, 3 normal, 10 low, and 9 needs volunteer. Not doubting just wondering what the proper results look like so I know.

https://phabricator.wikimedia.org/maniphest/query/uQ35179D8vC0/#R

(also just fyi we are not using ES now)

mmodell renamed this task from Fatal error (30 seconds timeout) for some maniphest search queries (with ElasticSearch backend) to Maniphest advanced search broken when keyword field is provided..Feb 11 2015, 4:25 PM
mmodell updated the task description. (Show Details)

Elasticsearch backend is still broken completely, we need to at least upstream the bug.

Not a maniphest search, but searching "results in Wikimedia Phabricator" (without quotes) to test T75854#1035759 and T76273#1031377 (priority for tasks), https://phabricator.wikimedia.org/search/query/8UQh.Jec7xMr/#R , I got:

Request aborted by debug time limit after 30 seconds.

STACK TRACE
AphrontBaseMySQLDatabaseConnection.php:168 PhabricatorStartup->onDebugTick()
queryfx.php:6 AphrontBaseMySQLDatabaseConnection->executeRawQuery()
-:- queryfx()
queryfx.php:16 call_user_func_array()
PhabricatorSearchEngineMySQL.php:298 queryfx_all()
PhabricatorSearchDocumentQuery.php:65 PhabricatorSearchEngineMySQL->executeSearch()
PhabricatorSearchDocumentQuery.php:14 PhabricatorSearchDocumentQuery->loadDocumentPHIDsWithoutPolicyChecks()
PhabricatorPolicyAwareQuery.php:227 PhabricatorSearchDocumentQuery->loadPage()
PhabricatorOffsetPagedQuery.php:46 PhabricatorPolicyAwareQuery->execute()
PhabricatorApplicationSearchEngine.php:620 PhabricatorOffsetPagedQuery->executeWithOffsetPager()
PhabricatorApplicationSearchController.php:218 PhabricatorApplicationSearchEngine->executeQuery()
PhabricatorApplicationSearchController.php:80 PhabricatorApplicationSearchController->processSearchRequest()
AphrontController.php:33 PhabricatorApplicationSearchController->processRequest()
AphrontController.php:69 AphrontController->handleRequest()
PhabricatorSearchController.php:85 AphrontController->delegateToController()
AphrontController.php:33 PhabricatorSearchController->processRequest()
AphrontApplicationConfiguration.php:196 AphrontController->handleRequest()
AphrontApplicationConfiguration.php:121 AphrontApplicationConfiguration->processRequest()
index.php:19 AphrontApplicationConfiguration->runHTTPRequest()

Not a maniphest search, but searching "results in Wikimedia Phabricator" (without quotes) to test T75854#1035759 and T76273#1031377 (priority for tasks), https://phabricator.wikimedia.org/search/query/8UQh.Jec7xMr/#R , I got:

Request aborted by debug time limit after 30 seconds.

STACK TRACE
AphrontBaseMySQLDatabaseConnection.php:168 PhabricatorStartup->onDebugTick()
queryfx.php:6 AphrontBaseMySQLDatabaseConnection->executeRawQuery()
-:- queryfx()
queryfx.php:16 call_user_func_array()
PhabricatorSearchEngineMySQL.php:298 queryfx_all()
PhabricatorSearchDocumentQuery.php:65 PhabricatorSearchEngineMySQL->executeSearch()
PhabricatorSearchDocumentQuery.php:14 PhabricatorSearchDocumentQuery->loadDocumentPHIDsWithoutPolicyChecks()
PhabricatorPolicyAwareQuery.php:227 PhabricatorSearchDocumentQuery->loadPage()
PhabricatorOffsetPagedQuery.php:46 PhabricatorPolicyAwareQuery->execute()
PhabricatorApplicationSearchEngine.php:620 PhabricatorOffsetPagedQuery->executeWithOffsetPager()
PhabricatorApplicationSearchController.php:218 PhabricatorApplicationSearchEngine->executeQuery()
PhabricatorApplicationSearchController.php:80 PhabricatorApplicationSearchController->processSearchRequest()
AphrontController.php:33 PhabricatorApplicationSearchController->processRequest()
AphrontController.php:69 AphrontController->handleRequest()
PhabricatorSearchController.php:85 AphrontController->delegateToController()
AphrontController.php:33 PhabricatorSearchController->processRequest()
AphrontApplicationConfiguration.php:196 AphrontController->handleRequest()
AphrontApplicationConfiguration.php:121 AphrontApplicationConfiguration->processRequest()
index.php:19 AphrontApplicationConfiguration->runHTTPRequest()

Honest to god I redid this search a dozen times this morning and it never timed out on me. There are a few things mysql side we have yet to tweak and I'm wondering if that wouldn't help an intermittent problem.

is this still happening for others?

is this still happening for others?

Nemo's query also times out here (=Europe), 3 out of 3 times.

@Nemo_bis, so I tweaked some timeout parameters. This means the query is still not stellar on responsiveness, but there shouldn't be any timeout window dumping a stack trace. Can you try again?

Elasticsearch backend is still broken completely, we need to at least upstream the bug.

When you said this, the backend was already MySQL. Apart from this, I would like to help upstreaming this bug but, to be honest, I'm pretty confused about what should be reported. Maybe you could take this one upstream?

Elasticsearch backend is still broken completely, we need to at least upstream the bug.

When you said this, the backend was already MySQL. Apart from this, I would like to help upstreaming this bug but, to be honest, I'm pretty confused about what should be reported. Maybe you could take this one upstream?

Was anything reported upstream, or was this a local fix? If the latter, please remove the Phabricator (Upstream) project.