
No results for specific string in Global Phab Search, while Advanced Search lists expected results (due to ElasticSearch for Global Search vs Ferret for Maniphest Search)?
Closed, Resolved, Public

Description

When searching for a PHP error message that includes a namespace (e.g. Elastica\ResultSet), Phabricator should be able to find pages that mention that namespace.

Before creating a ticket (after seeing an error in logstash), I usually check whether the error has already been reported. To do this I copy the error message, which sometimes includes PHP namespaces with backslashes.

Searching for Elastica\ResultSet with the phabricator tag should find this ticket:

  • actual behavior: nothing is found
  • expected behavior: this ticket should be found
  • workaround: search for Elastica ResultSet

Event Timeline

Could you share a query URL to reproduce (Phabricator Maniphest Search vs Phabricator Global Search, for example) and a task ID that you expected to show up but does not?

Searching for "Elastica\ResultSet" in https://phabricator.wikimedia.org/maniphest/query/jcVWVC5JHVHv currently lists three results, but not sure if that's what you expected.

See https://www.mediawiki.org/wiki/Phabricator/Help#Searching_for_items for search options.

This is using the simple search form (triggered from the search box top-right): https://phabricator.wikimedia.org/search/query/ZRnM6OqnN6iE/#R (this current task should be present in the search results)
Thanks for pointing out the advanced search form; I was not aware of it.

Thanks. Indeed, searching for "Elastica\ResultSet" in the general search at https://phabricator.wikimedia.org/search/query/7YYyv7Zx5ju0/#R lists no results, while the Maniphest search at https://phabricator.wikimedia.org/maniphest/query/jcVWVC5JHVHv/ does list results.
I'd say that should not happen.

Aklapper renamed this task from Search for PHP namespaces should find expected results to No results for specific string in Global Phab Search, while Advanced Search lists expected results. May 22 2019, 3:52 PM

I believe this install is currently configured to use ElasticSearch for global search. Search in other interfaces (including Maniphest advanced search) is powered by Phabricator's builtin engine, "Ferret".

A possible remedy for this issue is to disable ElasticSearch (so global search also uses the Ferret engine), see if any issues arise, and then decommission it if nothing crops up. My expectation is that Ferret is now generally better at finding search results than ElasticSearch (in the context of Phabricator, that is), and the Ferret index is always built and always used for some search operations anyway, so you can't really get away from it.

Although we could likely improve the ElasticSearch integration, Ferret currently has a lot of features which the ElasticSearch integration does not support (even though ElasticSearch itself may support these features), and ElasticSearch has no advantages I'm aware of over Ferret except theoretical scalability.

One example of feature disparity is stemming: at the time of writing, searching for "expecting result" finds this task in Maniphest search (by matching query term "expecting" against title term "expected", and query term "result" against title term "results") but not in global search via ElasticSearch. (Now that I've written this comment, both engines will find this task, so this experiment won't be repeatable. But you can search for other English-language term variations and observe that Maniphest can match plurals and conjugations against text while global search cannot.)
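To make the stemming point concrete, here is a minimal sketch of why "expecting result" can match a title containing "expected results". The suffix list and the `toy_stem` and `matches` helpers are invented for illustration; this is a toy suffix stripper, not the stemmer Ferret actually uses.

```python
# Toy illustration only: a crude suffix stripper, NOT Ferret's real stemmer.
SUFFIXES = ("ing", "ed", "s")  # hypothetical, deliberately tiny suffix list


def toy_stem(word):
    """Lowercase a word and strip one common English suffix, if present."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, title):
    """True if every stemmed query term equals some stemmed title term."""
    title_stems = {toy_stem(w) for w in title.split()}
    return all(toy_stem(w) in title_stems for w in query.split())


title = "Advanced Search lists expected results"
print(matches("expecting result", title))
print(matches("Elastica", title))
```

An engine that compares raw terms instead of stems would reject the first query, which is the behavior described for global search above.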

Another is substring search: searching for "~tica\Res" in Maniphest finds this task (and T99755, which contains the string Elastica\Response), but neither is found by global search, since the substring operator isn't currently supported in the ElasticSearch integration.
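One common way to support substring queries like this is an n-gram index: split text into overlapping three-character pieces, use them to narrow the candidates, then confirm with a real substring check. The sketch below illustrates that general technique only. It assumes nothing about Ferret's internals, and the document IDs are placeholders.

```python
def trigrams(text):
    """Return the set of overlapping 3-character pieces of the text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def substring_search(needle, documents):
    """Return IDs of documents containing needle, using trigrams as a prefilter."""
    needle_grams = trigrams(needle)
    hits = []
    for doc_id, body in documents.items():
        # Cheap prefilter: every trigram of the needle must occur in the body.
        if not needle_grams <= trigrams(body):
            continue
        # Exact check, since sharing trigrams does not guarantee a match.
        if needle.lower() in body.lower():
            hits.append(doc_id)
    return hits


# Placeholder documents; the IDs are hypothetical.
docs = {
    "doc-a": "Searching for Elastica\\ResultSet finds nothing",
    "doc-b": "Error thrown from Elastica\\Response",
    "doc-c": "Unrelated login failure",
}
print(substring_search("tica\\Res", docs))
```

In a real index the trigrams would be stored ahead of time rather than recomputed per query; the prefilter is what lets the expensive exact check run on only a few candidates.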

ElasticSearch itself can likely support these features (though substring search seems a bit tricky/involved), but Ferret appears to basically work well and doesn't require installs to run any external services, so it's hard to motivate making improvements to the ElasticSearch integration. (Historically, the ElasticSearch integration has also been difficult for users to set up, configure, and maintain -- while Ferret basically just works out of the box.) Ferret also has the major advantage that it's just a bunch of MySQL primitives, so it can express complex queries with non-fulltext constraints completely in MySQL. We don't need to intersect "fulltext" and "non-fulltext" results at the application level or figure out how to represent/export complicated relationships into an external search engine.
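As a rough illustration of that last point, the sketch below puts a term match and a non-fulltext constraint (task status) in a single SQL query, so no application-level intersection is needed. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the `task` and `task_term` tables are a hypothetical schema, not Phabricator's actual one.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical task/term schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE task (id INTEGER PRIMARY KEY, title TEXT, status TEXT);
        CREATE TABLE task_term (task_id INTEGER, term TEXT);
        """
    )
    tasks = [
        (1, "Search returns no results", "open"),
        (2, "Search index rebuild", "resolved"),
        (3, "Login page broken", "open"),
    ]
    db.executemany("INSERT INTO task VALUES (?, ?, ?)", tasks)
    for task_id, title, _status in tasks:
        # Index each lowercased title word as a searchable term.
        db.executemany(
            "INSERT INTO task_term VALUES (?, ?)",
            [(task_id, word.lower()) for word in title.split()],
        )
    return db


def open_tasks_matching(db, term):
    """Find open tasks containing the term, using one query for both constraints."""
    rows = db.execute(
        """
        SELECT DISTINCT task.id
        FROM task
        JOIN task_term ON task_term.task_id = task.id
        WHERE task_term.term = ? AND task.status = 'open'
        ORDER BY task.id
        """,
        (term.lower(),),
    )
    return [row[0] for row in rows]


db = build_demo_db()
print(open_tasks_matching(db, "search"))
```

With an external search engine, the term match and the status filter would typically come from two systems and have to be combined afterwards, unless the status data is also exported into the search index.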

Prior to Ferret, Phabricator's builtin search was powered by MySQL FULLTEXT indexes, which were arguably better than nothing but which I had a difficult time making work well for our use case. In this bygone era, there were a lot of good reasons to prefer ElasticSearch over FULLTEXT, but Ferret appears to be good enough on all major dimensions that there's no longer a clear reason to use ElasticSearch. I currently imagine deprecating ElasticSearch support and eventually removing it, although there's no hurry here.

See also https://secure.phabricator.com/T12974 for some context and links.

Aklapper renamed this task from No results for specific string in Global Phab Search, while Advanced Search lists expected results to No results for specific string in Global Phab Search, while Advanced Search lists expected results (due to ElasticSearch for Global Search vs Ferret for Maniphest Search)?. Jul 20 2019, 12:22 PM

I mostly agree with what @epriestley wrote above in T224082#5333571, and I made T230787 to explore switching to Ferret.

Although we could likely improve the ElasticSearch integration, Ferret currently has a lot of features which the ElasticSearch integration does not support (even though ElasticSearch itself may support these features), and ElasticSearch has no advantages I'm aware of over Ferret except theoretical scalability.

I think I found one other advantage of ElasticSearch, at least in Wikimedia's fork of Phabricator: we support highlighting the text that matched within the body of the document. Ferret only seems to display the titles of matching documents, and although it does highlight matching words, it doesn't show the matching text at all when the match appears in a task description, commit message, comment, etc. That's a fairly significant difference, although I suppose it might be possible to extend Ferret to do the same.

Ah, yeah -- I suspect the patch with the forked highlighting/display behavior should carry over to Ferret without significant changes. This is also still behavior I plan to bring upstream in some form, I just wasn't satisfied with my initial stab at it back in the day.

mmodell claimed this task.

This specific issue is resolved by escaping the backslash in the ElasticSearch storage engine. I intend to revisit this, as it looks like Ferret is the right choice going forward. Thanks to @epriestley for the valuable feedback on this! It really is appreciated.
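The actual patch isn't shown in this thread, so the following is only a sketch of the general idea. In Elasticsearch's query_string syntax the backslash is the escape character, so a literal backslash in user input has to be doubled before it goes into the query. The `escape_backslashes` helper is hypothetical and is not the code used in the fix.

```python
def escape_backslashes(query):
    """Double each backslash so the query parser treats it as a literal character."""
    # Hypothetical helper: illustrates the idea, not the actual fix.
    return query.replace("\\", "\\\\")


raw = "Elastica\\ResultSet"  # the text Elastica\ResultSet, as a Python literal
print(escape_backslashes(raw))
```

Other reserved characters in that syntax (such as quotes, parentheses, and colons) would need similar handling if they should also be searched literally.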