Phabricator search degraded in quality for almost any query
Closed, Resolved · Public

Description

Examples:

This is so bad that we would move away from Phabricator if we couldn't search it with Google.

Event Timeline

mmodell created this task. Dec 5 2017, 12:44 PM
Restricted Application added a subscriber: Aklapper.
mmodell triaged this task as High priority. Dec 5 2017, 12:46 PM
jcrespo renamed this task from Phabricator search returns low quality results for "db1110" and similar queries to Phabricator search degraded in quality for almost any query. Dec 5 2017, 12:48 PM
jcrespo raised the priority of this task from High to Needs Triage.
jcrespo triaged this task as High priority.
jcrespo updated the task description.
jcrespo updated the task description.
jcrespo updated the task description. Dec 5 2017, 12:52 PM
jcrespo added a comment. Edited Dec 5 2017, 12:56 PM

"search phabricator" is at the end of the 3rd page, which means there are 50-100 results before it: https://phabricator.wikimedia.org/search/query/_ExISL0kCduK/#R Technically it finds it, but let's be honest, that is not very useful.

I understand things are complex, and I am not asking for an immediate fix, but maybe we can fall back to the (bad) MySQL-based search while you work on it? It is just an idea.

In other words, a "fix" is not as important, IMHO, as:

  • Test: if upgrades can break phab regularly, let's run some production-level unit tests after upgrades
  • Monitor: let's set up some Icinga monitoring to check for degradation so owners learn about issues before anyone else
  • Workaround: have MySQL as a fallback that can be temporarily enabled if things go badly

This is just a suggestion, which I am happy to help with, and not a formal proposal.
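
The "Test" and "Monitor" suggestions above could be sketched roughly as follows. This is only an illustration, not an existing check: the function names, the task ID "T100", and the rank thresholds are all hypothetical, and the code that would actually fetch ranked results from Phabricator is deliberately left out. The only convention relied on is the standard Nagios/Icinga plugin exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
from typing import Optional, Sequence, Tuple

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def rank_of(expected_id: str, ranked_ids: Sequence[str]) -> Optional[int]:
    """Return the 1-based position of expected_id in the results, or None."""
    for position, result_id in enumerate(ranked_ids, start=1):
        if result_id == expected_id:
            return position
    return None


def check_search_quality(
    expected_id: str,
    ranked_ids: Sequence[str],
    warn_rank: int = 5,   # hypothetical threshold
    crit_rank: int = 20,  # hypothetical threshold
) -> Tuple[int, str]:
    """Map the rank of a known-good result to a monitoring status.

    ranked_ids is assumed to be the ordered list of task IDs returned
    for a fixed, known query; how it is obtained is out of scope here.
    """
    rank = rank_of(expected_id, ranked_ids)
    if rank is None:
        return CRITICAL, f"CRITICAL: {expected_id} not found in results"
    if rank > crit_rank:
        return CRITICAL, f"CRITICAL: {expected_id} ranked {rank} (> {crit_rank})"
    if rank > warn_rank:
        return WARNING, f"WARNING: {expected_id} ranked {rank} (> {warn_rank})"
    return OK, f"OK: {expected_id} ranked {rank}"
```

A wrapper script would print the message and exit with the returned code; the same function could also be called from a post-upgrade test with a handful of fixed query/expected-task pairs.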

I just tested the MySQL backend on my account, and it doesn't seem to work, so the issue is most likely at the code level or in data gathering/tokenization, not in Elasticsearch itself.

@jcrespo: Indeed, something changed with the last update which broke tokenization. I'm working on it, sorry it's taking so long to figure out.

@jcrespo: Your suggestions are all good ones; however, it's hard to tell what is happening currently in production. I need to add some debug logging just to see what is actually happening. It doesn't appear to be an issue with the search index, so falling back to the MySQL index wouldn't really help (and that index would be horribly out of date).

I agree that tests and monitoring are important going forward. I'll try to come up with a strategy for those.

mmodell added a comment. Edited Dec 5 2017, 11:41 PM

OK, I optimized the ngram index and reindexed Maniphest tasks with bin/search index --type task --force, and now results seem much improved.

Note: reindex is still running.
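
For readers unfamiliar with the term, the general idea behind an ngram index can be illustrated as below. This is a generic sketch of character n-grams, assuming trigrams (n = 3) purely for illustration; it is not Phabricator's actual tokenizer, and the function name is made up. It shows why a query such as "db1110" depends on tokenization: the query is broken into short overlapping fragments, and documents are matched on those fragments.

```python
from typing import List


def char_ngrams(term: str, n: int = 3) -> List[str]:
    """Split a term into overlapping character n-grams (lowercased).

    Generic illustration only; real search backends usually add
    padding, normalization and per-field rules on top of this.
    """
    term = term.lower()
    if len(term) < n:
        # Too short to split; keep the whole term as a single token.
        return [term] if term else []
    return [term[i:i + n] for i in range(len(term) - n + 1)]


print(char_ngrams("db1110"))  # ['db1', 'b11', '111', '110']
```

If the fragments stored at index time and the fragments generated at query time stop lining up, exact matches can end up ranked no better than partial ones, which would be consistent with the symptoms described in this task.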