Collect test cases for Phabricator search
Closed, Resolved; Public

Description

To evaluate Elasticsearch against the InnoDB search engine in Phabricator, we need to collect some test cases and then run them against an index in each search engine to compare the results.

I will add bug reports about problematic queries as subtasks of this task, and I welcome suggestions in the comments about search queries that currently return bad results (or no results when that is clearly wrong).
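
A minimal sketch of how such a test case could be recorded and scored, assuming each engine's results are available as an ordered list of task IDs. The names `SearchTestCase`, `rank_of`, and `compare_engines` are hypothetical and not part of Phabricator; the result lists in the usage example are placeholders, not real search output.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SearchTestCase:
    """One query plus the task a reader would expect to find (hypothetical structure)."""
    query: str
    expected_task: str  # a task ID such as "T94099"


def rank_of(expected: str, results: List[str]) -> Optional[int]:
    """Return the 1-based position of `expected` in `results`, or None if it is missing."""
    for position, task_id in enumerate(results, start=1):
        if task_id == expected:
            return position
    return None


def compare_engines(
    case: SearchTestCase, results_by_engine: Dict[str, List[str]]
) -> Dict[str, Optional[int]]:
    """Map each engine name to the rank it gave the expected task."""
    return {
        engine: rank_of(case.expected_task, results)
        for engine, results in results_by_engine.items()
    }


if __name__ == "__main__":
    # Placeholder result lists for illustration only.
    case = SearchTestCase(query="example query", expected_task="T3")
    ranks = compare_engines(
        case,
        {"mysql": ["T1", "T2", "T3"], "elasticsearch": ["T3", "T1"]},
    )
    print(ranks)
```

A rank of `None` marks the "no results when that is clearly wrong" case, and a lower rank in one engine than the other marks the "bad results" case.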

Event Timeline

Is there a way we can copy projects and tasks into phab-01 to test?

Is there a way we can copy projects and tasks into phab-01 to test?

Let's keep this task solely about collecting the test cases, not about where we test them or how we create those test instances (or use production Phab).

@Paladox: maybe, I'm looking into it. You can create a separate task for that if you'd like.

Test case: Searching for "databases", "Open and Closed", "Task", "Author: @scfc" (https://phabricator.wikimedia.org/search/query/dxp3d.8.DDWY/#R) lists nine tasks, but not 91231. (I've left out "T" to not trigger any reindexes a mention might cause.)

@matmarex hi, clicking on https://phabricator.wikimedia.org/search/query/3CkjwO6wu16r/#R I see T94099: Rewrite mw.FormDataTransport.js to make use of mw.Api at number 10 in the search results.

Elasticsearch handles this one better than MySQL: T94099 is the first result: https://phabricator.wikimedia.org/search/query/3CkjwO6wu16r/?elastic=1

You added the project as a tag; without the project, it is the third result:

https://phabricator.wikimedia.org/search/query/W8K8LawjrQur/?elastic=1

So still better than MySQL.

@MZMcBride was complaining about search today on IRC. He noted that it seems that some tasks are simply not indexed at all, but performing any action on them causes them to get indexed. This could explain why both of the issues I reported above magically fixed themselves as soon as I mentioned (linked) the tasks here.

The problem with finding test cases is that the search index is incomplete. When someone subscribes to, comments on, or even mentions a Maniphest task, that action seems to add the task to the search index. This makes search results very inconsistent and confusing.

Someone should fix the search index. I filed T153603 about this. I don't care if a person has to go through every public task and comment on it to trigger its insertion into the search index.

Well, we should be doing a reindex soon as part of the Phabricator upgrade, since upstream has fixed support for InnoDB full-text search and the upgrade requires a reindex anyway.

@MZMcBride was complaining about search today on IRC. He noted that it seems that some tasks are simply not indexed at all, but performing any action on them causes them to get indexed. This could explain why both of the issues I reported above magically fixed themselves as soon as I mentioned (linked) the tasks here.

Right, as Evan said on the parent task:

(Note that mentioning the PHID of the task may also trigger mentions and reindexing, so no one should mention either T... or PHID-TASK-... in plain text for that task until the query is run, if the results are to be diagnostically useful.)

I'm wondering whether this is resolved, as we have switched to Elasticsearch. We may just want to create a tracking task for making improvements to Elasticsearch.

I would still welcome any test cases for problematic search phrases; however, I do think this can be closed for now.