
Global fulltext search results miss some tasks
Closed, Resolved | Public | Bug Report

Description

I'm not able to find a specific Task using the Phabricator search form.

Description of the problem

Searching for "error" in Wikimedia Phabricator with a Tag set, I cannot find a specific result. Live example:

https://phabricator.wikimedia.org/search/query/ULASybUwkKer/

{F35646873}

This result is not listed: T304193

Additional notes

Note that searching "kawabonga" in phab.wmflabs.org shows a Task with "wakabonga" in the title.

In this case, it works but I noticed that the matching text is not underlined. Live example:

https://phab.wmflabs.org/search/query/XErADOQikRq0/#R

Screenshot_2022_03_31_102240.png (709×1 px, 70 KB)


Note that searching "stupid" in another Phabricator shows a Task with "stupid" in the title.

In this case, it works and I noticed that the matching text is underlined. Live example:

https://gitpull.it/search/query/YBLujDebuzTP/#R

Screenshot_2022_03_30_155251.png (718×1 px, 57 KB)


I've read the official Phabricator search documentation and I've also tried title:error, title:~error, core:error and other odd combinations, but I still don't understand why Wikimedia Phabricator fails to find Task T304193 with the keyword "error".

So it seems to be a problem in Wikimedia Phabricator (production) and not in Phabricator itself.

Event Timeline

Aklapper edited projects, added Phabricator (Search); removed Phabricator.

Indeed; the proper Maniphest search at https://phabricator.wikimedia.org/maniphest/query/IReCdyJDn.xF/#R also does not list any results

valerio.bozzolan renamed this task from "Cannot search by Task title in some cases" to "Cannot find some Tasks using simple search". Oct 28 2022, 9:37 AM
valerio.bozzolan updated the task description. (Show Details)
hashar subscribed.

Elasticsearch since Phabricator uses that as backend:

hieradata/role/eqiad/phabricator.yaml
phabricator_cluster_search:
  - type: 'elasticsearch'
    path: '/phabricator'
    port: 9243
    version: 5
    hosts:
      - protocol: 'https'
        host: 'search.svc.eqiad.wmnet'
        roles:
          read: true
          write: true

We had a task to verify compatibility with Elasticsearch 7.10, but T303445#7807237 states that Phabricator uses MySQL. From https://phabricator.wikimedia.org/config/cluster/search/ :

phabricator_search_servers.png (326×535 px, 24 KB)

@valerio.bozzolan: Did you intentionally set yourself as task assignee?

Aklapper changed the subtype of this task from "Task" to "Bug Report". Apr 25 2023, 9:12 AM

Decreasing the scope of that query to https://phabricator.wikimedia.org/search/query/aYltXfrCu2Iv/#R, I get four tasks, missing T304193.

The fulltext search code first runs
SELECT ngram FROM maniphest_task_fngrams_common WHERE ngram IN (' er', 'err', 'or ', 'ror', 'rro');
(note the _common suffix of the table name), which returns three results: 'err', 'or ', 'rro'.

It seems these common ngrams get excluded from the ftngram joins of the actual follow-up search, as the following query only includes the remaining two ngrams (' er' and 'ror') that were not returned by the previous step.
(And if I manually replace those two ngrams with the three "common" ngrams I get zero results.)

SELECT task.*,
  IF(ft_rank.termCorpus LIKE '% error %', 2, 0) + IF(ft_rank.normalCorpus LIKE '% error %', 1, 0) + 0 AS _ft_rank,
  ft_doc.epochCreated AS _ft_epochCreated,
  ft_doc.epochModified AS _ft_epochModified,
  ft_rank.rawCorpus AS rawCorpus
FROM maniphest_task task  
JOIN edge edgelogic_ancestor_41 ON task.phid = edgelogic_ancestor_41.src AND edgelogic_ancestor_41.type = 41 AND edgelogic_ancestor_41.dst IN ('PHID-PROJ-gzlidnmqz3ztdvx2nubo')
JOIN maniphest_task_fdocument ft_doc ON ft_doc.objectPHID = task.phid 
JOIN maniphest_task_fngrams ftngram_1 ON ftngram_1.documentID = ft_doc.id AND ftngram_1.ngram = ' er'
JOIN maniphest_task_fngrams ftngram_2 ON ftngram_2.documentID = ft_doc.id AND ftngram_2.ngram = 'ror'
JOIN maniphest_task_ffield ftfield_1 ON ft_doc.id = ftfield_1.documentID AND ftfield_1.fieldKey = 'titl'
LEFT JOIN maniphest_task_ffield ft_rank ON ft_doc.id = ft_rank.documentID AND ft_rank.fieldKey = 'titl'
WHERE (((ftfield_1.termCorpus LIKE '% error %')) OR ((ftfield_1.normalCorpus LIKE '% error %')))
GROUP BY task.phid
ORDER BY _ft_rank DESC, _ft_epochModified DESC, task.id DESC
LIMIT 101;
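
The ngram selection described above can be sketched in a few lines. This is only an illustration, not Phabricator's actual code: the space padding is inferred from the IN (...) list of the first query, the set of common ngrams is copied from that query's result, and the function names are made up.

```python
def trigrams(term):
    # Pad with one space on each side so word boundaries become part of
    # the ngrams (assumption inferred from the IN (...) list in the query).
    padded = f" {term} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

# Ngrams reported as "common" by the first query in this task.
COMMON_NGRAMS = {"err", "or ", "rro"}

def search_ngrams(term, common=COMMON_NGRAMS):
    # Keep only the ngrams that are not flagged as common.
    return [gram for gram in trigrams(term) if gram not in common]

all_ngrams = trigrams("error")
kept_ngrams = search_ngrams("error")
print(all_ngrams)   # [' er', 'err', 'rro', 'ror', 'or ']
print(kept_ngrams)  # [' er', 'ror']
```

The two remaining ngrams are the ones that appear in the ftngram_1 and ftngram_2 joins of the query above.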

Rewriting that query to
SELECT task.id AS taskId, ft_doc.*, ftngram.*
FROM maniphest_task_fngrams ftngram
INNER JOIN maniphest_task_fdocument ft_doc ON ftngram.documentID = ft_doc.id
INNER JOIN maniphest_task task ON ft_doc.objectPHID = task.phid
INNER JOIN edge e ON task.phid = e.src AND e.type = 41 AND e.dst = "PHID-PROJ-gzlidnmqz3ztdvx2nubo"
INNER JOIN maniphest_task_ffield ftfield ON ft_doc.id = ftfield.documentID AND ftfield.fieldKey = 'titl'
WHERE (ftngram.ngram = " er" OR ftngram.ngram = "ror")
  AND ((ftfield.termCorpus LIKE '% error %') OR (ftfield.normalCorpus LIKE '% error %'))
ORDER BY task.id;
which returns four distinct task results.
T304193 is not among them.
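
Because both queries use inner joins against the ngram table, a task can only show up if it has index rows for the requested ngrams, whatever its title says. A minimal sketch of that filtering effect, using an in-memory SQLite database with heavily simplified, hypothetical tables (fdocument, ffield, fngrams) and invented task IDs rather than the real Phabricator schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fdocument (id INTEGER PRIMARY KEY, taskId INTEGER);
CREATE TABLE ffield (documentID INTEGER, fieldKey TEXT, normalCorpus TEXT);
CREATE TABLE fngrams (documentID INTEGER, ngram TEXT);
INSERT INTO fdocument VALUES (1, 101), (2, 102);
INSERT INTO ffield VALUES (1, 'titl', ' search error '), (2, 'titl', ' another error ');
-- Document 2 has a matching title but no ngram rows (hypothetical index gap).
INSERT INTO fngrams VALUES (1, ' er'), (1, 'ror');
""")

rows = conn.execute("""
SELECT d.taskId
FROM fdocument d
JOIN fngrams n1 ON n1.documentID = d.id AND n1.ngram = ' er'
JOIN fngrams n2 ON n2.documentID = d.id AND n2.ngram = 'ror'
JOIN ffield f ON f.documentID = d.id AND f.fieldKey = 'titl'
WHERE f.normalCorpus LIKE '% error %'
ORDER BY d.taskId
""").fetchall()

found = [row[0] for row in rows]
print(found)  # [101]
```

In this toy data both titles match the LIKE condition, but only the document with ngram rows survives the joins.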

Aklapper renamed this task from "Cannot find some Tasks using simple search" to "Global fulltext search results miss some tasks". Jun 30 2025, 11:00 AM
Aklapper claimed this task.

Cannot reproduce anymore after fixing T398197 by manually reindexing 2100 tasks.

(The missing task mentioned in this very ticket had the "Other Assignee" field set. Our instance was configured to index that Other Assignee field, which made no sense and has been changed now. The upstream indexer code crashed when indexing that field because its value is an array, while upstream code only supports a string. I filed a ticket/patch upstream.)
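
The failure mode can be illustrated with a small sketch. It is not the upstream PHP indexer: the function names, field names and task identifiers below are hypothetical, and it simply assumes that a crash while indexing one field leaves the whole task out of the index.

```python
def tokenize_field(value):
    # Hypothetical indexer step that assumes every field value is a string.
    return value.lower().split()

def index_tasks(tasks):
    """Return (index, failed): tokens per indexed task, and tasks that crashed."""
    index, failed = {}, []
    for task_id, fields in tasks.items():
        try:
            tokens = []
            for value in fields.values():
                tokens.extend(tokenize_field(value))
        except AttributeError:
            # A list has no .lower(), so the whole task is left out of the index.
            failed.append(task_id)
            continue
        index[task_id] = tokens
    return index, failed

tasks = {
    "task-a": {"title": "Search error on page"},
    "task-b": {"title": "Another error", "other_assignee": ["alice"]},
}
index, failed = index_tasks(tasks)
hits = [task_id for task_id, tokens in index.items() if "error" in tokens]
print(hits, failed)  # ['task-a'] ['task-b']
```

Both titles contain "error", but the task with the array-valued field never reaches the index, so a search cannot return it until the field is handled and the task is reindexed.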