Wildcard characters not working in CirrusSearch when using uppercase characters
Closed, Resolved · Public

Description

Wildcard characters (like * ) are not working in CirrusSearch.

For example, https://www.mediawiki.org/w/index.php?search=Media%2Aiki&title=Special:Search&go=Go&searchToken=emr2elsxjgizhqzasysp76hkb (query "Media*iki") should return all pages containing MediaWiki, but returns no results.

Likewise, https://en.wikipedia.org/w/index.php?search=Wiki%2Aedia&title=Special:Search&go=Go&searchToken=dx8uhwywemae0662p4ggeens9 (query "Wiki*edia") should return all pages containing Wikipedia, but again returns no results.

This seems rather critical, as it impairs the ability to use CirrusSearch.

Event Timeline

NickK created this task. Mar 18 2017, 4:22 PM
Restricted Application added projects: Discovery, Discovery-Search. Mar 18 2017, 4:22 PM
Restricted Application added subscribers: Base, Aklapper.
NickK triaged this task as High priority. Mar 18 2017, 4:25 PM

Wildcard queries are not sent to the analyzers, which means that the words in your query are not lowercased and thus won't match any terms in the inverted index.
I agree that this is annoying and very confusing. I will investigate whether it's possible to force the analysis phase on wildcard tokens.
In the meantime, I'd suggest always using lowercase characters as a workaround (for example, "media*iki" rather than "Media*iki").
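
To illustrate the mismatch described above, here is a minimal toy sketch in Python. It is not how CirrusSearch or Elasticsearch is implemented; the `analyze`, `build_index`, and `wildcard_search` helpers and the sample documents are hypothetical. It only assumes that terms are lowercased at index time while the wildcard pattern is matched as typed.

```python
import fnmatch


def analyze(text):
    """Toy analyzer (assumption): split on whitespace and lowercase each token."""
    return [token.lower() for token in text.split()]


def build_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def wildcard_search(index, pattern):
    """Match the raw pattern against indexed terms, skipping the analyzer."""
    hits = set()
    for term, doc_ids in index.items():
        # fnmatchcase is case-sensitive, like an unanalyzed wildcard term.
        if fnmatch.fnmatchcase(term, pattern):
            hits |= doc_ids
    return hits


# Hypothetical sample documents.
docs = {1: "MediaWiki is wiki software", 2: "Wikipedia runs on MediaWiki"}
index = build_index(docs)

print(wildcard_search(index, "Media*iki"))  # no hits: index only holds "mediawiki"
print(wildcard_search(index, "media*iki"))  # both documents match
```

Lowercasing the pattern before matching, as in the second call, is the workaround suggested in this comment.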

dcausse renamed this task from Wildcard characters not working in CirrusSearch to Wildcard characters not working in CirrusSearch when using uppercase characters. Mar 18 2017, 4:41 PM

I'm curious to know whether this is a recent regression caused by the Elasticsearch 5 migration. I'm tempted to say yes, since it's the second bug report in the few days following the Elasticsearch 5 rollout.
I'll double-check and restore the previous behavior if possible.

Yes, it is clearly a recent regression. I use wildcard searches with uppercase characters (more precisely, to look for different spellings of people's last names) on a near-daily basis, and they worked correctly until last week.

Deskana added a subscriber: Deskana.

Pulling this in to the current sprint to further investigate what is happening.

This dropped in priority due to other work and ongoing discussions. I've bumped it back up to the top of our list to investigate.

dcausse changed the task status from Open to Stalled. Edited Apr 24 2017, 3:12 PM

Unfortunately, this seems to be an upstream bug (https://github.com/elastic/elasticsearch/issues/23620).
According to that ticket, the workaround I had in mind (analyze_wildcard=true) won't be sufficient.
My local install (elastic 5.2.2) is not affected by the bug.
Marking as stalled for now; basically, this bug will be fixed when elastic 5.3.x is installed.
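
For readers unfamiliar with the analyze_wildcard option mentioned above, the following is a rough sketch of a query_string request body with that flag set. The `build_wildcard_query` helper is hypothetical, and the thread does not show how CirrusSearch actually builds its queries, so this should be read as an illustration only. As noted in the comment, the flag alone was not expected to be sufficient on the affected version.

```python
import json


def build_wildcard_query(query_text):
    """Build a query_string search body with analyze_wildcard enabled.

    Hypothetical helper for illustration; not taken from CirrusSearch.
    """
    return {
        "query": {
            "query_string": {
                "query": query_text,
                # Ask Elasticsearch to run wildcard terms through analysis.
                "analyze_wildcard": True,
            }
        }
    }


body = build_wildcard_query("Media*iki")
print(json.dumps(body, indent=2))
```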

debt added a subscriber: debt.

This will be resolved in the next version of Elasticsearch; moving to the backlog for now.

debt moved this task from needs triage to Up Next on the Discovery-Search board. Apr 27 2017, 5:09 PM

Moving this under the epic (T163703).

dcausse closed this task as Resolved. Jun 12 2017, 7:26 AM

The new Elasticsearch version is installed; confirmed that the bug is fixed.