Quotation marks mask strings from internal search thus excluding them from search results
Open, Low, Public

Description

Setup

  • MediaWiki 1.31.6 (c168a3f) 19 Dec. 2019, 14:28
  • PHP 7.0.33-0+deb9u6 (apache2handler)
  • MariaDB 10.1.41-MariaDB-0+deb9u1
  • ICU 57.1

Issue
If your wiki contains text such as «HelloWorld» or „HelloWorld“ and you search for HelloWorld, you get no search results at all. You get results only if you search for «HelloWorld» or „HelloWorld“ including the quotation marks. I do not expect anybody to add quotation marks to the search string. Moreover, if the wiki contains both «HelloWorld» and HelloWorld, a search returns only one set of matches or the other, never both.

I believe it will be a great improvement to be able to also find quoted content.

I added "Discovery-Search" as a tag. Please change at your convenience if this is not the correct tag.

Event Timeline

Kghbln renamed this task from "Quotation marks masks strings from internal search thus excluding them from results" to "Quotation marks mask strings from internal search thus excluding them from search results". Jan 22 2020, 3:01 PM
Kghbln updated the task description.

@Kghbln: Does your wiki use the default MediaWiki-Search? Or did you install the CirrusSearch extension?

No, it is just using the default SQL search engine. I will add the tag accordingly. Thanks for asking.

@dcausse Please excuse me for pinging you directly. This issue is bugging me a bit. Thus I would like to get a quick assessment if this is expected behavior or actually a bug. I suspect the latter, but one never knows ...

I'd consider this a bug indeed. I suspect the tokenization algorithm of the default search backend is quite limited and cannot properly discard punctuation.
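
To illustrate the suspected cause, here is a minimal Python sketch. It is not the actual MediaWiki or database code; the function names `naive_tokens` and `punctuation_aware_tokens` and the sample text are hypothetical, and it only assumes that the backend splits words without discarding quotation marks. It contrasts whitespace-only splitting with splitting that first drops every character in a Unicode punctuation category.

```python
import unicodedata


def naive_tokens(text):
    # Split on whitespace only, so quotation marks stay glued to the word.
    return text.lower().split()


def punctuation_aware_tokens(text):
    # Replace every character whose Unicode general category starts with "P"
    # (punctuation, including « » „ “) with a space, then split on whitespace.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch
        for ch in text
    )
    return cleaned.lower().split()


# Hypothetical page content containing both quotation styles from the report.
page = "See «HelloWorld» and „HelloWorld“ for details."

query = "helloworld"
print(query in naive_tokens(page))              # the quoted words do not match
print(query in punctuation_aware_tokens(page))  # the bare word is found
```

With whitespace-only splitting the indexed token is «helloworld» rather than helloworld, which would explain why only a query that includes the quotation marks matches.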

Thanks a lot for your assessment. Much appreciated. Given that Cirrus needs quite some resources to run, it would be great to have a fix for this issue at some point so that users with smaller wikis get a better search experience.