
CirrusSearch fails when $wgNamespacesToBeSearchedDefault is set
Closed, Invalid · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Upload a PDF document to a wiki, where CirrusSearch is set up
  • Search for a search term available in the PDF. There will be no results, since the File: namespace will not be searched by default
  • Manually select Namespace "File:". Now the PDF file will show in the search results as it should.

The same process fails when you set $wgNamespacesToBeSearchedDefault

  • add $wgNamespacesToBeSearchedDefault[NS_FILE] = true; to LocalSettings.php
  • try the exact same search again. You will see that the File namespace is correctly selected. However, NO results are displayed.
  • the URL created by the search matches the one created before with manual selection of the FILE namespace
  • if you remove the setting and reload the page, the result is there again

So I guess this is some kind of incompatibility between CirrusSearch and this option.

My setup:
MediaWiki: 1.35.8
PHP 7.4.3-4ubuntu2.18 (fpm-fcgi)
MariaDB 10.3.38-MariaDB-0ubuntu0.20.04.1
ICU 66.1
Elasticsearch 6.8.23
CirrusSearch 6.5.4 (73cc125) 07:18, 13. Jun. 2022
Elastica 6.1.3 (ea8d452) 07:24, 23. Mai 2022

Event Timeline

Restricted Application added a subscriber: Aklapper.

@Krabina after you set a namespace in $wgNamespacesToBeSearchedDefault, CirrusSearch starts treating that namespace as a content namespace, meaning that already-indexed documents have to be moved from the general index to the content index.
So you can fix the index in one of two ways:

  • reindexing everything from scratch with UpdateSearchIndexConfig.php --startOver and then ForceSearchIndex.php (see option 1 of the Upgrading section in the README file)
    • main drawback is that users will search an incomplete index while the maintenance scripts are running
  • sanitizing the index with extensions/CirrusSearch/maintenance/Saneitize.php
    • might be slower than the first approach but the advantage is that it is keeping the index populated and just tries to fix inconsistencies
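
The two options above can be sketched as a small helper that only prints the commands, so they can be reviewed before anything touches the index. This is a rough sketch, not an official procedure: the install path /var/www/mediawiki and the function name cirrus_fix_commands are assumptions made for illustration, and the script names follow the comment above and the CirrusSearch README (where the config script is named UpdateSearchIndexConfig.php).

```shell
#!/bin/sh
# Assumed install location; override by exporting MW_INSTALL_PATH first.
MW_INSTALL_PATH="${MW_INSTALL_PATH:-/var/www/mediawiki}"
MAINT="$MW_INSTALL_PATH/extensions/CirrusSearch/maintenance"

# Hypothetical helper: prints (does not run) the maintenance commands
# for the chosen option.
cirrus_fix_commands() {
    case "$1" in
        reindex)
            # Option 1: rebuild from scratch; searches see an incomplete
            # index while the second script is still running.
            echo "php $MAINT/UpdateSearchIndexConfig.php --startOver"
            echo "php $MAINT/ForceSearchIndex.php"
            ;;
        saneitize)
            # Option 2: keep the index populated and repair inconsistencies.
            echo "php $MAINT/Saneitize.php"
            ;;
        *)
            echo "usage: cirrus_fix_commands reindex|saneitize" >&2
            return 1
            ;;
    esac
}
```

Calling cirrus_fix_commands saneitize prints the single Saneitize.php command; once the output looks right for your installation, it can be piped to sh to actually run it.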

We should probably add a note in the documentation about this.

Change 929393 had a related patch set uploaded (by DCausse; author: DCausse):

[mediawiki/extensions/CirrusSearch@master] Add doc for wgNamespacesToBeSearchedDefault

https://gerrit.wikimedia.org/r/929393

Change 929393 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Add doc for wgNamespacesToBeSearchedDefault

https://gerrit.wikimedia.org/r/929393

@Krabina I'm tentatively closing this, but please feel free to re-open it if the procedure mentioned in T337328#8903136 does not work for you.

Thank you. Everything worked great!