
CirrusSearch does not index redirects, removing them from autosuggest in the search box
Closed, Duplicate · Public

Description

Sometime in the last month, we noticed on our wiki (1.26) that CirrusSearch (1.26 tag) stopped indexing redirects. For example, on Wikipedia, you can see that the redirect "Einstein" is represented by an empty document in Elasticsearch:

https://en.wikipedia.org/w/index.php?title=Einstein&redirect=no&action=cirrusdump

The output I see is:

[]

As a result, if you type "Einstei" (removing the last letter) in the Wikipedia search box, you do not get a completion of "Einstein."

Earlier this year, this behavior did not occur, and we don't know what changed. Is this a bug or an intentional feature? If it's a feature, is there a way to configure CirrusSearch so redirects get autocompleted by the wiki search box?

Here is some more data. When we create an article (MyArticle) and a redirect (MyRedirect) on our wiki, here are the MediaWiki jobs that get queued:

cirrusSearchLinksUpdatePrioritized MyArticle addedLinks=array(0) removedLinks=array(0) prioritize=1 (id=8537418,timestamp=20160926165549) status=unclaimed
cirrusSearchLinksUpdatePrioritized MyRedirect addedLinks=["Home"] removedLinks=array(0) prioritize=1 (id=8537419,timestamp=20160926165601) status=unclaimed

When the jobs run, here is the output produced:

cirrusSearchLinksUpdatePrioritized MyRedirect addedLinks=["Home"] removedLinks=array(0) prioritize=1 (id=8537419,timestamp=20160926165601) STARTING
cirrusSearchLinksUpdatePrioritized MyRedirect addedLinks=["Home"] removedLinks=array(0) prioritize=1 (id=8537419,timestamp=20160926165601) t=401 good
cirrusSearchIncomingLinkCount Home (id=8537420,timestamp=20160926165631) STARTING
cirrusSearchIncomingLinkCount Home (id=8537420,timestamp=20160926165631) t=27 good

cirrusSearchLinksUpdatePrioritized MyArticle addedLinks=array(0) removedLinks=array(0) prioritize=1 (id=8537418,timestamp=20160926165549) STARTING
cirrusSearchLinksUpdatePrioritized MyArticle addedLinks=array(0) removedLinks=array(0) prioritize=1 (id=8537418,timestamp=20160926165549) t=22 good

I tried modifying some CirrusSearch parameters but could not affect the autocompletion behavior:

  • Increasing $wgCirrusSearchWeights['redirect']
  • Increasing $wgCirrusSearchPrefixWeights['redirect'] and $wgCirrusSearchPrefixWeights['redirect_asciifolding']

Thank you very much.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Here's another oddity. Create an article "Foo Bar Blat" in the main namespace that's a redirect to a User page, say:

#REDIRECT [[User:Smith]]

Now go to the wiki search box and type:

User:Foo Bar

It will offer User:Foo Bar Blat as a search suggestion, which is a title that doesn't exist.

debt triaged this task as Medium priority. Sep 26 2016, 10:04 PM
debt moved this task from needs triage to Up Next on the Discovery-Search board.
debt added subscribers: EBernhardson, Smalyshev, dcausse and 2 others.

Let's go ahead and take a quick look at this and see if we've caused the issue.

CirrusSearch does index redirects, but not as separate pages. Instead, each redirect is stored as a property of the document it redirects to. So instead of:

https://en.wikipedia.org/w/index.php?title=Einstein&redirect=no&action=cirrusdump

Look at:

https://en.wikipedia.org/wiki/Albert_Einstein?action=cirrusdump

This has a property redirect which is an array that includes: { title:Einstein, namespace: 0}
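
To make that check repeatable, here is a minimal Python sketch. It builds the two kinds of cirrusdump URL shown above and pulls the redirect entries out of an already-parsed dump. It assumes the dump is a JSON list of hits, each carrying its fields under a `_source` key; that layout and the helper names are assumptions for illustration, not something confirmed in this task.

```python
from urllib.parse import urlencode


def cirrusdump_url(index_php, title, no_redirect=False):
    """Build a cirrusdump URL; no_redirect adds redirect=no as in the first URL above."""
    params = {"title": title}
    if no_redirect:
        params["redirect"] = "no"
    params["action"] = "cirrusdump"
    return index_php + "?" + urlencode(params)


def redirect_entries(dump):
    """Collect redirect entries from a parsed dump (assumed: list of hits with '_source')."""
    entries = []
    for hit in dump:
        source = hit.get("_source", {})
        entries.extend(source.get("redirect", []))
    return entries


# Hand-written sample shaped like the Albert Einstein document described above.
sample = [{"_source": {"title": "Albert Einstein",
                       "redirect": [{"namespace": 0, "title": "Einstein"}]}}]
print(cirrusdump_url("https://en.wikipedia.org/w/index.php", "Einstein", no_redirect=True))
print(redirect_entries(sample))
```

An empty dump (`[]`), like the one returned for the redirect page itself, simply yields no entries.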

Looking at the relevant code that runs prefix searches in REL1_26 branch: https://github.com/wikimedia/mediawiki-extensions-CirrusSearch/blob/REL1_26/includes/Searcher.php#L324-L344

We can see there are two main code paths, one when $wgCirrusSearchPrefixSearchStartsWithAnyWord is true, and the other when it's false. The false condition queries the redirects, but it looks like the true condition does not.

Any chance $wgCirrusSearchPrefixSearchStartsWithAnyWord is set to true on your wiki? In that case we can probably work up a patch to make it check redirects as well.

Thanks for your quick response!

$wgCirrusSearchPrefixSearchStartsWithAnyWord is set to false on our wiki. I also tried setting it to true (to see what happens), and there was no difference in behavior.

@maiden_taiwan prefix search does not support crossnamespace redirects: creating a page in the main namespace that redirects to another namespace.

Unfortunately, due to the way Cirrus is designed, it's unlikely that we will be able to fix this problem for prefix search.

Please refer to T115756 for more info on this limitation.

Concerning the example you mention on English Wikipedia, it's more a scoring issue than a missing redirect: redirect suggestions in autocomplete search are aggressively discounted, so the "Einstein" redirect is ranked very low and does not make the top 10 that we display.
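
As a toy illustration of how a discount can push an indexed redirect out of the displayed list, here is a small Python sketch. The scores, the multiplicative discount, and the function name are all made up for the example; this is not the actual CirrusSearch scoring.

```python
def top_suggestions(candidates, redirect_discount, limit=10):
    """Rank (title, base_score, is_redirect) tuples and keep the best `limit` titles."""
    scored = []
    for title, base_score, is_redirect in candidates:
        # Hypothetical rule: redirects have their score multiplied by the discount.
        score = base_score * redirect_discount if is_redirect else base_score
        scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:limit]]


# Ten invented article titles with invented scores, plus one redirect whose
# undiscounted score would put it first.
candidates = [("Article %d" % i, 50 - i, False) for i in range(10)]
candidates.append(("Einstein", 100, True))

print(top_suggestions(candidates, redirect_discount=1.0)[0])
print("Einstein" in top_suggestions(candidates, redirect_discount=0.1))
```

With no discount the redirect ranks first; with the 0.1 discount it drops below all ten articles and is cut from the top 10, even though it is still present in the index.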

Could you identify the kind of redirects you think are not indexed? If it's a redirect within the same namespace, please open the canonical page (with ?action=cirrusDump) to see whether the redirect is listed in the redirect JSON array.
If the redirect is listed in another namespace, then I think you've hit the crossnamespace limitation for prefix search.

@dcausse - Thanks for your reply. If cross-namespace redirects are not supported for autocompletion in the search box, then I'm baffled because these redirects were working for years in our wiki search box... for 7,000+ users.

Background story: Our wiki has a batch job that automatically adds redirects to represent each user's full name. For example, if John Smith has user page User:Jsmith, then a nightly script creates redirect John Smith with contents #REDIRECT [[User:Jsmith]]. As a result, the wiki search box autocompletes every employee name in our company (e.g., typing John Sm yields John Smith), turning the search box into a company personnel directory. This mechanism has been working consistently since we installed CirrusSearch in July 2014. It stopped working about a month ago, and we had not upgraded or modified CirrusSearch at that time.
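
For readers who want to picture the batch job, here is a minimal Python sketch of the mapping step only. The input dictionary and function name are hypothetical (our real script is not shown here), and actually writing the pages to the wiki is left out.

```python
def name_redirects(users):
    """Map each full name to redirect wikitext pointing at that user's page.

    `users` is a hypothetical {username: full_name} dictionary.
    """
    pages = {}
    for username, full_name in users.items():
        pages[full_name] = "#REDIRECT [[User:%s]]" % username
    return pages


print(name_redirects({"Jsmith": "John Smith"}))
```

Each key is the title of a main-namespace page to create, and each value is its content, so every generated page is a cross-namespace redirect.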

The only possibly relevant change we've made is retiring Brion Vibber's TitleKey extension, because according to the documentation on that page, "This extension provides no benefit if you are using CirrusSearch." The problem began somewhere around the same time. I reinstalled TitleKey but the problem remains, so this could just be coincidence. Any thoughts on this, or on the mystery in general?

@dcausse - Also to answer your question, if I look at User:Jsmith with ?action=cirrusDump, then yes, the redirect "John Smith" is listed in the JSON:

"redirect":[{"namespace":0,"title":"John Smith"}]

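The same-namespace versus cross-namespace check can be sketched in Python against the JSON above. The function name is made up for illustration; the value 2 is MediaWiki's numeric ID for the User namespace.

```python
def split_redirects(page_namespace, redirects):
    """Split redirect titles into same-namespace and cross-namespace lists."""
    same, cross = [], []
    for entry in redirects:
        if entry["namespace"] == page_namespace:
            same.append(entry["title"])
        else:
            cross.append(entry["title"])
    return same, cross


USER_NAMESPACE = 2  # MediaWiki's numeric ID for the User namespace
redirects = [{"namespace": 0, "title": "John Smith"}]  # copied from the dump of User:Jsmith

same, cross = split_redirects(USER_NAMESPACE, redirects)
print("same namespace:", same)
print("cross namespace:", cross)
```

Here "John Smith" lands in the cross-namespace list, which is the case described as unsupported by classic prefix search.
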
Thinking outside the box: maybe our wiki search box was somehow using the default MediaWiki search engine (MySQL full-text search) for auto-suggest, even though CirrusSearch was working for actual submitted searches. Is that possible? If so, do you know how we might re-configure our wiki to make it happen again?

Solved it! The cause is indeed the TitleKey extension. When it's installed, auto-suggest works for cross-namespace redirects in the wiki search box.

I have not investigated why it works, but it works.

Glad to hear you solved your issue. I'll mark this specific bug as a duplicate of T115756 because the root cause is the same limitation in CirrusSearch.

@dcausse - In this ticket, you wrote about crossnamespace redirects support in CirrusSearch:

prefix search [in CirrusSearch] does not support crossnamespace redirects: creating a page in the main namespace that redirects to another namespace.

However, on English Wikipedia, crossnamespace redirects do show up in prefix search. Example: Wikipedia policy is a redirect to Wikipedia:List of policies, and yet somehow, the redirect shows up in prefix search.

wikipedia policy prefix search.png (120×264 px, 2 KB)

Could you please explain why this works on Wikipedia? We really need crossnamespace redirects to show up as search suggestions on our wiki. Thank you very much.

@maiden_taiwan crossnamespace redirects are partially supported by the CompletionSuggester, and that is what you see on Wikipedia sites.
The completion suggester differs from the classic prefix search in that it works on its own index, which allows us to do slightly more complex processing at index time. The drawback is that it's not yet realtime: a maintenance script needs to be run to refresh suggestions.
If you want to give it a try, you'll need a recent MW version (one that supports the option $wgCirrusSearchUseCompletionSuggester, so at least 1.27 I think, but please double check).

The maintenance script is maintenance/updateSuggesterIndex.php; it must be added to a cron entry and run as often as your needs require. On Wikipedias it's run once a day.
Once you have built the completion suggester index (the first time you run updateSuggesterIndex.php), you can enable it for all your users by setting:

$wgCirrusSearchUseCompletionSuggester = 'yes';
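
If you would rather drive the refresh from a scheduled Python job than call the script directly from cron, a minimal wrapper could look like the sketch below. The function names and the paths in the usage lines are hypothetical and must be adjusted to your install; the sketch passes no options to the script because none are named here.

```python
import subprocess


def build_command(script_path, php="php"):
    """Return the argv list used to run the maintenance script."""
    return [php, script_path]


def refresh_suggestions(script_path, mediawiki_root, dry_run=False):
    """Run the suggester refresh once; with dry_run, only return the command."""
    command = build_command(script_path)
    if not dry_run:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(command, cwd=mediawiki_root, check=True)
    return command


# Hypothetical paths, for illustration only.
print(refresh_suggestions("extensions/CirrusSearch/maintenance/updateSuggesterIndex.php",
                          "/path/to/mediawiki", dry_run=True))
```

Scheduling this once a day would mirror the daily refresh described above.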

@dcausse - Thank you so much for that information! We'll try it out.