
Redirects from content namespace to non-content namespace are not indexed in completion suggester
Closed, Resolved · Public

Description

When building the completion suggester we walk over all documents in the content index. It turns out this is not the entirety of the content, though: in CirrusSearch, redirects don't get their own documents. Instead, a redirect is stored as a property of the article it points to. This means that there are documents in the general index that contain redirects from content namespaces.

We should likely iterate the general index as well, pulling out all documents that have a redirect from a content namespace, and index those redirects into the completion suggester.
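The proposal above could be sketched roughly as follows. This is only an illustration in Python, not the actual CirrusSearch code: the function name, the shape of the documents (a `redirect` list of `{namespace, title}` entries) and the shape of the emitted suggestions are assumptions made for the example, and "SUL" is a made-up redirect.

```python
from typing import Dict, FrozenSet, Iterable, Iterator

# Assumption: only NS 0 counts as a content namespace on this wiki.
CONTENT_NAMESPACES: FrozenSet[int] = frozenset({0})


def cross_namespace_redirect_suggestions(
    general_docs: Iterable[Dict],
    content_namespaces: FrozenSet[int] = CONTENT_NAMESPACES,
) -> Iterator[Dict]:
    """Yield one suggestion per redirect that sits in a content namespace
    but whose target document is stored in the general index."""
    for doc in general_docs:
        # Documents without redirects simply contribute nothing.
        for redirect in doc.get("redirect", []):
            if redirect.get("namespace") in content_namespaces:
                yield {
                    "input": redirect["title"],
                    "target_title": doc["title"],
                    "target_namespace": doc["namespace"],
                }


# Hypothetical general-index documents, modelled on the example in this task.
docs = [
    {
        "title": "Unified login",
        "namespace": 12,  # Help namespace
        "redirect": [
            {"namespace": 0, "title": "Global account"},
            {"namespace": 12, "title": "SUL"},
        ],
    },
    {"title": "Sandbox", "namespace": 4},
]

suggestions = list(cross_namespace_redirect_suggestions(docs))
```

With these sample documents, only the NS 0 redirect "Global account" would be added to the suggester; the Help-to-Help redirect and the page without redirects are skipped.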

Event Timeline

Restricted Application added a project: Discovery. · Mar 10 2016, 10:06 PM
Restricted Application added a subscriber: Aklapper.
Deskana triaged this task as High priority. · Mar 10 2016, 10:47 PM
Deskana moved this task from in progress to Done on the Discovery-Search (Current work) board.
Deskana moved this task from Done to in progress on the Discovery-Search (Current work) board.

Setting to high priority, as we should definitely work on fixing this as soon as possible if we accidentally broke it.

I'm not entirely sure, but I think this is a Cirrus limitation that has existed since the beginning: see T115756.
Cirrus was never able to do proper cross-namespace prefix search.
Previously the exact title match was able to suggest such pages, but now suggestions are not shown even when you type the full page title.
The example in T115756 describes the behavior of Global account on meta, which is a title in NS 0 that redirects to Help:Unified_login. Previously, Global account was suggested if you typed the full title, but that is no longer the case. Maybe something was broken in the exact title match?
Also, I think the completion suggester ignores the ':' character, therefore giving the user the impression that the page exists. Fuzzy matching does not help here.
WP:W => We Will Rock: for the completion suggester this query is similar to wp w, and it tries to fix the typo :/

We should maybe bail out quickly and fall back to prefixsearch if a ':' is found in the query?
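A minimal sketch of that bail-out idea, in Python for illustration only. The two backends are passed in as plain callables; `search_titles` and the stub backends are hypothetical names and do not correspond to real CirrusSearch functions.

```python
from typing import Callable, List

SearchFn = Callable[[str], List[str]]


def search_titles(
    query: str,
    completion_search: SearchFn,
    prefix_search: SearchFn,
) -> List[str]:
    """Route queries containing ':' straight to prefix search, so the
    completion suggester never treats the colon as an ignorable character."""
    if ":" in query:
        return prefix_search(query)
    return completion_search(query)


# Stub backends (assumptions) that only record which path was taken.
def fake_completion(query: str) -> List[str]:
    return ["completion:" + query]


def fake_prefix(query: str) -> List[str]:
    return ["prefix:" + query]


with_colon = search_titles("WP:W", fake_completion, fake_prefix)
without_colon = search_titles("wp w", fake_completion, fake_prefix)
```

Here "WP:W" would go to the prefix search stub and "wp w" to the completion stub. This only decides which backend is asked; it does not by itself make cross-namespace redirects findable.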

My understanding is that even falling back to prefix search will not resolve the issue. Even after falling back we will only query the content namespace, since on huwiki WP: is not a namespace; it is a set of pages in NS 0 whose titles start with WP:. Because these pages are primarily redirects to non-content namespaces, the actual redirects live in the general index. This is likely a limitation of CirrusSearch that is not new and was not created by the completion suggester.

I might be oversimplifying, but it seems this is something we could fix with the new completion suggester by iterating the general index for redirects that have NS=0 and adding them to the title suggest index?

Change 276703 had a related patch set uploaded (by DCausse):
completionSearch: try an exact match even if the backend returns no result

https://gerrit.wikimedia.org/r/276703

@EBernhardson got it, I'll try to create these suggestions. Unfortunately, fixing prefixsearch for other namespaces might be more complex :/

Change 276705 had a related patch set uploaded (by DCausse):
CompletionSuggester: stop ignoring ':'

https://gerrit.wikimedia.org/r/276705

Change 276703 merged by jenkins-bot:
completionSearch: try an exact match even if the backend returns no result

https://gerrit.wikimedia.org/r/276703

Change 276705 merged by jenkins-bot:
CompletionSuggester: add support for crossnamespace redirects

https://gerrit.wikimedia.org/r/276705

Deskana closed this task as Resolved. · Mar 15 2016, 2:39 AM

We think this problem should be solved by the above patches, so resolving this task. This should roll out this week, so we'll get quick feedback from users whether it's working or not.