
Search suggests non-existent title due to namespace/redirect mixup
Closed, Resolved, Public

Description

On Meta-Wiki, enter "Help:Glo" in the search box. You get "Help:Global account" as a suggested result, but that page doesn't actually exist, nor has it been created-and-deleted in the past.

@TJones investigated the issue a bit; below is the summary of his findings:

<Trey314159> There's a redirect from "Global Account" to "Help:Unified_login". I would hypothesize that those get mixed up somehow (as in that's what I'd look at first to try to explain it), but I don't know.

<Trey314159> I have been able to reproduce the problem on enwiki, too. Searching for "Help:how to" suggests "Help:How to edit", but no such page exists. "How to edit" is a redirect to "Help:Editing".

<Trey314159> I had a chance to look further, and the problem you found is not limited to Help (as expected). "Wikipedia:Second account" has a similar set up ("second account" is a redirect to "Wikipedia:Sock puppetry", but "Wikipedia:Second account" doesn't exist).

Event Timeline

gpaumier raised the priority of this task from to Needs Triage.
gpaumier updated the task description.
gpaumier added subscribers: gpaumier, TJones.

I have good news and bad news.

The good news is: this bug has existed since the beginning of Cirrus; it's not something that was broken recently.
The bad news is: it's a bug caused by the Cirrus core design.

The Cirrus document model is (example from Help:Unified_login):

{
  "_id": "65317",
  "title": "Unified login",
  "namespace": 12,
  "redirect": [
    {
      "title": "Global account",
      "namespace": 0
    },
    [...snipped...]
  ]
}

So only target pages are indexed; redirects are not indexed as separate documents (which is IMHO a very good idea, it would have been extremely complex to do otherwise).
Here Unified login is a page in the Help namespace, but it has a redirect, Global account, in namespace 0.
When you use prefix search in the top right box, you'll see suggestions from namespace 0 by default, but as soon as you enter a namespace prefix like Help:, the prefix search filters documents according to that namespace (here 12).

Cirrus will find Unified login because it cannot exclude the inner redirect Global account. This is what I can read in cirrus code ( ResultsType.php:175 ) :

// Instead of getting the redirect's real namespace we're going to just use the namespace
// of the title.  This is not great but OK given that we can't find cross namespace
// redirects properly any way.
$redirectTitle = Title::makeTitle( $r->namespace, $redirectTitle );
$resultForTitle[ 'redirectMatches' ][] = $redirectTitle;
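To make the effect of that snippet concrete, here is a minimal Python sketch (not the CirrusSearch code). It assumes a simplified document shaped like the JSON example above; `NAMESPACE_NAMES`, `make_title`, and `suggest_with_doc_namespace` are hypothetical names invented for the illustration.

```python
# Hypothetical, simplified model; only the fields from the JSON example are used.
NAMESPACE_NAMES = {0: "", 12: "Help"}  # 12 is the Help namespace, as in the example

doc = {
    "title": "Unified login",
    "namespace": 12,
    "redirect": [{"title": "Global account", "namespace": 0}],
}


def make_title(namespace, text):
    """Stand-in for Title::makeTitle: prefix the text with the namespace name."""
    prefix = NAMESPACE_NAMES[namespace]
    return f"{prefix}:{text}" if prefix else text


def suggest_with_doc_namespace(doc, matched_redirect_text):
    # Mirrors the quoted snippet: the redirect is given the namespace of the
    # target document, not the namespace stored on the redirect itself.
    return make_title(doc["namespace"], matched_redirect_text)


suggestion = suggest_with_doc_namespace(doc, "Global account")
print(suggestion)  # a title in the Help namespace, although the redirect lives in namespace 0
```

In this model the suggestion combines the target's namespace with the redirect's text, which is how a title that was never created can end up in the list.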

This design is the cause of two bugs:

  • Showing suggestions that are not in the requested namespace and that link to nonexistent pages (this bug)
  • Not showing suggestions that come from redirects in the current namespace to a page in another namespace:
    • When I search for Global acc in namespace 0, Global account is not suggested; I have to type the full title.
    • When I search for global account in special search (on content pages only), the redirect is not found. (Is this really a bug?)
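Both symptoms follow from filtering on the namespace of the target document only. A rough Python sketch under that assumption (`prefix_search` is a hypothetical helper, not Cirrus code, and it does not model the exact-title lookup that still works when the full title is typed):

```python
docs = [
    {
        "title": "Unified login",
        "namespace": 12,
        "redirect": [{"title": "Global account", "namespace": 0}],
    },
]


def prefix_search(docs, namespace, prefix):
    """Return titles of target documents in `namespace` whose own title or
    any redirect title starts with `prefix` (case-insensitive)."""
    prefix = prefix.lower()
    hits = []
    for doc in docs:
        if doc["namespace"] != namespace:
            continue  # the filter looks at the target's namespace only
        names = [doc["title"]] + [r["title"] for r in doc["redirect"]]
        if any(name.lower().startswith(prefix) for name in names):
            hits.append(doc["title"])
    return hits


# The redirect lives in namespace 0, but its document is filtered out there.
main_hits = prefix_search(docs, 0, "Global acc")
# In namespace 12 the document is matched through its namespace-0 redirect.
help_hits = prefix_search(docs, 12, "Glo")
print(main_hits, help_hits)
```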

I'm wondering why we index redirects with mixed namespace if we are not able to handle them properly.

Thanks, David.

Ugh. I've come up with like five different semi-easy approaches for this, and they are all just too kludgey, and none worth repeating.

Is there any realistic way of automatically generating a working redirect? And is there any way of getting an idea of how many of these cross-namespace redirects there are? (If we need to add 50 auto-generated redirects, no worries! If it's 500K, we may have to think about it a lot harder.)

ksmith moved this task from Analysis to UX on the Discovery-ARCHIVED board.
ksmith moved this task from UX to On Sprint Board on the Discovery-ARCHIVED board.

And is there any way of getting an idea of how many of these cross-namespace redirects there are?

Some large wikis have a tracking category for cross-namespace redirects. According to https://en.wikipedia.org/wiki/Category:Cross-namespace_redirects , there are a few thousand on the English Wikipedia alone. I imagine that that's the most there is on any single wiki.

Deskana subscribed.

Unfortunately, this is somewhat of an edge-case that few users will experience, and since it's looking quite tricky to fix, it cannot be prioritised right now.

debt lowered the priority of this task from Low to Lowest. (Oct 26 2017, 5:30 PM)

This is a bug in the Cirrus core design and "should" be fixable, but a fix appears to be quite challenging and kludgey, and this is very much an edge-case scenario.

This is not an edge-case scenario at all. I run into it all the time on certain wikis (the last time it happened to me was yesterday) where a non-main namespace is used fairly interchangeably with the main namespace (e.g. Manual: on mediawiki.org or Help: on meta).

We partially fixed this in the completion suggester in T129575. Sadly the completion suggester was recently disabled on some wikis including mw.org and meta (T178474).

Just happened to me again, and found it so meh I felt like updating this task: on Meta, type "research index" in the search box. The only thing it suggests is https://meta.wikimedia.org/wiki/Research:Research_Index, which doesn't exist.
Only if you click on the magnifying glass do you get sent to Research:Index.

Change 398124 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Resolve redirect namespaces from source docs in fancy title results type

https://gerrit.wikimedia.org/r/398124

Change 398192 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/core@master] Silently drop unknown titles in completion search

https://gerrit.wikimedia.org/r/398192

The two attached patches are not complete solutions (that would still require rethinking how we store redirects), but they should at least paper over the problem from the user's perspective.

The new code in core drops titles returned from search that do not exist. This is the same behavior as full text search. The fact that things are being dropped is recorded into statsd so we have insight into how much this is happening and if it increases unexpectedly.
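As a rough illustration of that behaviour (a Python sketch, not the MediaWiki core patch): the set of existing titles, the `Counter` standing in for statsd, and the metric name are all assumptions made for the example.

```python
from collections import Counter

stats = Counter()  # stand-in for a statsd client


def drop_unknown_titles(titles, existing_titles, stats):
    """Keep only titles that exist and count how many were dropped."""
    kept = [title for title in titles if title in existing_titles]
    dropped = len(titles) - len(kept)
    if dropped:
        # Hypothetical metric name, used only for this sketch.
        stats["completion.dropped_titles"] += dropped
    return kept


existing = {"Help:Unified login", "Global account"}
results = drop_unknown_titles(
    ["Help:Global account", "Help:Unified login"], existing, stats
)
print(results, stats["completion.dropped_titles"])
```

Counting the drops rather than discarding them silently is what makes an unexpected increase visible later.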

The new code in cirrus tries a bit harder to find the right namespace for redirects by digging through the source document from cirrus to find which redirects match the highlighted titles. This should resolve most of our cases.
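The lookup described here could look roughly like the following Python sketch. It assumes the highlighted title has already been stripped of highlight markup, and it falls back to the document's namespace when nothing matches; both are assumptions for illustration, and `resolve_redirect_namespace` is a hypothetical name rather than a function from the patch.

```python
def resolve_redirect_namespace(source_doc, highlighted_title):
    """Return the namespace of the redirect whose title matches the
    highlighted text, falling back to the document's own namespace."""
    for redirect in source_doc.get("redirect", []):
        if redirect["title"] == highlighted_title:
            return redirect["namespace"]
    # Assumed fallback: behave as before when no redirect matches.
    return source_doc["namespace"]


source_doc = {
    "title": "Unified login",
    "namespace": 12,
    "redirect": [{"title": "Global account", "namespace": 0}],
}

resolved = resolve_redirect_namespace(source_doc, "Global account")
fallback = resolve_redirect_namespace(source_doc, "No such redirect")
print(resolved, fallback)
```

With the redirect's own namespace recovered, the suggestion for the example document would be built in namespace 0 instead of namespace 12.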

Change 398230 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] [WIP] Fetch redirect namespace during prefix search

https://gerrit.wikimedia.org/r/398230

Change 398192 merged by jenkins-bot:
[mediawiki/core@master] Silently drop unknown titles in completion search

https://gerrit.wikimedia.org/r/398192

Change 398124 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Resolve redirect namespaces from source docs in fancy title results type

https://gerrit.wikimedia.org/r/398124

So it's perhaps a touch awkward now, but the overall result is correct. Typing Help:Glo into autocomplete on metawiki suggests 'Global Account' from the main namespace, and selecting it takes you to Help:Unified login.