Searching 'everything' is overly skewed towards article namespace
Open, Low, Public

Description

It used to be that I could search for something like 'do not bite', hit the "Everything" tab, and find "Wikipedia:Please do not bite the newcomers".

Over the last few months, however, such Wikipedia namespace results don't even reach the first page of search results. It seems that content is boosted so aggressively that I can no longer find 'project' content unless I go to "Advanced" and specifically exclude content namespaces. The alternative is to put Wikipedia: or WP: in the search field, but that's less than ideal.

We desperately need an easy way to search 'Editor' content efficiently. That could be a change to the algorithm, or a change to the UI to exclude content namespaces from one of the presets. But this focus on results for 'readers' has become a nuisance for my personal search experience atm.

Event Timeline

TheDJ created this task. Mar 21 2017, 1:12 PM
Restricted Application added a project: Discovery. Mar 21 2017, 1:12 PM
Restricted Application added a subscriber: Aklapper.

I agree with you; this is very similar to what was requested in T155142. It's unclear to me why such low boost values were set for non-content namespaces. I'd be tempted to change the default weights and set them to at least 0.9. They are currently set to:

$wgCirrusSearchNamespaceWeights = [
        NS_USER => 0.05,
        NS_PROJECT => 0.1,
        NS_MEDIAWIKI => 0.05,
        NS_TEMPLATE => 0.005,
        NS_HELP => 0.1,
];

// Default weight of non-talks namespaces
$wgCirrusSearchDefaultNamespaceWeight = 0.2;

// Default weight of a talk namespace relative to its corresponding non-talk namespace.
$wgCirrusSearchTalkNamespaceWeight = 0.25;

I'd be in favor of changing the defaults and adapting them on a case-by-case basis when users consider that a namespace is too noisy and needs to be aggressively discounted.

(The main problem, I think, is that we've lost track of why such low weights were set in the first place.)
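
For illustration only, the change floated above might look something like this in a wiki's configuration. It is a sketch, not a tested setting: 0.9 is simply the floor mentioned above, applying it uniformly to every listed namespace is an assumption, and the talk-namespace weight is left at its current value.

```php
// Hypothetical override: raise the non-content namespace weights to the
// suggested floor of 0.9 (illustrative values, not tuned or tested).
$wgCirrusSearchNamespaceWeights = [
        NS_USER => 0.9,
        NS_PROJECT => 0.9,
        NS_MEDIAWIKI => 0.9,
        NS_TEMPLATE => 0.9,
        NS_HELP => 0.9,
];

// Default weight of non-talk namespaces, raised to the same floor.
$wgCirrusSearchDefaultNamespaceWeight = 0.9;

// Talk namespace weight left unchanged from the current default.
$wgCirrusSearchTalkNamespaceWeight = 0.25;
```

Individual namespaces that turn out to be too noisy could then be lowered again one at a time, as suggested above.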

Deskana triaged this task as Low priority. Mar 23 2017, 9:12 PM
Deskana moved this task from needs triage to later on... on the Discovery-Search board.
EBernhardson added a subscriber: EBernhardson. Edited Apr 6 2017, 5:18 PM

I'll just add my agreement here. If users are searching 'Everything', then they either know that what they want isn't in content, or have already looked in the content namespaces. Pushing content up so high in the 'Everything' search makes it less useful. The merged ticket above is an example where searching everything should have brought the related page, with a full title match, to the top; because content is weighted so highly, it doesn't show up at the top of the results.