
Searching 'everything' is overly skewed towards article namespace
Open, Low, Public

Description

It used to be that I could search for something like 'do not bite', switch to the "Everything" tab, and find "Wikipedia:Please do not bite the newcomers".

Over the last few months, however, such Wikipedia-namespace results don't even reach the first page of search results. It seems that content is boosted so aggressively that I can no longer find 'project' content unless I go to "Advanced" and specifically exclude content namespaces. The alternative is to put Wikipedia: or WP: in the search field, but that is less than ideal.

We desperately need a way to search 'editor' content easily and efficiently. That could be a change to the algorithm, or a change to the UI so that one of the presets excludes content namespaces. But this focus on results for 'readers' has become a nuisance for my personal search experience at the moment.

Event Timeline

Restricted Application added a subscriber: Aklapper.

I agree with you; this is very similar to what was requested in T155142. It's unclear to me why such low boost values have been set for non-content namespaces. I'd be tempted to change the default weights and set them to at least 0.9. They are currently set to:

$wgCirrusSearchNamespaceWeights = [
        NS_USER => 0.05,
        NS_PROJECT => 0.1,
        NS_MEDIAWIKI => 0.05,
        NS_TEMPLATE => 0.005,
        NS_HELP => 0.1,
];

// Default weight of non-talk namespaces
$wgCirrusSearchDefaultNamespaceWeight = 0.2;

// Default weight of a talk namespace relative to its corresponding non-talk namespace.
$wgCirrusSearchTalkNamespaceWeight = 0.25;
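A minimal PHP sketch of how these three settings might combine into a per-namespace weight, as read from the comments above. It assumes that the main (article) namespace has weight 1.0, that a talk namespace's ID is its subject namespace's ID plus one, and that the talk weight is multiplied by the subject namespace's weight. `effectiveNamespaceWeight()` is a hypothetical helper for illustration, not CirrusSearch's actual code.

```php
<?php
// Standard MediaWiki namespace IDs, redefined so the sketch runs standalone.
const NS_MAIN = 0;
const NS_TALK = 1;
const NS_USER = 2;
const NS_PROJECT = 4;
const NS_PROJECT_TALK = 5;
const NS_MEDIAWIKI = 8;
const NS_TEMPLATE = 10;
const NS_HELP = 12;
const NS_CATEGORY = 14;

// Values copied from the configuration quoted above.
$wgCirrusSearchNamespaceWeights = [
    NS_USER => 0.05,
    NS_PROJECT => 0.1,
    NS_MEDIAWIKI => 0.05,
    NS_TEMPLATE => 0.005,
    NS_HELP => 0.1,
];
$wgCirrusSearchDefaultNamespaceWeight = 0.2;
$wgCirrusSearchTalkNamespaceWeight = 0.25;

/**
 * Hypothetical helper: resolve the weight of a non-negative namespace ID.
 * Assumptions: NS_MAIN gets 1.0; other subject namespaces use their explicit
 * weight if listed, else the default; talk namespaces (odd IDs) get the talk
 * weight multiplied by the weight of their subject namespace.
 */
function effectiveNamespaceWeight(
    int $ns,
    array $weights,
    float $defaultWeight,
    float $talkWeight
): float {
    if ($ns % 2 === 1) {
        // Talk namespace: weighted relative to its subject namespace (ID - 1).
        return $talkWeight * effectiveNamespaceWeight($ns - 1, $weights, $defaultWeight, $talkWeight);
    }
    if ($ns === NS_MAIN) {
        return 1.0;
    }
    return $weights[$ns] ?? $defaultWeight;
}

foreach ([NS_MAIN, NS_TALK, NS_PROJECT, NS_PROJECT_TALK, NS_CATEGORY] as $ns) {
    $weight = effectiveNamespaceWeight(
        $ns,
        $wgCirrusSearchNamespaceWeights,
        $wgCirrusSearchDefaultNamespaceWeight,
        $wgCirrusSearchTalkNamespaceWeight
    );
    echo $ns, ' => ', $weight, PHP_EOL;
}
```

Under this reading, a Wikipedia talk page (NS_PROJECT_TALK) would end up with 0.1 × 0.25 = 0.025, one fortieth of an article's weight.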

I'd be in favor of changing the defaults and adapting them on a case-by-case basis when users consider a namespace too noisy and in need of aggressive discounting.
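As a hedged sketch of how that proposal might look in a wiki's LocalSettings.php: every explicit weight and the default are raised to 0.9, and one namespace is then discounted again. Which namespace counts as "too noisy" is a per-wiki decision; the NS_TEMPLATE override below is purely hypothetical, and the talk-namespace setting is left unchanged.

```php
// Proposed baseline: non-content namespaces weighted close to articles.
$wgCirrusSearchNamespaceWeights = [
    NS_USER => 0.9,
    NS_PROJECT => 0.9,
    NS_MEDIAWIKI => 0.9,
    NS_HELP => 0.9,
    // Hypothetical case-by-case override for a namespace users report as noisy.
    NS_TEMPLATE => 0.05,
];

// Default weight of non-talk namespaces not listed above.
$wgCirrusSearchDefaultNamespaceWeight = 0.9;
```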

(The main problem, I think, is that we've lost track of why such low weights were set in the first place.)

Deskana moved this task from needs triage to search-icebox on the Discovery-Search board.

I'll just add my agreement here. If users are searching 'Everything', then they either know that what they want isn't in the content namespaces, or have already looked there. Pushing content up so highly in the Everything search makes it less useful. The merged ticket above is an example: searching everything should have brought the related page with a full title match to the top, but because content is weighted so highly, it doesn't show up at the top of the results.
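To make the complaint concrete, here is a small PHP sketch that assumes the namespace weight simply multiplies a page's base relevance score (a simplification of how CirrusSearch actually combines boosts). The base scores and the "Example article" title are invented for illustration, not measured: the Wikipedia-namespace page is given a much higher base score than the article, yet a 0.1 weight still pushes it below, while the 0.9 weight proposed above would put it first.

```php
<?php
/**
 * Rank pages by base relevance multiplied by namespace weight.
 * Assumption: the final score is a plain product; real scoring is more complex.
 *
 * @param array<string, array{float, float}> $pages title => [baseScore, namespaceWeight]
 * @return string[] titles, best first
 */
function rankByWeightedScore(array $pages): array
{
    $scores = [];
    foreach ($pages as $title => [$base, $weight]) {
        $scores[$title] = $base * $weight;
    }
    arsort($scores); // sort by weighted score, highest first, keeping titles as keys
    return array_keys($scores);
}

// Hypothetical base scores: the project page matches the query far better.
$baseArticle = 5.0;
$baseProject = 30.0;

$current = rankByWeightedScore([
    'Example article' => [$baseArticle, 1.0],
    'Wikipedia:Please do not bite the newcomers' => [$baseProject, 0.1],
]);
$proposed = rankByWeightedScore([
    'Example article' => [$baseArticle, 1.0],
    'Wikipedia:Please do not bite the newcomers' => [$baseProject, 0.9],
]);

echo 'Current:  ', implode(' > ', $current), PHP_EOL;
echo 'Proposed: ', implode(' > ', $proposed), PHP_EOL;
```

With the current weight the project page scores 30 × 0.1 = 3 against the article's 5; at 0.9 it scores 27 and moves to the top.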

MPhamWMF subscribed.

Closing out low- and lowest-priority tasks that are over 6 months old and have had no activity within the last 6 months, in order to clean out the backlog of tickets we will not be addressing in the near term. Please feel free to reopen if you think a ticket is important, but bear in mind that, given current priorities and resourcing, it is unlikely that the Search team will pick up these tasks for the indefinite future. We hope that the requested changes have been either addressed or made irrelevant by work the team has done or is doing (e.g. upgrading Elasticsearch to a newer version will solve various ES-related problems), or will be subsumed by future work in a more generalized way.

RhinosF1 removed a project: Discovery-Search.
RhinosF1 subscribed.

Reopening tasks and removing them from the team workboard, per IRC feedback given yesterday and discussion with MPham.