
Evaluate the default rescore functions with incomingLinks
Closed, Duplicate · Public

Description

The default rescore window relies mainly on the number of incoming links. Incoming links are certainly a very good signal, but their weight should be adjusted to the size of the wiki.
We use log(incomingLinks+2); we should review this formula and make sure that it plays nicely with the Lucene score.
I tend to think that we should normalize this value against wiki size and maybe clamp it to a min/max range to make sure that its impact is not too high.
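
To make the proposal concrete, here is a minimal sketch of the current boost next to one possible normalized and clamped variant. Only log(incomingLinks+2) comes from this task. The log base (natural log here), the choice to divide by log(wiki_size + 2), the function names, and the default min/max bounds are all assumptions for illustration, not what CirrusSearch implements.

```python
import math


def link_boost(incoming_links: int) -> float:
    """Boost as described in this task: log(incomingLinks + 2).

    The log base is an assumption (natural log); the task does not state it.
    """
    return math.log(incoming_links + 2)


def normalized_link_boost(
    incoming_links: int,
    wiki_size: int,
    min_boost: float = 0.0,
    max_boost: float = 1.0,
) -> float:
    """Hypothetical variant: scale by wiki size, then clamp to [min_boost, max_boost].

    Dividing by log(wiki_size + 2) is one of many possible normalizations and
    is chosen here only because the ratio does not depend on the log base.
    """
    if wiki_size < 1:
        raise ValueError("wiki_size must be at least 1")
    raw = math.log(incoming_links + 2) / math.log(wiki_size + 2)
    # Clamp so that a heavily linked page cannot dominate the text score.
    return max(min_boost, min(max_boost, raw))
```

With this variant, the same number of incoming links yields a smaller boost on a large wiki than on a small one, and the clamp bounds the contribution regardless of how many links a page has.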

Example query: https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=chirac&fulltext=Search
Chirac, Lozère is ranked #2, yet it is a very small village in Lozère, France.
Places (even small ones) tend to have a very high number of incoming links, because place articles include links to other places in the same district (see the hidden section "Communes of the Lozère department" at the bottom).

With boostLinks disabled, the results are slightly better: https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=default&search=chirac&fulltext=Search&cirrusBoostLinks=no
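
For comparing other queries the same way, the two URLs above can be generated with a small helper. This is only a convenience sketch: the function name `search_url` and its `boost_links` argument are made up for illustration, and the query parameters are copied from the links in this description.

```python
from urllib.parse import urlencode

BASE_URL = "https://en.wikipedia.org/w/index.php"


def search_url(query: str, boost_links: bool = True) -> str:
    """Build a full-text search URL like the ones in this task.

    When boost_links is False, cirrusBoostLinks=no is appended, as in the
    second example link.
    """
    params = {
        "title": "Special:Search",
        "profile": "default",
        "search": query,
        "fulltext": "Search",
    }
    if not boost_links:
        params["cirrusBoostLinks"] = "no"
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("chirac"))
    print(search_url("chirac", boost_links=False))
```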

Event Timeline

dcausse raised the priority of this task from to Needs Triage.
dcausse updated the task description.
dcausse added a project: CirrusSearch.
dcausse subscribed.
Restricted Application added subscribers: StudiesWorld, Aklapper.
Deskana set Security to None.
Deskana subscribed.

Pulling this into the sprint as it relates to T125603, which is a goal this quarter.

@dcausse, is all the config set up to run this? If so, I can run 1K enwiki, dewiki, and frwiki tests.

@TJones no, sorry, I over-tuned en-suggesty with BM25, field weights, pageviews & co :)
You could maybe run a set for fun and see what the impact of BM25 defaults would be, but remove --explain from the runner: the explain output is totally different with BM25 and my code is too fragile to support it :)

@dcausse, no worries. I wasn't sure if it was ready to be tested and I wanted to make sure you weren't waiting on me.