
Improve completion suggester score function (by using page views to affect rankings)
Closed, Resolved · Public

Description

1. Very high scores
Some pages have very high scores because:

  • they are massively linked in footers or headers:
    • Wikipedia
    • IP Address
    • Tilde
    • Copyright infringement
  • Digimon: unknown reason, will have to investigate.
  • List pages: large pages that link to each other
  • Some dates: large number of incoming links (incoming_link)

This is mostly due to the initial scoring function. We expect that adding pageview statistics to the score formula will help mitigate these problems.

2. Scoring weirdness
Searching for "white house" suggests White House Farm murders first rather than White House. Again, this is mostly due to our initial scoring algorithm: White House Farm murders is a long page (longer than White House), is flagged as a quality page, and has a lot of external links.
We hope that pageview statistics will help surface more "obvious" results.
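To make the hoped-for effect concrete, here is a minimal sketch of how a pageview term might be blended into a score built from links, size, and a quality flag. It is not the actual completion suggester formula: the `Page` fields, the weights, the half-points, and every number below are invented for illustration, chosen only so that the structural signals alone favor the longer, quality-flagged page.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    incoming_links: int
    external_links: int
    size_bytes: int
    is_quality: bool
    daily_pageviews: float


def saturate(value, half_point):
    """Map a non-negative count into [0, 1); returns 0.5 at half_point."""
    return value / (value + half_point)


def structural_score(page):
    """Hypothetical score from links, size and quality only (weights are assumptions)."""
    return (
        0.4 * saturate(page.incoming_links, 1000)
        + 0.2 * saturate(page.external_links, 100)
        + 0.2 * saturate(page.size_bytes, 50_000)
        + 0.2 * (1.0 if page.is_quality else 0.0)
    )


def popularity_score(page, total_daily_pageviews):
    """Log-scaled share of traffic, so a few huge pages do not dominate."""
    return math.log1p(page.daily_pageviews) / math.log1p(total_daily_pageviews)


def blended_score(page, total_daily_pageviews, pageview_weight):
    """Linear blend; pageview_weight=0 reproduces the structural-only ranking."""
    return (
        (1 - pageview_weight) * structural_score(page)
        + pageview_weight * popularity_score(page, total_daily_pageviews)
    )


def rank(pages, total_daily_pageviews, pageview_weight):
    """Return titles ordered from highest to lowest blended score."""
    ordered = sorted(
        pages,
        key=lambda p: blended_score(p, total_daily_pageviews, pageview_weight),
        reverse=True,
    )
    return [p.title for p in ordered]


# Made-up figures, not real statistics for these articles.
PAGES = [
    Page("White House", 5000, 150, 80_000, False, 10_000),
    Page("White House Farm murders", 300, 400, 150_000, True, 500),
]
TOTAL_DAILY_PAGEVIEWS = 1_000_000

print(rank(PAGES, TOTAL_DAILY_PAGEVIEWS, pageview_weight=0.0))
print(rank(PAGES, TOTAL_DAILY_PAGEVIEWS, pageview_weight=0.5))
```

With these invented inputs, the structural-only ranking puts White House Farm murders first, while giving pageviews half the weight moves White House to the top. How much weight pageviews should get in practice would need to be tuned against real data.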

Event Timeline

dcausse created this task. Sep 1 2015, 10:41 PM
dcausse raised the priority of this task to Needs Triage.
dcausse updated the task description.
dcausse added a subscriber: dcausse.
Restricted Application added a subscriber: Aklapper. Sep 1 2015, 10:41 PM
dcausse renamed this task from Improve completion suggestion score function to Improve completion suggester score function. Sep 1 2015, 10:45 PM
dcausse set Security to None.

Pageviews data should be available with T44259

Deskana added a subscriber: Deskana. Dec 4 2015, 5:24 AM

This doesn't seem to be a blocker for T119535: EPIC: Create a Beta Feature for the new completion suggester so that users can give us feedback on it, but I'm guessing we will want to work on it after the beta feature is launched. @dcausse will know best.

Restricted Application added a subscriber: StudiesWorld. Dec 4 2015, 5:24 AM
Deskana moved this task from Needs triage to Search on the Discovery board. Dec 4 2015, 5:24 AM
Deskana triaged this task as Normal priority. Dec 17 2015, 5:45 PM
dcausse claimed this task. Jan 28 2016, 2:34 PM

Sorry, I wanted to close this one and merge it into T120796...
Adding this one to the sprint to reflect the work on T120796.

Change 265771 had a related patch set uploaded (by EBernhardson):
Integrate page views into completion score

https://gerrit.wikimedia.org/r/265771

One issue someone brought up yesterday: searching for 'obama' on http://en-suggesty.wmflabs.org/suggest.html gives, subjectively, much better results with prefix search than with the completion suggester. Not sure what in particular to do about that.

Deskana renamed this task from Improve completion suggester score function to Improve completion suggester score function (by using page views to affect rankings). Jan 28 2016, 5:42 PM

Sorry, I wanted to close this one and merge it into T120796...
Adding this one to the sprint to reflect the work on T120796.

It's all good! I tweaked the title of this one slightly to reflect the change and ongoing work.

Concerning Obama: "obama" should be the top result in prod thanks to the exact DB match. I'll have a look at the other suggestions.
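The exact-match promotion mentioned here can be pictured roughly as follows. This is a sketch under assumptions, not the production code: the function name `promote_exact_match`, the in-memory title list standing in for the database lookup, and the case-insensitive comparison are all hypothetical.

```python
def promote_exact_match(query, suggestions, known_titles):
    """Return suggestions with an exactly matching title (if any) moved to the front.

    known_titles stands in for a database title lookup (an assumption).
    """
    normalized = query.strip().casefold()
    exact = next((t for t in known_titles if t.casefold() == normalized), None)
    if exact is None:
        # No exact title match: keep the suggester's own ordering.
        return list(suggestions)
    # Put the exact match first and avoid listing it twice.
    return [exact] + [s for s in suggestions if s != exact]


suggester_results = ["Obama administration", "Barack Obama", "Obama"]
titles = ["Obama", "Barack Obama", "Obama administration"]
print(promote_exact_match("obama", suggester_results, titles))
```

In this sketch the query "obama" brings the title "Obama" to the top regardless of its suggester score, while a partial query such as "obam" leaves the scored order untouched.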

Change 265771 merged by jenkins-bot:
Integrate page views into completion score

https://gerrit.wikimedia.org/r/265771

Deskana closed this task as Resolved. Feb 3 2016, 6:20 PM