
Decrease the weight of page views in search result ranking for the completion suggester
Closed, Resolved, Public

Description

Given David's recent email, we should probably decrease the weight with which pageviews affect our search result rankings to reduce volatility in our search result rankings. This task tracks the discussion and any work for that.

Event Timeline

Deskana raised the priority of this task from to Needs Triage.
Deskana updated the task description.
Deskana added a project: Discovery.
Deskana added a subscriber: Deskana.
Deskana triaged this task as Medium priority. Jan 25 2016, 10:42 PM
Deskana moved this task from Needs triage to Search on the Discovery board.

It is not the role of staff or developers to play policymaker on offensive terms; that is clearly a community role. That said, pageviews as a raw statistic should not be a strong determining factor in results based on the leading two characters.

One presumes (with no data to rely upon) that pageview statistics based on a two-letter start to a word are going to be a poor proportional representation of the sum of all the different landing pages with those two starting letters. It is completely reasonable not to overly skew search results based on pageviews as raw numbers (be it for an offensive term or not). Better indicators would seem to be page views based on completed searches (while staying agnostic on offensiveness or whatever), or a factor based on a page's pageviews as a proportion of all pageviews for that letter combination. For example, if a page accounts for 50% of the pages viewed for a two-letter combination, it is reasonable to skew; however, if it is 1.01% vs 1.17%, then it is not reasonable.
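To make the proportional idea concrete, here is a minimal sketch in Python. It is only an illustration of the suggestion above, not how the completion suggester actually scores results: the function names, the two-character prefix length, the 50% threshold, and the sample numbers are all assumptions made for the example.

```python
from collections import defaultdict


def prefix_shares(pageviews, prefix_len=2):
    """Return each title's share of the pageviews among titles with the same prefix.

    `pageviews` maps page title -> raw view count (hypothetical input format).
    """
    totals = defaultdict(int)
    for title, views in pageviews.items():
        totals[title[:prefix_len].lower()] += views

    shares = {}
    for title, views in pageviews.items():
        total = totals[title[:prefix_len].lower()]
        shares[title] = views / total if total else 0.0
    return shares


def popularity_boost(share, threshold=0.5):
    """Skew ranking only when a page clearly dominates its prefix.

    The 0.5 threshold is an assumption taken from the "50%" example above;
    shares below it (such as 1.01% vs 1.17%) produce no boost at all.
    """
    return share if share >= threshold else 0.0


# Made-up counts, purely for illustration.
sample = {"Apple": 500, "Apricot": 300, "Application": 200, "Banana": 100}
shares = prefix_shares(sample)
boosts = {title: popularity_boost(share) for title, share in shares.items()}
```

With these made-up counts, only "Apple" (half of the "ap" views) and "Banana" (the only "ba" page) would receive a boost; the other pages would be ranked without any pageview skew.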

We have to be language agnostic, so we cannot be tied to thinking that English Wikipedia is the ultimate guide for how page views lead search results, or for how we manage offensiveness across languages. We also do not want to make a rod for our own back by having to manipulate results or be beholden to complaints about this or that term being offensive, biased, etc., and then forever tweaking formulae on different wikis, with all of this being a staff/developer responsibility (unsustainable both in resources and politically). If that is something that needs to be done, it would be better to let the communities beat themselves silly controlling the lists of words that they wish to regard as offensive or otherwise problematic, not staff or developers.

We also need to be cognisant that this will be a suggester for all WMF wikis, not just the encyclopaedias, and that decisions made for one wiki can skew results on others. So what impact is a decision to skew a result based on an offensive result at enWP going to have at Commons, or at Meta, etc.?

It is not the role of staff or developers to play policymaker on offensive terms; that is clearly a community role. That said, pageviews as a raw statistic should not be a strong determining factor in results based on the leading two characters.

I agree with this principle. That said, there was a suboptimal behaviour with the prototype here, in that it was giving too much weight to page views; the fact that the results were also offensive was a (hilarious) coincidence. ;-)

Deskana claimed this task.

The crux of this issue is resolved now. We fairly significantly decreased the weight with which page views affect the search ranking. This solves the volatility problem, and the ranking issues. We can resolve this task.

@Deskana. Nice to know. As I mentioned on the mailing list, I was wondering about the usefulness of using the order of magnitude of a raw pageview count. Maybe that becomes too arcane a measure, but it too would seem to allow some reasonable measure of views while removing the volatility.
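As a rough illustration of the order-of-magnitude idea, here is a small Python sketch. It is not what was deployed for this task; the function name and the sample counts are hypothetical, and the digit-counting approach is just one simple way to get a base-10 order of magnitude.

```python
def magnitude_score(views):
    """Return the base-10 order of magnitude of a raw pageview count.

    0 for counts below 10, 1 for 10-99, 2 for 100-999, and so on.
    Counting digits avoids floating-point rounding around powers of ten.
    """
    views = int(views)
    if views < 1:
        return 0
    return len(str(views)) - 1


# Hypothetical counts for the same page on a quiet day and a busy day.
quiet_day = magnitude_score(12_000)
busy_day = magnitude_score(48_000)
```

Both hypothetical counts fall in the same bucket, so a fourfold swing in raw views would not change the ranking input; only a jump across a power of ten would.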