
Add option to rank by number of languages the article exists in
Closed, Resolved, Public

Description

For the Translation recommendation type, when the user does not provide a seed, we currently rank on pageviews in the source. Let's add an option to rank by the number of different languages the article exists in. Randomly shuffle tied articles on each request.

Event Timeline

I'm against using randomness as a tie-breaker when we could just use pageviews to break ties - thoughts?

Proposed implementation:

  1. Get most popular articles for last N days
  2. Sort based on number of wikis articles are present in, falling back to some tiebreaker
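Step 2 above can be sketched as follows. This is only an illustration, not the code from the merged patch: the dictionary keys (`title`, `pageviews`, `sitelink_count`) and the function name are hypothetical, and it assumes step 1 has already produced the list of popular articles. It shows both tie-breakers discussed in this task, pageviews and a random shuffle.

```python
import random


def rank_by_sitelinks(articles, tiebreak="pageviews", rng=None):
    """Sort articles by how many wikis they exist in, most first.

    `articles` is a list of dicts with the hypothetical keys
    'title', 'pageviews' and 'sitelink_count'.
    """
    if tiebreak == "pageviews":
        # Higher sitelink count first; ties broken by higher pageviews.
        return sorted(
            articles,
            key=lambda a: (-a["sitelink_count"], -a["pageviews"]),
        )
    if tiebreak == "random":
        # Shuffle a copy first; sorted() is stable, so articles with the
        # same sitelink count keep their shuffled relative order.
        rng = rng or random.Random()
        shuffled = list(articles)
        rng.shuffle(shuffled)
        return sorted(shuffled, key=lambda a: -a["sitelink_count"])
    raise ValueError(f"unknown tiebreak: {tiebreak!r}")


if __name__ == "__main__":
    sample = [
        {"title": "A", "pageviews": 100, "sitelink_count": 5},
        {"title": "B", "pageviews": 300, "sitelink_count": 5},
        {"title": "C", "pageviews": 50, "sitelink_count": 9},
    ]
    print([a["title"] for a in rank_by_sitelinks(sample)])
```

Breaking ties on pageviews keeps the output deterministic for a given input, which makes results easier to compare across requests; the random variant trades that for more variety among tied articles.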

Change 337626 had a related patch set uploaded (by Nschaaf):
Add option to rank translation recs by sitelinks

https://gerrit.wikimedia.org/r/337626

Change 337626 merged by jenkins-bot:
Add option to rank translation recs by sitelinks

https://gerrit.wikimedia.org/r/337626

schana removed a project: Patch-For-Review.

This has been completed and can be used by adding a rank_method=sitelinks parameter to either the API or tool endpoints.
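As a small sketch of what adding the parameter looks like for the tool endpoint: the base URL and the other parameter names (`s`, `t`, `seed`, `search`) are taken from the tool URL quoted in this thread, while the helper function itself is hypothetical and not part of the service.

```python
from urllib.parse import urlencode

TOOL_BASE_URL = "https://recommend.wmflabs.org/"


def build_tool_url(source, target, seed=None, search=None, rank_method=None):
    """Build a tool URL; optional parameters are omitted when None."""
    params = {"s": source, "t": target}
    if seed is not None:
        params["seed"] = seed
    if search is not None:
        params["search"] = search
    if rank_method is not None:
        # Without this parameter the ranking is left unchanged.
        params["rank_method"] = rank_method
    return TOOL_BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_tool_url("en", "de", seed="Agriculture",
                         search="related_articles", rank_method="sitelinks"))
```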

@schana the results look more useful with rank_method=sitelinks for a few other examples I tried from en to fa.

@Pginer-WMF we're testing ranking relevant articles not by their pageviews but by the number of Wikipedias that have the article. See https://recommend.wmflabs.org/?s=en&t=de&seed=Agriculture&search=related_articles&rank_method=sitelinks for an example. My question for you: should we change the "incentive" on each card in GapFinder from "xxk recent views" (where xx is the number of pageviews) to a different metric that depends on the rank_method in use? For example, should it read "xx Wikipedias have this article" in this case?

The motivation information does not need to be the same as the information used for selection, but both alternatives could work well. Some considerations:

  • Regardless of the criteria for selecting articles, if we think the number of views is the best motivator, we can keep it. In this case, we may want to show it only for articles that cross a certain threshold of views. That way it could serve as an additional criterion for picking the articles with the most impact (in terms of views) from those that are more relevant (based on their presence in different wikis).
  • Showing the number of Wikipedias (or languages) that already have the article helps as both a motivator and an explanation on why the articles are suggested.
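The two alternatives above could be sketched like this. Everything here is hypothetical: the function name, the `motivator` argument, and especially the 1000-view threshold are placeholders for illustration, and the label wording simply follows the phrasing used earlier in this thread.

```python
def card_incentive(pageviews, sitelink_count, motivator="views", min_views=1000):
    """Return the incentive text for a card, or None to show nothing.

    `min_views` is an assumed threshold; no value was agreed in this task.
    """
    if motivator == "sitelinks":
        # Doubles as an explanation of why the article was suggested.
        return f"{sitelink_count} Wikipedias have this article"
    if motivator == "views":
        # Only show views when the article crosses the threshold.
        if pageviews >= min_views:
            return f"{pageviews // 1000}k recent views"
        return None
    raise ValueError(f"unknown motivator: {motivator!r}")


if __name__ == "__main__":
    print(card_incentive(25300, 40))
    print(card_incentive(25300, 40, motivator="sitelinks"))
```

Keeping the motivator separate from rank_method reflects the point that the motivation shown does not have to match the information used for selection.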

It would be great to do more research on which of these motivates people more. From my interaction with translators, I think both pieces of information would be understood as a measure of how "popular" the article is, so I don't expect much difference. It may be good to make the changes in separate steps, so we can measure the impact of the algorithm change first and of the motivator change later.

@leila Do you want to change the default rank_method to sitelinks? Right now its only exposure is by manually adding the parameter.

@schana will this have an impact on what CX is surfacing in Suggestions? If yes, let's wait a couple more weeks to see if we can observe the result of the readmore change.

@Pginer-WMF Do you suggest that we give users of recommend.wmflabs.org an option for switching between different ranking methods? This could be desirable for some editors, as sometimes they are looking for high-pageview articles and sometimes for articles that are more likely to be present in other Wikipedias. If we should give them such an option, what is your design recommendation?

@leila the rank_method parameter has to be specified for the ranking to be altered.