
The top 1M ranked link recommendations for wp
Closed, Resolved, Public

Description

Give a ranked list of the top 1M link recommendations to the Reading team. They're considering whether link recommendations are a good use case for micro-task contributions on mobile, and being able to review the list can help them make a decision.

Event Timeline

leila moved this task from Backlog to In Progress on the Research board.

@JKatzWMF email on the way with the top 100K. We had this ready so let's start from here, and if you want to see the top 1M, we can produce that, too.

Thanks, I believe this was sufficient!

I ran some very rough analysis on the top 100k recommendations with the goal of seeing if adding these links would be something that a casual user could do and here is what I found.

Of the top 20, 10 in the middle and 10 at the end, every link had either already been added (by March or since then) or had no text in the article that a user could turn into the link. I matched loosely, too. For instance, if the link was for Bob Dylan, I looked for Bob Dylan, Dylan, Robert Zimmerman, etc. I also looked in the navigation boxes at the bottom of each article. Where no such text existed, adding a link would require an artificial rewriting of a portion of the article to include it. Of the 50 I looked at, there was not a single 'link' that could be added by converting text to a hyperlink.

My computer crashed over the weekend and didn't save the results (arrrrrgghh), so I don't have the exact breakdown of the ineligible links, but the essence was: 65% not in the article, 30% organically added by March, 5% organically added since March.

My conclusion is that users would not be able to meaningfully contribute to Wikipedia using this dataset at enough scale to justify a feature.

@JKatzWMF did you consider doing some processing on the recommendations as well? These are the top 100K recommendations, independent of whether the anchor text exists in the source article or not. You can focus on a subset of it where the anchor text does exist in the text.

We could limit it to the subset where the anchor text exists, but that reduces the number of recommendations significantly. In all of the ones I looked at where the text was present, the link had already been added, so I am not optimistic that a large number would come out of this.
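
For the record, the filter discussed above (keep a recommendation only if the target is not already linked from the source article and one of its anchor texts appears there as plain text) could be sketched roughly as follows. This is only an illustration, not how the existing dataset was produced: the `Recommendation` shape, the function names, and the dictionary of wikitext keyed by title are all assumptions, and the title normalization and link regex are much cruder than real MediaWiki rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    # Hypothetical record shape; the real dataset's columns may differ.
    source: str      # article that would receive the new link
    target: str      # article to link to
    anchors: tuple   # candidate anchor strings, e.g. ("Bob Dylan", "Dylan")


# Matches [[Title]], [[Title|label]] and [[Title#Section|label]]; group 1 is the title.
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def normalize(title):
    """Crude title normalization (assumption: case-insensitive, underscores as spaces)."""
    return title.strip().replace("_", " ").casefold()


def linked_titles(wikitext):
    """Return the normalized titles that the wikitext already links to."""
    return {normalize(m.group(1)) for m in LINK_RE.finditer(wikitext)}


def find_anchor(rec, wikitext):
    """Return the first anchor found as unlinked text, or None if not linkable."""
    if normalize(rec.target) in linked_titles(wikitext):
        return None  # the link was already added
    # Blank out existing links so we never match text inside another link.
    plain = LINK_RE.sub(" ", wikitext)
    for anchor in rec.anchors:
        pattern = r"(?<!\w)" + re.escape(anchor) + r"(?!\w)"
        if re.search(pattern, plain, re.IGNORECASE):
            return anchor
    return None


def filter_linkable(recs, wikitext_by_title):
    """Keep (recommendation, anchor) pairs that a user could link by converting text."""
    kept = []
    for rec in recs:
        text = wikitext_by_title.get(rec.source)
        if text is None:
            continue  # no wikitext available for this source article
        anchor = find_anchor(rec, text)
        if anchor is not None:
            kept.append((rec, anchor))
    return kept
```

Run over the ranked list in order, the kept pairs would preserve the original ranking, so the size of the surviving subset could be read off directly.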

This data is old (early 2015), so I'm not surprised that links have been added in the meantime. If you decide to go with such a feature, we should generate fresh data more continuously. If your decision relies on seeing fresh data, we should consider creating a new batch for you.

@leila we got off into email land for a bit, so I'll post this for the record.

I'm happy to take another go at it. What I am trying to evaluate is if the following things are true. All of them need to be true for this to be a reasonable micro-contribution feature:

  • can a casual user make the call (are the rules obvious enough?)
  • is the answer "yes" too often (>80-90%) to be interesting?
  • does this add value to the project (is it work that either won't get done or pulls from other important work)
  • is there scale here--are there more than 500k tasks/month that could be done?

From my preliminary glance, #3 and #4 got in the way of the first 2 because I couldn't actually find any valid "tasks". I was thinking this was an indication that #3 or #4 were blockers, but if you believe they are not, I'd be happy to take a second look.

@JKatzWMF Bob and I talked about this. We are going to generate a new dataset for you to try, filtering for those recommendations that do have an anchor text. What is the timeline here?

No rush from my end. By end of the quarter?