Add a link: backtesting protocol
Closed, Resolved, Public


As we expand the link recommendation algorithm to more wikis, we won't always have volunteers to validate the results by hand. Therefore, we need a way to detect whether the algorithm is working well in an arbitrary language. We will develop a protocol through which we can compare the algorithm's results to the links that already exist in articles, to see to what extent they overlap.

In terms of wikis to prioritize, we are planning on working in these Wikipedias first:

  • kowiki (Korean)
  • cswiki (Czech)
  • arwiki (Arabic)
  • viwiki (Vietnamese)
  • frwiki (French)
  • ukwiki (Ukrainian)
  • srwiki (Serbian)
  • hywiki (Armenian)
  • huwiki (Hungarian)
  • euwiki (Basque)
  • plwiki (Polish)
  • fawiki (Persian)
  • itwiki (Italian)
  • ptwiki (Portuguese)
  • hewiki (Hebrew)
  • svwiki (Swedish)
  • dawiki (Danish)


Due Date
Aug 31 2020, 7:00 AM

Event Timeline

Update: testing datasets have been created for a few languages. The next step is to upload those datasets and ask community members to evaluate them, to make sure they are good sets for testing. If they are, we will assume that the same method produces good testing datasets in other languages without needing further review.

Next: run the actual backtesting using the latest version of the algorithm, which contains some improvements. We'll start with English here.

Goal: have this all done and finished by August 31.

Update: @DED is currently working on the backtesting. We will follow up this week on whether it is finished.

Restricted Application added a subscriber: Huji. (Sep 9 2020, 6:16 PM)

Update: we implemented the first version of the backtesting protocol.

  • We evaluate the link recommendation on individual sentences (for each article, the first sentence that contains at least one link).
  • We remove all existing (true) links from the sentence, run the model to get link recommendations for that sentence, and compare the recommended links with the true links. A recommendation counts as a match only if both the anchor text and the link target page match.
    • precision: the fraction of recommended links that are true links
    • recall: the fraction of true links that were recommended
  • We can vary the threshold for recommending a link:
    • low threshold = lower precision, higher recall
    • high threshold = higher precision, lower recall
  • We trained the model and ran backtesting for 7 wikis (simple, de, pt, ar, cs, ko, vi) without language-specific fine-tuning:
    • at a recall of ~40%, precision is at least 70-80% for each of these languages
    • for some languages, such as vi and pt, precision is even higher
  • Results (and some more details):
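The evaluation steps described in the update above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual link-recommendation code: sentences are raw wikitext containing only simple `[[Target]]` or `[[Target|anchor]]` links (no templates, nested links, or link trails), the model's output is represented as a hypothetical dict mapping (anchor, target) pairs to scores, and precision and recall are micro-averaged over all test sentences. The names `strip_links`, `evaluate`, and `best_threshold`, and the scores in the toy example, are made up for illustration.

```python
import re
from typing import Dict, Iterable, List, Optional, Set, Tuple

# A link is identified by its (anchor text, target page) pair; both must match.
Link = Tuple[str, str]
# One test sentence: model scores for candidate links, plus the true links.
# Assumption: the model exposes one score per (anchor, target) candidate.
Sample = Tuple[Dict[Link, float], Set[Link]]

# Simplified wikilink pattern: [[Target]] or [[Target|anchor]].
# Assumption: no templates, nested links, or link trails such as [[cat]]s.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def strip_links(wikitext: str) -> Tuple[str, Set[Link]]:
    """Remove wikilinks from a sentence; return plain text and the true links."""
    true_links: Set[Link] = set()

    def replace(match: re.Match) -> str:
        target = match.group(1).strip()
        anchor = (match.group(2) or match.group(1)).strip()
        true_links.add((anchor, target))
        return anchor  # keep the anchor text, drop the link markup

    return WIKILINK.sub(replace, wikitext), true_links


def evaluate(samples: Iterable[Sample], threshold: float) -> Tuple[float, float]:
    """Micro-averaged precision and recall at a given score threshold."""
    hits = n_recommended = n_true = 0
    for scores, true_links in samples:
        recommended = {link for link, score in scores.items() if score >= threshold}
        hits += len(recommended & true_links)  # anchor AND target must match
        n_recommended += len(recommended)
        n_true += len(true_links)
    # With no recommendations at all, report 0 precision (a convention choice).
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_true if n_true else 0.0
    return precision, recall


def best_threshold(
    samples: List[Sample], thresholds: Iterable[float], min_precision: float
) -> Optional[Tuple[float, float, float]]:
    """Among thresholds meeting min_precision, return the highest-recall one."""
    best: Optional[Tuple[float, float, float]] = None
    for t in sorted(thresholds):
        precision, recall = evaluate(samples, t)
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best


# Toy example with made-up scores (not real model output).
plain, true_links = strip_links("[[Seoul]] is the capital of [[South Korea|Korea]].")
scores = {
    ("Seoul", "Seoul"): 0.9,
    ("Korea", "South Korea"): 0.6,
    ("capital", "Capital city"): 0.4,
}
samples = [(scores, true_links)]
for t in (0.3, 0.5, 0.8):
    print(t, evaluate(samples, t))
print(best_threshold(samples, [0.3, 0.5, 0.8, 0.95], min_precision=0.7))
```

Because lowering the threshold only adds recommendations, recall can never decrease as the threshold goes down, while precision usually (but not necessarily) drops. Picking the highest-recall threshold that still clears a precision target is one way to choose an operating point like the ~40% recall / 70-80% precision described above.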

@MMiller_WMF / @DED: Hi, the Due Date set for this open task passed more than two months ago.
Can you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved if it is done? Thanks.

I'm resolving this. If we make changes later, we can create a new task or reopen this one.