
Improve link recommendation model for orphan articles
Closed, Resolved (Public)

Description

We recently finished a paper describing the problem of orphan articles as the dark matter in Wikipedia. In that work, we sketched a potential solution to support editors in de-orphanization using link translation (Wiki-Visibility tool).

Once we have identified a suitable new link to add (i.e. a pair consisting of a source article and a target article), we still face the task of inserting the link somewhere into the text of the source article. This can be non-trivial if an anchor word matching the page title of the target article is not available and/or if the source article contains a lot of text.

Therefore, in this task we develop a multilingual model to support the task of link insertion by identifying the most suitable text span for a specific new link.
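To make the link-insertion problem concrete, here is a minimal sketch of a naive baseline, not the multilingual model developed in this task. It assumes that an exact occurrence of the target's page title is the preferred anchor and, when none exists, falls back to the sentence with the highest word overlap (Jaccard) with the target's lead text. The function names and the `target_lead` input are hypothetical and used for illustration only.

```python
import re


def tokenize(text):
    """Lowercase word tokens (a deliberately crude split)."""
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def suggest_span(source_text, target_title, target_lead):
    """Return (kind, span) for where a link to the target might go.

    kind is "anchor" when the target title occurs verbatim (ignoring case)
    in the source text, "sentence" for the sentence sharing the most words
    with the target's title and lead text, or "none" if nothing overlaps.
    """
    # Case 1: an anchor matching the page title is available.
    match = re.search(re.escape(target_title), source_text, flags=re.IGNORECASE)
    if match:
        return "anchor", match.group(0)

    # Case 2 (assumed fallback): rank sentences by Jaccard word overlap.
    target_words = set(tokenize(target_title)) | set(tokenize(target_lead))
    best_sentence, best_score = None, 0.0
    for sentence in split_sentences(source_text):
        words = set(tokenize(sentence))
        if not words:
            continue
        score = len(words & target_words) / len(words | target_words)
        if score > best_score:
            best_sentence, best_score = sentence, score

    if best_sentence is None:
        return "none", None
    return "sentence", best_sentence
```

A baseline like this only handles the easy case where the title appears in the text; the harder case (no matching anchor, long source article) is the regime this task targets.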

Previous focus: In this task, the aim is to quantitatively evaluate how good these recommendations are. The idea is to propose a better model than SOTA benchmarks, but also to describe how currently available models do not perform well in the regime that is most relevant to support editors (e.g. orphans).

Event Timeline

weekly update:

  • no update (this was a shortened week, and I was busy catching up after the break and preparing for the team offsite next week)

weekly updates:

  • no updates this week

weekly updates

  • no updates this week
  • planning to pick this up again in the next 1-2 weeks

weekly update:

  • met with Akhil to discuss progress on a related project to develop a model for suggesting the best position at which to add a specific link in the text of an article
  • with Tomás, a master's student, they built a first version that suggests relevant text spans for specific links to be added and beats all other baselines by a wide margin
  • this model would be a nice addition to the available [[ https://linkrec.toolforge.org/ | link recommendation to increase visibility of orphan articles ]]. the latter model surfaces candidate articles from which to link to an orphan article (in order to de-orphanize it). however, it currently does not provide any information on where in the text the link could be added. therefore, the new model would provide additional support to editors de-orphanizing articles.

weekly updates:

  • some delay in starting the work, we discussed timeline starting in January

weekly update:

  • during the past weeks, the focus of how to best improve the model has shifted
  • instead of optimizing which link to recommend to an orphan, we now systematically approached where to insert recommended links in the text. this would be even more useful to editors who would want to de-orphanize orphan articles, as the current model only recommends which link but not the position in the text. thus, the existing model would become even more actionable.
  • together with collaborators led by Akhil, we have now completed the analysis of a multilingual model and shown that it can identify suitable positions in the text for specific link targets (such as orphans), beating all other baselines. we are currently preparing a paper for submission to a conference (deadline Feb 15).

weekly update:

  • working on the paper to submit by Feb 15: methods/results with tables/figures are complete; the text still needs work, especially in the intro/discussion, but the main lines of thought and talking points are sketched out and won't change; the paper is still too long, so we are shortening it by removing/compressing material (nothing substantial will be added)
  • shared a draft with Leila for review https://phabricator.wikimedia.org/T356504

weekly update:

  • working on the paper with collaborators to finalize it for the submission deadline next week (Feb 15)

weekly update:

  • didn't manage to fully finish the paper due to some unexpected commitments for the lead author
  • instead, we will aim for the next deadline in the cycle of rolling reviews (April 15, https://aclrollingreview.org/). this will give us some extra time to polish the writing in the coming weeks.

weekly update:

  • no update this week

weekly update:

  • no updates this week

weekly update:

weekly updates

  • work on the model is finished
  • adding results to the meta-page
  • also working towards finishing the paper for submission by April 15 T356504

weekly update:

  • starting sprint to finish corresponding paper with results T356504

weekly update: