Add a link: algorithm improvements
Open, Needs Triage, Public

Description

In T245330, we tested the link recommendation algorithm and decided that it shows enough potential to keep developing. This task covers the issues discovered during that testing and the improvements that could address them. For any of the following issues, we may decide not to attempt a fix if it would hurt precision, recall, or scalability:

  • Not linking only part of a multi-word phrase, e.g. linking just “London” in “London Underground”. (examples)
  • Linking to the correct disambiguation of a word, e.g. from an article about biology, linking to “Cell (biology)” instead of to “Cell (geometry)”.
  • Not suggesting a link if that word or phrase has already been linked in the article.
  • Not linking inside the titles of external links (example article)
  • Not linking inside the names of organizations, e.g. “Access & Publishing Group”. (example article)
  • Not linking common names of people (example article)
  • Not linking to common phrases, e.g. “Yet another” (example article)
  • Not linking to articles about dates, such as the article for 1979.
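
Several of the issues above (already-linked targets, date articles, partial matches inside a longer phrase) could be handled as post-filters on the algorithm's candidate list. The following is a minimal sketch under stated assumptions, not the algorithm's actual code: the `Candidate` type, the `filter_candidates` function, the `protected_spans` input, and the rule that date articles are titled as bare years are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    start: int   # character offset where the anchor text begins
    end: int     # character offset one past the end of the anchor text
    target: str  # title of the suggested target article


# Assumption: date articles are titled as bare years, e.g. "1979".
YEAR_TITLE = re.compile(r"^\d{3,4}$")


def filter_candidates(candidates, already_linked, protected_spans):
    """Drop candidates that match one of the issues in the list above.

    already_linked:  set of target titles the article already links to
    protected_spans: (start, end) offsets of text that should not be
                     partially linked (multi-word phrases, organization
                     names, external link titles); hypothetical input
    """
    kept = []
    for c in candidates:
        if c.target in already_linked:
            continue  # the word or phrase is already linked in the article
        if YEAR_TITLE.match(c.target):
            continue  # target looks like a date article
        inside_protected = any(
            s <= c.start and c.end <= e and (c.start, c.end) != (s, e)
            for s, e in protected_spans
        )
        if inside_protected:
            continue  # would link only part of a longer phrase
        kept.append(c)
    return kept
```

Each filter removes suggestions, so it can only raise precision at the cost of recall. Whether that trade is worthwhile is the question this task asks for each issue.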

Event Timeline

MMiller_WMF edited projects, added Growth-Team (Current Sprint); removed Growth-Team.

@DED -- this task is ready for you to work on. For each of the bullet points in the task description, it would be good if you left comments covering:

  • Whether you think it is wise to make the change, and what it might mean for precision and recall. Perhaps a fix would cause more problems than it solves.
  • Whether it is a fix that would need to be language-specific or could scale with the algorithm across languages.
  • Whether you ended up making changes based on the issue.
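
To make the precision and recall comparison concrete, one option is to score the algorithm's suggestions against a hand-labeled set of correct links before and after each change. This is only an illustrative sketch; the `precision_recall` function and the labeled data are assumptions, not part of the existing evaluation.

```python
def precision_recall(suggested, correct):
    """Return (precision, recall) for a set of suggested links.

    suggested: links the algorithm proposed
    correct:   links a human reviewer judged should exist
    """
    suggested, correct = set(suggested), set(correct)
    true_positives = len(suggested & correct)
    # Precision: share of suggestions that were right.
    precision = true_positives / len(suggested) if suggested else 0.0
    # Recall: share of the right links that were suggested.
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall
```

For example, if the algorithm suggests four links and two of them appear among three correct ones, precision is 0.5 and recall is about 0.67.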

If you have questions about the issues, or need clearer examples, please let me know. So far, I've linked to articles on Test Wiki that contain examples of each issue; you can find them by looking for the links marked with an "X". I can point them out more specifically if you need.

How does this sound?

@DED -- as we continue to work on the algorithm, a community member from Ukrainian Wikipedia (@NickK) requested that we try it in his language and see how it performs. Maybe that can be the next language we try.