Scaling of link suggestions service
Closed, Resolved; Public

Description

Request Status: New Request
Request Type: project support request
Related OKRs:

Request Title: Scaling of link suggestions service

  • Request Description: In May 2021, the Growth team deployed the "add a link" structured task, which uses a machine learning service to suggest words and phrases that should be turned into wikilinks on Wikipedia articles. In the year since its deployment, it has proven to be a popular feature with newcomers that statistically increases their likelihood of making constructive edits, and it is accepted by communities. The machine learning model was created by the Research team, and the service was built by the Growth team. The machine learning models have since been taken over by the Machine Learning team, but the storing, loading, and management of link suggestions remain with the Growth team. We want the Platform teams to take over this service for two reasons:
    • Coverage. The model takes seconds to score each article. This means that we don't score all articles in a wiki; rather we score just enough to supply the suggested edits feed on the newcomer homepage. If we were able to score all articles in a wiki, we would be able to offer the feature in a totally different way: we could push edit suggestions to users as they are reading articles, which would result in many more new editors working at higher volume.
    • Availability/maintenance. The Growth team is not well-suited to maintain this service or expose it for other builders to use. We would like the service to be available to more people, and for us to recoup our bandwidth to work on other priorities.
  • Indicate Priority Level: High
  • Main Requestors: Growth and Editing teams
  • Ideal Delivery Date: November 2022
  • Stakeholders: Marshall Miller, Kosta Harlan, Peter Pelberg

Request Documentation

| Document Type | Required? | Document/Link |
| Related PHAB Tickets | Yes | T266437: Add a link engineering: backend product specifications; T307902: Assess database requirements for link recommendations reading entry point |
| Product One Pager | Yes | forthcoming |
| Product Requirements Document (PRD) | Yes | <add link here> |
| Product Roadmap | No | <add link here> |
| Product Planning/Business Case | No | <add link here> |
| Product Brief | No | <add link here> |
| Other Links | No | feature description; machine learning description; service description |

Event Timeline

@MMiller_WMF: Would you know who this task should be assigned to? The current assignee has been inactive for six months.

Letting @MMiller_WMF or @KStoller-WMF make the call, but I think this is effectively being worked on by the research team. The first step is T388258: Make airflow-dag for addalink training pipeline output compatible with deployed model, which turns the manual training process into an automated Airflow DAG. As future work, there is the idea to drop the service completely, or at least replace it with a database lookup.
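To make the database-lookup idea a bit more concrete, here is a minimal sketch, not an actual design. It assumes suggestions are precomputed per page and revision by a batch job and then served by a plain keyed read. The table name, column names, and function names are all hypothetical, and SQLite only stands in for whatever datastore would really be used.

```python
import json
import sqlite3


def create_store(conn):
    # Hypothetical schema: one row of precomputed suggestions per page.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS link_suggestions ("
        "page_id INTEGER PRIMARY KEY, "
        "revision_id INTEGER NOT NULL, "
        "suggestions TEXT NOT NULL)"
    )


def save_suggestions(conn, page_id, revision_id, suggestions):
    # Called by the (assumed) batch job after scoring an article.
    conn.execute(
        "INSERT OR REPLACE INTO link_suggestions VALUES (?, ?, ?)",
        (page_id, revision_id, json.dumps(suggestions)),
    )


def lookup_suggestions(conn, page_id, revision_id):
    # Serve from the table; return None if the page was never scored
    # or the stored row belongs to a different revision.
    row = conn.execute(
        "SELECT revision_id, suggestions FROM link_suggestions "
        "WHERE page_id = ?",
        (page_id,),
    ).fetchone()
    if row is None or row[0] != revision_id:
        return None
    return json.loads(row[1])
```

With something like this, the seconds-long model call would happen only in the batch job, and a read path would never invoke the model directly. Whether stale rows should be dropped or rescored is left open here.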

Also, note that "If we were able to score all articles in a wiki, we would be able to offer the feature in a totally different way: we could push edit suggestions to users as they are reading articles, which would result in many more new editors working at higher volume." is being done right now in the context of T386250: Rewrite refreshLinkRecommendations to not iterate through article topics and the Surfacing Structured Tasks experiment. (We did not fundamentally solve the "The model takes seconds to score each article." issue, but instead just turned the dial to 100% anyway ¯\_(ツ)_/¯.)

KStoller-WMF claimed this task.

@SSalgaonkar-WMF and I have been discussing this, and further work is planned on improving and scaling link suggestions, but it will be documented in a Machine Learning task associated with T388258: Make airflow-dag for addalink training pipeline output compatible with deployed model.
Since this Epic is associated with a very old request that hasn't been active in some time, I would suggest we mark it resolved. (Subscribers are welcome to re-open it if they disagree.)