In {T252822}, we (#Growth-Team) are working on a project to guide new users through adding links to Wikipedia articles.
Very high-level summary (https://wikitech.wikimedia.org/wiki/Add_Link is the canonical source):
- Research maintains a codebase that trains the AI model on the production Stats machines
- Research also has a simple API, running in a container via the Deployment Pipeline, that accepts a page title and wiki language and responds with wikitext containing link recommendations
- The GrowthExperiments extension will call the API from a maintenance script run on cron, and cache the output in a MySQL table
- GrowthExperiments will generate an event that the Search team will consume to update the ElasticSearch index for a document, indicating whether the article has link recommendations
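The "call the API, cache the output" step above can be sketched roughly as follows. This is an illustration in Python, not the GrowthExperiments maintenance script itself. The `fetch` callable stands in for the real HTTP request, the table and column names are hypothetical, `sqlite3` is used instead of MySQL so the sketch runs anywhere, and treating an empty response as "no recommendations" is an assumption about the API.

```python
import sqlite3
from typing import Callable, Iterable

# Hypothetical shape of the API call: (wiki_lang, page_title) -> wikitext.
FetchFn = Callable[[str, str], str]


def init_cache(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) cache table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS link_recommendation_cache ("
        " wiki_lang TEXT NOT NULL,"
        " page_title TEXT NOT NULL,"
        " wikitext TEXT NOT NULL,"
        " PRIMARY KEY (wiki_lang, page_title))"
    )


def refresh_cache(
    conn: sqlite3.Connection,
    fetch: FetchFn,
    wiki_lang: str,
    titles: Iterable[str],
) -> int:
    """Fetch recommendations for each title and cache the non-empty ones.

    Returns the number of pages stored in this run.
    """
    stored = 0
    for title in titles:
        wikitext = fetch(wiki_lang, title)
        if not wikitext:
            # Assumption: an empty response means no recommendations.
            continue
        conn.execute(
            "INSERT OR REPLACE INTO link_recommendation_cache"
            " (wiki_lang, page_title, wikitext) VALUES (?, ?, ?)",
            (wiki_lang, title, wikitext),
        )
        stored += 1
    conn.commit()
    return stored


def stub_fetch(wiki_lang: str, page_title: str) -> str:
    """Stand-in for the real API request; returns canned data only."""
    canned = {("cs", "Praha"): "Praha je [[hlavní město]] Česka."}
    return canned.get((wiki_lang, page_title), "")
```

A cron-driven run would call `init_cache(conn)` once and then `refresh_cache(conn, fetch, "cs", titles)` with a real fetch function in place of `stub_fetch`. Keying the table on wiki language and page title means re-running the job overwrites stale rows rather than duplicating them.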
## Open questions
- How do we transfer the trained model data (files, though we could also imagine using a Cassandra key-value store) from the Stats machines to somewhere the container can make use of it?
- How much RAM is used per request in the mwaddlink-api application?
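One possible starting point for the RAM question is to wrap a single request handler call with Python's standard `tracemalloc` module. The sketch below is generic: `measure_peak_python_memory` and `stand_in_handler` are hypothetical names, and nothing here is taken from the mwaddlink-api code. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so memory allocated natively by machine learning libraries may not be counted; container-level measurements would still be needed for a complete picture.

```python
import tracemalloc
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def measure_peak_python_memory(handler: Callable[[], T]) -> Tuple[T, int]:
    """Run handler once and return (result, peak traced bytes).

    The peak covers only memory traced by tracemalloc between start
    and stop, i.e. allocations made via Python's allocators.
    """
    tracemalloc.start()
    try:
        result = handler()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def stand_in_handler() -> int:
    """Placeholder for a real request handler; allocates about 1 MB."""
    buffer = bytearray(1_000_000)
    return len(buffer)


if __name__ == "__main__":
    result, peak = measure_peak_python_memory(stand_in_handler)
    print(f"handler returned {result}, peak traced memory: {peak} bytes")
```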
## Miscellaneous
- For our initial release, we want a pool of several thousand articles that have link recommendations. That will mean processing perhaps tens of thousands of articles per wiki, since not every article will yield (good) link recommendations. (More details are in the [project architecture document](https://docs.google.com/document/d/1Y0Jt2N20e7-H83MMAqVYcSB-UIGba1YoSQE39z2dlds/edit#).)
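The relationship between pool size and processing volume is simple arithmetic once a yield rate is known. The helper below is a hypothetical sketch; the yield rate in the example is an illustrative assumption, not a measured figure from the project.

```python
import math


def articles_to_process(target_pool_size: int, yield_rate: float) -> int:
    """Estimate how many articles must be processed to reach the target.

    yield_rate is the assumed fraction of processed articles that end
    up with usable link recommendations (0 < yield_rate <= 1).
    """
    if not 0 < yield_rate <= 1:
        raise ValueError("yield_rate must be in the interval (0, 1]")
    return math.ceil(target_pool_size / yield_rate)


if __name__ == "__main__":
    # Illustrative only: a 5,000-article pool at an assumed 25% yield.
    print(articles_to_process(5000, 0.25))  # prints 20000
```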
### Further reading
- https://wikitech.wikimedia.org/wiki/Add_Link
- [mwaddlink](https://github.com/dedcode/mwaddlink), which #research (specifically @DED and @MGerlach) is working on; it's a Python application that uses machine learning libraries to train the model. See also https://github.com/martingerlach/mwaddlink-api for the application that handles requests and returns a response.
- [Proposal for the project architecture](https://docs.google.com/document/d/1Y0Jt2N20e7-H83MMAqVYcSB-UIGba1YoSQE39z2dlds/edit?usp=sharing), plus a [longer document with notes exploring various options and questions](https://docs.google.com/document/d/187LPs2c5j13O8dlemwsWMEkn4__LgaN7TcXwEkakxYY/edit?usp=sharing)