Perform additional analyses of the quality of the results of the link recommendation model (such as the number of recommendations per article and per topic). The insights from these analyses should help us better understand the model before it is deployed (e.g., which threshold parameter to use to ensure there are enough recommendations while keeping quality high) and diagnose cases where a model trained in a new language is not working properly. In addition, we aim to start writing a paper describing the problem, the solution, and the results.
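To make the threshold question concrete, here is a minimal sketch of a threshold sweep over backtesting output. It assumes each candidate link comes with a model score and a correct/incorrect label from backtesting; the `Recommendation` class and `sweep_thresholds` function are hypothetical names for illustration, not part of the actual pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    article: str    # page the link would be added to (hypothetical field)
    score: float    # model probability that the link should be added
    correct: bool   # True if the link matches the held-out (backtesting) link


def sweep_thresholds(recs, thresholds, n_articles: Optional[int] = None):
    """Return (threshold, precision, recommendations per article) per threshold.

    n_articles: number of evaluated articles, including those without any
    candidate; defaults to the number of distinct articles in `recs`.
    """
    n = n_articles if n_articles is not None else len({r.article for r in recs})
    rows = []
    for t in thresholds:
        kept = [r for r in recs if r.score >= t]
        # precision is undefined (NaN) when no candidate passes the threshold
        precision = sum(r.correct for r in kept) / len(kept) if kept else float("nan")
        rows.append((t, precision, len(kept) / n if n else 0.0))
    return rows
```

Raising the threshold typically trades recommendations per article for precision; a table of these rows per wiki is one way to pick a value that keeps both acceptable.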
Mentioned Here:
- T284481: Deploy Add a link to the second set of wikis
- T284666: Add a link: unnecessary articles on units are often suggested
- T283715: Add a link in bnwiki: algorithm improvements: articles are not being suggested at their first appearance
- T279411: Determine why service responses are slow and what we can do about it
- T279427: Republish datasets with primary key ID column included
- T279434: Add a link: algorithm improvements: Define filter for not linking specific articles types
- T279519: Add a link: algorithm improvements: Avoid recommending links in sections that usually don't have links
- T279521: Add a link: algorithm improvements: Improve parsing of text for generating anchor-text candidates
- T276438: Establish processes for running the dataset pipeline
- T278864: Add a link: evaluate link recommendation (Mar 30 2021)
- T279037: Character encoding issues in MySQL anchor dictionaries for viwiki
- P14882: Number of links vs ngram-length
- T277342: Experiment with and/or document ngram parameters
- T273664: run-pipeline.sh fails on larger wikis
Update week 2021-01-25:
- started to scope the paper with Djellel and identified a potential venue for submission
- started additional analyses of: feature importance; the number of articles in a wiki for which we can make recommendations; which sub-component of the model is the performance bottleneck (entity recognition, i.e., identifying which anchor to link, or entity linking, i.e., given the anchor, which article to link to); performance with respect to topic (newcomers select topics beforehand); and performance with respect to article length (e.g., how useful the recommendations are for stub articles)
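One way to tell which sub-component is the bottleneck is to classify each error in the backtesting output. The sketch below is a hypothetical illustration, assuming that for each article we have the predicted and the true links as dictionaries from anchor text to target article (so each anchor appears at most once); the category names are made up for this example.

```python
from collections import Counter


def decompose_errors(predicted, gold):
    """Split errors into entity-recognition vs entity-linking errors.

    predicted, gold: dicts mapping anchor text -> linked article title.
    """
    counts = Counter()
    for anchor, target in predicted.items():
        if anchor not in gold:
            counts["er_false_positive"] += 1  # linked an anchor that is not linked in the gold data
        elif gold[anchor] == target:
            counts["correct"] += 1
        else:
            counts["el_wrong_target"] += 1    # right anchor, wrong article
    # anchors linked in the gold data that the model did not pick up
    counts["er_missed"] = sum(1 for a in gold if a not in predicted)
    return counts
```

Summing these counts over all articles of a wiki shows whether most errors come from picking the wrong anchors (`er_*`) or from linking the right anchor to the wrong article (`el_wrong_target`).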
Update week 2021-02-01:
- I invested most of this week in writing the paper with Djellel for submission to KDD
- the aim is to meet the deadline of Feb 08
- at the moment it looks like we are on track; I am very confident that we can submit on time.
Update week 2021-02-08:
- finished and submitted the paper.
- started debugging problems with the algorithm when running the training pipeline for larger wikis (enwiki/frwiki); initially this was not a high priority, but it is potentially important for demonstrating the model to a larger (English-speaking) audience; progress is tracked here: T273664
Update week 2021-02-15:
- spent considerable time debugging the problem with the larger wikis, as this was flagged as a priority by Marshall.
- I managed to find a solution and am currently testing it; it looks like this should be solved by next week, and the models will then be available for larger wikis such as enwiki and frwiki.
@MGerlach thank you for your update. As we're waiting for the ML Platform team's platform to be ready and for ML engineers to be able to support us with this type of work, please do consider requesting funds for an ML engineer contractor to work with you. (I understand that you yourself may be very close to the finish line on this particular effort, but it is something to consider, especially given that Miriam may find herself where you are now 6 months down the road.)
Update week 2021-02-22:
- fixed the issue with running on larger wikis, so that we can now train the model for all languages independent of size
- tested on enwiki with good results (the backtesting evaluation yields results similar to other wikis), though it takes considerably longer to train
- submitted a patch to Gerrit
- there might be some other incremental improvements in the next weeks, such as how to deal with links to and from disambiguation pages
Update week 2021-03-15:
- we had several discussions about which manual filters we should add to prevent certain links from being recommended (e.g., links to disambiguation pages), in order to accommodate different requests from volunteers who gave feedback (see the extended documentation). Based on these discussions, I started to add features describing the link targets using information from Wikidata. For example, disambiguation pages (often) have the value Q4167410 (Wikimedia disambiguation page) for the instance_of property; similarly, one can identify links to dates or years, which were previously flagged as unwanted link recommendations. The plan is to add the value of the instance_of property for all links, which makes it easy to filter out a set of pages that should not be linked to. While we would set a default list (e.g., containing disambiguation pages, dates, etc.), the list could be customized for each wiki according to its style guidelines.
- in T277342 we are working to improve the performance of the link-recommendation model (it works well locally, but there are some issues in the production environment). One approach we tested was to reduce the number of database lookups by reducing the maximum length of the ngrams in the text that are considered as link anchors (P14882 suggests that >95% of links have anchors consisting of 5 or fewer tokens, so we might not lose much by imposing a stricter limit here)
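The trade-off behind limiting the ngram length can be sketched as follows. This is a simplified illustration, not the pipeline's code: it assumes whitespace tokenization (the real tokenizer may differ) and that every candidate ngram costs one anchor-dictionary lookup; the function names are hypothetical.

```python
def anchor_length_coverage(anchors, max_len):
    """Fraction of existing link anchors that have at most max_len tokens."""
    lengths = [len(a.split()) for a in anchors]
    return sum(1 for n in lengths if n <= max_len) / len(lengths)


def candidate_ngrams(tokens, max_len):
    """All contiguous ngrams with 1 <= n <= max_len; each one is a lookup."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    ]
```

For a sentence of 10 tokens, allowing ngrams of up to 10 tokens gives 55 candidates, while capping at 5 gives 40; `anchor_length_coverage` on existing anchors estimates how many real links such a cap would make unreachable.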
Update week 2021-03-22:
- added a hard-coded filter to avoid suggesting links to certain types of pages, such as disambiguation pages.
- this was suggested at earlier stages during the manual evaluation by volunteers; I now had the capacity to address some of these issues in a more general framework using Wikidata
- for each article in the set of candidate links, we check its corresponding Wikidata item and retrieve all items that are listed under its instance-of property. This allows us to remove links belonging to certain instances from the anchor dictionary, so that these will not be suggested by the link-recommendation model. Currently, we remove links that are instances of these items:
- the list of instances to filter can easily be adapted in this framework (as long as it is encoded in Wikidata); in principle, it could thus also be customized for each Wikipedia depending on its style guide
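A minimal sketch of this filtering step, under stated assumptions: the anchor dictionary is taken to map anchor text to candidate target articles with counts, and the instance-of (P31) values of each target are assumed to be precomputed into a plain dictionary (e.g., from a Wikidata dump). Only Q4167410 is taken from this task; the function name and data layout are hypothetical.

```python
# Default exclusion list; Q4167410 = Wikimedia disambiguation page.
DEFAULT_EXCLUDED = frozenset({"Q4167410"})


def filter_anchor_dictionary(anchors, instance_of, excluded=DEFAULT_EXCLUDED):
    """Drop candidate targets whose instance-of values overlap `excluded`.

    anchors: dict anchor text -> {target article: count}
    instance_of: dict target article -> set of Wikidata QIDs (P31 values)
    """
    filtered = {}
    for anchor, targets in anchors.items():
        kept = {t: c for t, c in targets.items()
                if not (instance_of.get(t, set()) & excluded)}
        if kept:  # anchors left without any target are removed entirely
            filtered[anchor] = kept
    return filtered
```

A per-wiki customization would then just pass a different set, e.g. `DEFAULT_EXCLUDED | wiki_specific_qids`, where `wiki_specific_qids` is a hypothetical set derived from that wiki's style guide.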
Update week 2021-03-29:
- started to discuss details of the processing pipeline for the model (T276438), such as how often the model should be retrained. I consulted Fabian about how best to approach some of these decisions, but there are still many unknowns about how best to set this up with the current infrastructure.
- the link-recommendation tool is available online for 7 languages (ar, bn, cs, vi, en, fr, simple); before deployment, there is another cycle of manual evaluation by volunteers (T278864).
- going through some early feedback surfaced serious problems with the accuracy of the suggested links in viwiki (T278864#6961431). After spending some time debugging, I believe the poor performance is due to character-encoding errors in the MySQL database used in production (T279037). In the backtesting evaluation (in which viwiki was one of the best-performing wikis), this error did not surface, since there we use locally stored in-memory pickle files, for which the encoding works without problems. Thus, it seems we can (hopefully) find an easy fix for this issue.
- an interesting observation surfaced about viwiki: while compiling a set of articles for evaluation, one volunteer realized that there are many viwiki articles for which the link recommendation does not generate a single link. Qualitatively inspecting random viwiki articles often yields articles consisting of a single sentence that already contains several links. As a result, there are few opportunities for the link recommendation to actually insert a link. One potential explanation is that viwiki seems to contain many articles created by bots (this seems to be supported when comparing the fraction of pages that never receive a single pageview). We can speculate that many of the single-sentence articles were created by bots. This would mean that these articles (and their links) follow a very structured pattern, which could explain why the link recommendation model shows such high performance on viwiki in the backtesting data (which contains many of these sentences).
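To illustrate how an encoding error can silently break lookups in the anchor dictionary, here is a sketch of one common failure pattern: UTF-8 bytes decoded with the wrong codec (Latin-1 is used here as a stand-in). This is an assumption for illustration; the exact misconfiguration in the production MySQL setup is tracked in T279037 and may differ.

```python
def utf8_read_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being decoded with the wrong (Latin-1) codec."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the simulated error: recover the raw bytes and decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


# Hypothetical anchor dictionary whose keys were stored with the wrong encoding.
anchors = {
    utf8_read_as_latin1("Việt Nam"): "Việt Nam",
    utf8_read_as_latin1("Paris"): "Paris",
}
```

Pure-ASCII keys such as "Paris" survive unchanged, but any key with diacritics no longer matches the anchor found in the article text, so the model silently finds no (or the wrong) candidates instead of raising an error.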
Update week 2021-04-05:
- went through the evaluation by volunteers (T278864)
- precision of the recommendations is around 70-90%, similar to the offline backtesting evaluation (very encouraging)
- helped fix the character-encoding issues that led to poor performance in viwiki (T279037)
- from the comments, identified possible improvements for the next iteration of the model: T279434 (not recommending links of certain types, such as calendar dates), T279519 (avoiding links in specific sections such as "Sources"), T279521 (improving the parsing of text for generating anchor-text candidates)
- helped find possible solutions to the performance issues (T279411) and to republish all datasets (T279427)
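Since each wiki's manual evaluation covers a limited number of judged links, it can help to report precision together with an uncertainty interval. The sketch below assumes each judgment is a (wiki, is_correct) pair; the Wilson score interval is my addition for illustration and not necessarily how the numbers above were computed.

```python
import math
from collections import defaultdict


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2))
    return center - half, center + half


def precision_by_wiki(judgments):
    """judgments: iterable of (wiki, is_correct) pairs from manual evaluation.

    Returns {wiki: (precision, ci_low, ci_high)}.
    """
    tally = defaultdict(lambda: [0, 0])  # wiki -> [n_correct, n_judged]
    for wiki, ok in judgments:
        tally[wiki][0] += int(ok)
        tally[wiki][1] += 1
    return {wiki: (c / n, *wilson_interval(c, n)) for wiki, (c, n) in tally.items()}
```

With only a few dozen judgments per wiki, the interval is wide, which is worth keeping in mind when comparing manual precision against the backtesting numbers.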
Update week 2021-05-17:
- unfortunately, the paper submitted to KDD was not accepted despite relatively positive reviews (3 weak accepts and no major rejects)
- currently revising the manuscript using feedback from the reviewers; in addition, we have the chance to include more data from the recent manual evaluation with the volunteers/ambassadors (T278864)
- the plan is to submit to the CIKM applied research track next week (deadline: 2021-05-26)
Update week 2021-06-07:
- working on improving the model based on feedback from volunteers/ambassadors (T279434); specific requests concerned removing links to articles about dates and units. I explored the different ways in which these cases are captured in Wikidata. After refining the filter, I am now retraining the model. Before deploying the new models, we will need to do some additional manual checks of whether the cases mentioned by volunteers (e.g., T284666) are resolved by the changes.
- extending the link recommendation model to 7 new languages; currently training the models and evaluating their performance (T284481)
Update week 2021-06-14:
- implemented changes to the model based on volunteer feedback (T279434)
- retrained the model for the 4 existing wikis and the 7 new wikis (T284481)
- added the results of the backtesting evaluation to the project page on Meta. The results look promising: performance is not negatively affected by the changes, and model performance in the 7 new wikis selected for deployment looks good (no red flags; performance similar to or better than in the already deployed wikis).
Update week 2021-06-21:
- the Growth Team evaluated the 4 deployed models and concluded that performance is strong enough to deploy to additional wikis (the revert rate of suggested edits is low, and enthusiasm is high) (T284481#7171503)
- prepared data and models for 6 new wikis (Farsi, French, Hungarian, Polish, Russian, Romanian); these have been moved to production in the public API and are being prepared for deployment (T284481)
Therefore, I am closing this task, as there are no more direct to-dos as part of the analysis of the model. Follow-up work on the model will be captured in other tasks.