In T245330, we tested the link recommendation algorithm and decided that it has enough potential to continue with. This task covers the issues discovered during that testing and potential improvements to make. For any of the following issues, we may decide not to attempt an improvement if doing so would cause problems with precision, recall, or scalability:
- Not linking in the middle of a multi-word phrase, e.g. only linking “London” in “London Underground” (examples)
- Linking to the correct disambiguation of a word, e.g. from an article about biology, linking to “Cell (biology)” instead of to “Cell (geometry)”.
- Not suggesting a link if that word or phrase has already been linked in the article.
- Not linking inside the titles of external links (example article)
- Not linking inside the names of organizations, e.g. “Access & Publishing Group”. (example article)
- Not linking common names of people (example article)
- Not linking to common phrases, e.g. “Yet another” (example article)
- Not linking to articles about dates, like the article for 1979.
- @Dyolf77_WMF points out that in the Arabic language, the word "the" is a prefix that is attached to the beginning of the noun, such as in this word: الدعاء, which means "dua", a type of prayer. It should link to: دعاء, which is the singular form, without the prefix. We still want words like this to link to the correct article. Does our current algorithm account for these sorts of prefixes? And plural forms?
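
To make a few of the issues above concrete, here is a minimal Python sketch of how they might be handled as filters around a phrase-matching step. It is not the existing algorithm: `suggest_links`, `resolve_anchor`, `strip_arabic_article`, the `anchors` dictionary (phrase to article title), and `max_len` are all hypothetical names chosen for illustration. It covers only longest-phrase matching, skipping already-linked targets, skipping year articles, and a naive retry without the Arabic definite article. It does not attempt disambiguation, plural forms, or detection of names and organizations.

```python
import re

# All names below are hypothetical; they illustrate the ideas in the list
# above and are not taken from the link recommendation code.

YEAR_TITLE = re.compile(r"^\d{1,4}$")      # article titles such as "1979"
ARABIC_DEFINITE_ARTICLE = "\u0627\u0644"   # alef + lam, the prefix "al-"


def strip_arabic_article(word):
    """Drop a leading Arabic definite article if enough of the word remains.

    Naive assumption: any word starting with alef + lam carries the article.
    Words whose root happens to start with those letters would be mangled.
    """
    if (word.startswith(ARABIC_DEFINITE_ARTICLE)
            and len(word) > len(ARABIC_DEFINITE_ARTICLE) + 1):
        return word[len(ARABIC_DEFINITE_ARTICLE):]
    return word


def resolve_anchor(phrase, anchors):
    """Look up a phrase; if absent, retry without the Arabic article."""
    if phrase in anchors:
        return anchors[phrase]
    return anchors.get(strip_arabic_article(phrase))


def suggest_links(tokens, anchors, already_linked=(), max_len=4):
    """Greedy longest-match scan returning (phrase, target) suggestions."""
    suggestions = []
    seen = set(already_linked)
    i = 0
    while i < len(tokens):
        step = 1
        # Try the longest phrase first so "London Underground" wins over "London".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            target = resolve_anchor(phrase, anchors)
            if target is None:
                continue
            # Consume the matched phrase even when it is filtered out,
            # so that no shorter sub-phrase inside it gets linked.
            if target not in seen and not YEAR_TITLE.match(target):
                suggestions.append((phrase, target))
                seen.add(target)
            step = n
            break
        i += step
    return suggestions
```

For example, with `anchors` containing both "London" and "London Underground", the tokens of "The London Underground opened in 1863" would yield a single suggestion for the full phrase, and "1863" would be skipped as a year article. Whether each filter is worth keeping would depend on its measured effect on precision and recall.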
In terms of prioritization, we plan to work on these Wikipedias first:
- kowiki (Korean)
- cswiki (Czech)
- arwiki (Arabic)
- viwiki (Vietnamese)
- frwiki (French)
- ukwiki (Ukrainian)
- srwiki (Serbian)
- hywiki (Armenian)
- huwiki (Hungarian)
- euwiki (Basque)
- plwiki (Polish)
- fawiki (Persian)
- itwiki (Italian)
- ptwiki (Portuguese)
- hewiki (Hebrew)
- svwiki (Swedish)
- dawiki (Danish)