Add a link: avoid suggesting first names
Closed, Resolved, Public

Description

A user at French Wikipedia reports:

being offered a bunch of "Anne" on the first name of a person whose first name is Anne, I don't know if it's useful. I was disappointed because I imagined that the program would offer a few links for a list of unrelated words.

I haven't seen it during my tests in French.

Event Timeline

Trizek-WMF renamed this task from "Add a link: avoid suggesting surnames" to "Add a link: avoid suggesting first names". Jul 20 2021, 6:29 PM
Trizek-WMF created this task.

@Trizek-WMF thanks for flagging.

I cannot reproduce the exact error (maybe I don't fully understand it). The main reason is that the word "Anne" is not part of our anchor-dictionary in frwiki (it was filtered beforehand), so the algorithm cannot suggest links with that word as an anchor.

In general, I can see the problem of links suggested on first names. I don't have a good solution for this now. One idea could be to add pages belonging to first names to the "filter for not linking to specific article types" (T279434). For this we could use the Wikidata item given name (Q202444) and then filter all articles that are an instance-of this item.

I think excluding given name (Q202444) would be the best solution to address the issue, since it doesn't make much sense to link to given names.

In fact, the item Given_name (Q202444) has several relevant items that are subclass_of it:

For example, Lucy (Q13365715) is an instance_of female_given_name (Q11879590).
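To illustrate why filtering on Q202444 alone would miss Lucy, here is a minimal sketch that expands a root item with its subclasses. The `SUBCLASS_OF` map and the `expand_with_subclasses` helper are hypothetical and written only for this illustration; the map holds just the one edge mentioned in this task, whereas a real run would read the subclass edges from Wikidata.

```python
# Hypothetical, hand-written subset of Wikidata "subclass_of" edges.
# Only the two QIDs named in this task are used; this is not real pipeline code.
SUBCLASS_OF = {
    "Q11879590": {"Q202444"},  # female_given_name -> given name
}


def expand_with_subclasses(root_qids, subclass_of):
    """Return root_qids plus every item that is (transitively) a subclass of one of them."""
    expanded = set(root_qids)
    changed = True
    while changed:
        changed = False
        for child, parents in subclass_of.items():
            if child not in expanded and parents & expanded:
                expanded.add(child)
                changed = True
    return expanded


filter_qids = expand_with_subclasses({"Q202444"}, SUBCLASS_OF)

# Lucy (Q13365715) is an instance_of female_given_name, not of given name directly.
lucy_instance_of = {"Q11879590"}
matches_root_only = bool(lucy_instance_of & {"Q202444"})
matches_expanded = bool(lucy_instance_of & filter_qids)
```

With only the root item in the filter, Lucy is not matched; with the subclasses added, it is.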

Thus, we would have to add those four items above to the list of items in filter_dict_anchor.py, which filters the anchor-dictionary to remove links to articles based on their properties in Wikidata. Specifically, we remove all links to articles whose Wikidata item is an "instance_of" any of the items listed in list_qid_filter. After re-running the training pipeline, these links should not be recommended anymore.
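The filtering step described above could look roughly like the sketch below. It is not the actual content of filter_dict_anchor.py: the shape of the anchor-dictionary (anchor text mapped to candidate article titles with counts), the `page_instance_of` lookup, the `filter_anchors` helper, and the sample data are all assumptions made for illustration. Only the name `list_qid_filter` and the two given-name QIDs come from this task.

```python
# Items whose instances should never be link targets.
# Only the two QIDs mentioned in this task are listed here.
list_qid_filter = {"Q202444", "Q11879590"}


def filter_anchors(anchors, page_instance_of, qid_filter):
    """Drop link targets whose Wikidata item is an instance_of a filtered item.

    anchors: assumed shape {anchor_text: {article_title: link_count}}
    page_instance_of: assumed shape {article_title: set of instance_of QIDs}
    """
    filtered = {}
    for anchor, targets in anchors.items():
        kept = {
            title: count
            for title, count in targets.items()
            if not (page_instance_of.get(title, set()) & qid_filter)
        }
        if kept:  # remove the anchor entirely if no target survives
            filtered[anchor] = kept
    return filtered


# Illustrative data only, not taken from any wiki dump.
anchors = {
    "lucy": {"Lucy": 12, "Lucy (film)": 3},
    "paris": {"Paris": 40},
}
page_instance_of = {
    "Lucy": {"Q11879590"},     # female_given_name
    "Lucy (film)": {"Q11424"},  # film
    "Paris": {"Q515"},          # city
}

filtered = filter_anchors(anchors, page_instance_of, list_qid_filter)
```

In this sketch the given-name article is removed as a target for the anchor "lucy", while the other candidates are left untouched; anchors with no remaining target disappear from the dictionary.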

With @MGerlach's latest comment, I think this task is now actionable for the Growth team.

Change 766207 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[research/mwaddlink@main] Do not recommend given names

https://gerrit.wikimedia.org/r/766207

Change 766207 merged by jenkins-bot:

[research/mwaddlink@main] Do not recommend given names

https://gerrit.wikimedia.org/r/766207

Change 766764 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/deployment-charts@master] linkrecommendation: Bump version

https://gerrit.wikimedia.org/r/766764

Change 766764 merged by jenkins-bot:

[operations/deployment-charts@master] linkrecommendation: Bump version

https://gerrit.wikimedia.org/r/766764

kostajh subscribed.

I've deployed this change. I am not invalidating all the cached recommendations; the cached recommendations will gradually drop out and be replaced by new recommendations, which will exclude given names (or any items associated with the Wikidata IDs provided in T287034#7728659).

kostajh triaged this task as Medium priority.
kostajh added a subscriber: Tgr.
Etonkovidova subscribed.

Checked on frwiki wmf.24 (and ruwiki): during testing, there were no first-name links among the suggested links.