Add a link: maximum number of links per article
Closed, Resolved | Public

Description

Multiple communities have raised concerns that the "add a link" workflow causes overlinking, in which too many links are made per article. Not only do they not want to see overlinking occur, they also don't want the workflow to teach newcomers that adding many links to an article is desirable; rather, newcomers should be taught to be judicious with their additions.

To that end, we want to limit how many suggestions for links are in each article. Right now, we have a limit of 10 links, based on the algorithm.

Here's what we want to do:

Event Timeline

Currently Add Link has a maximumLinksPerTask property (exposed to community configuration, although not via the special page form) which controls how many links the service is instructed to generate. This currently defaults to 10; higher values cause problems (link generation is slow).

AIUI we want to keep generating 10 links, but then use the 3 best; so we want to add a new property (maximumLinksToShow?), and apply the corresponding filtering logic when the task data is loaded from cache (so we don't need to do a cache refresh, and community configuration changes take effect immediately).

That sounds right to me. We could maybe save some processing time by lowering the maximumLinksPerTask value to e.g. 5, or by computing it from the value of maximumLinksToShow, but that seems like more complexity than it's worth.
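
To make the proposal concrete, here is a minimal sketch in Python (not the extension's actual code) of the "generate up to maximumLinksPerTask, show only the best maximumLinksToShow" filtering applied to cached task data. The LinkSuggestion fields and the function name are hypothetical, chosen only for illustration. Restoring text order after filtering is also an assumption and was not discussed in this task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSuggestion:
    # Hypothetical shape of one cached recommendation.
    anchor: str      # phrase in the article text to be linked
    target: str      # suggested link target
    score: float     # prediction score reported by the service
    position: int    # order of appearance in the article text


def filter_links_to_show(cached_links, maximum_links_to_show):
    """Return at most maximum_links_to_show suggestions, keeping the best scores.

    The cached list itself is left untouched, so changing the limit does not
    require regenerating or refreshing the cached recommendations.
    """
    if maximum_links_to_show is None:
        return list(cached_links)
    limit = max(0, maximum_links_to_show)
    # Pick the highest-scoring suggestions first...
    best = sorted(cached_links, key=lambda link: link.score, reverse=True)[:limit]
    # ...then put them back in text order (assumption, see note above).
    return sorted(best, key=lambda link: link.position)
```

Because the filter runs at load time on whatever is already cached, a new limit would apply to the next task load without touching the stored recommendations.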

Sgs changed the task status from Open to In Progress. Feb 8 2022, 3:08 PM
Sgs claimed this task.

Change 761402 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[mediawiki/extensions/GrowthExperiments@master] Add a link: maximum number of links per article

https://gerrit.wikimedia.org/r/761402

@MMiller_WMF I have observed some bias towards multi-word (particularly two-word) concepts when applying the requirement "These should be the three highest accuracy suggestions of all the available ones in the article." For instance: "mobile payment", "Fast-moving consumer goods", "Joint venture". Are we sure we want to apply this sorting, or should we first analyze whether the most accurate links have any issues?

@kostajh also mentioned he was unsure about this sorting. Could you tell us what your doubts about it are?

Just want to mention that there are different options for prioritizing the 3 best links:

  • highest prediction accuracy (as suggested above)
  • highest up in terms of position in the text. While click-through rate (CTR) is only one proxy for the utility of a link, we know that the CTR of links decreases with their relative position in the text (see Fig. 3b of "Improving Website Hyperlink Structure Using Server Logs"). This is already implicitly encoded in the algorithm, as we generate recommendations by iterating through the text from beginning to end.
  • longer anchors (measured as the number of words when splitting the anchor by whitespace). My intuition is that a longer anchor contains less ambiguity about the destination that should be linked to. For example, "Berlin, Kansas" is pretty obvious, whereas "Berlin" could link to many different articles about different aspects of the city. Anchor length is included implicitly as a feature in the prediction algorithm, so it should be reflected in the prediction score. However, since you mention the bias towards "two-word concepts", this might be a useful heuristic to apply.

Thanks for the notes, @MGerlach. @Sgs, let's stay with the "highest accuracy" sorting for now, and we'll have Martin's note in case we want to try something different.
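
For later reference, the three prioritization options listed above could be sketched roughly as follows. This is an illustration only: the Suggestion type, the function names, and the tie-breaking by score in the anchor-length variant are assumptions, not the service's actual data model.

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    # Hypothetical record for one link recommendation.
    anchor: str      # phrase to be linked
    score: float     # prediction score
    position: int    # order of appearance in the text


def top_by_score(suggestions, n=3):
    """Option 1: highest prediction score first."""
    return sorted(suggestions, key=lambda s: s.score, reverse=True)[:n]


def top_by_position(suggestions, n=3):
    """Option 2: earliest position in the text first."""
    return sorted(suggestions, key=lambda s: s.position)[:n]


def top_by_anchor_length(suggestions, n=3):
    """Option 3: anchors with the most whitespace-separated words first.

    Ties on word count are broken by score (an assumption).
    """
    return sorted(
        suggestions,
        key=lambda s: (len(s.anchor.split()), s.score),
        reverse=True,
    )[:n]
```

Only the sort key differs between the three, so trying a different ordering later would be a small change.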

Change 769034 had a related patch set uploaded (by Sergio Gimeno; author: Sergio Gimeno):

[schemas/event/secondary@master] Add a link: add number_phrases_shown value for add a link impressions

https://gerrit.wikimedia.org/r/769034

Change 769034 merged by jenkins-bot:

[schemas/event/secondary@master] Add a link: add number_phrases_shown value for add a link impressions

https://gerrit.wikimedia.org/r/769034

@Etonkovidova to test different values for maximumLinksToShowPerTask, you can modify MediaWiki:NewcomerTasks.json so that the link-recommendation section looks like this:

"link-recommendation": {
    "group": "easy",
    "type": "link-recommendation",
    "maximumLinksToShowPerTask": 2
},
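
As a rough illustration of how such a setting might be read with a fallback, here is a small Python sketch. The function name and the fallback value of 3 are assumptions (3 is taken from the "3 best" discussed earlier in this task, not from the merged patch), and the fragment above is assumed to sit inside a top-level JSON object.

```python
import json

# Assumed fallback, based on the "3 best" mentioned in this task.
DEFAULT_MAXIMUM_LINKS_TO_SHOW = 3


def get_maximum_links_to_show(config_text):
    """Read maximumLinksToShowPerTask from NewcomerTasks.json-style text."""
    config = json.loads(config_text)
    section = config.get("link-recommendation", {})
    value = section.get("maximumLinksToShowPerTask", DEFAULT_MAXIMUM_LINKS_TO_SHOW)
    # Fall back to the default for anything that is not a non-negative integer.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        return DEFAULT_MAXIMUM_LINKS_TO_SHOW
    return value


example = """
{
    "link-recommendation": {
        "group": "easy",
        "type": "link-recommendation",
        "maximumLinksToShowPerTask": 2
    }
}
"""
limit = get_maximum_links_to_show(example)
```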

Change 761402 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Add a link: maximum number of links per article

https://gerrit.wikimedia.org/r/761402

Checked in betalabs - works as expected; needs testing in production.