[Suggestion] Detect if an internal link appears >1x in the same section
Closed, Resolved, Public

Description

This task involves introducing a new edit suggestion within the VisualEditor Suggestion Mode that makes people aware when an internal link appears more than once in the same section.

Deployment plan

#TODO. See T413257.

Meta

  • Relevant wish, policy, guideline, template, etc.: en:MOS:DUPLINK
  • Suggestion scope: Many Wikipedias

Requirements

Meta

  • Configuration
    • Account: false
      • Specify which account state the edit check should apply to. Valid values are "loggedin", "loggedout", false. The default false results in the edit check applying to all users.
    • maximumEditcount: 1000000
      • Specify a threshold for the number of changes at which Edit Check is activated. The default 100 means that the edit check will only be shown to users with 100 edits or fewer. If this value is not defined, the default value is used. The number of edits is based on user edit count, edits from all namespaces are taken into account.
    • ignoreSections: []
      • An array of section titles, which will be compared case-insensitively to headings. If a heading matches an item in this array, all content within that section will be ignored for checks.
    • ignoreLeadSection: false
      • If true, the content of the lead section will be ignored for checks. A lead section is defined as content in an article with at least one heading that precedes the first heading.
    • Enabled: true
      • If true, the check will be enabled, assuming all other configuration allows it to be shown. If false, the check will not be shown.
    • Type: Suggestion
    • inCategory: []
    • hasTemplate: []
      • An array of templates whose presence should cause this check to be offered.
    • lacksTemplate: []
      • An array of templates whose absence should cause this check to be offered.
  • Detection heuristic:
    • An internal link appears >1x in the same == H2 == within a root level paragraph
    • NOTE: links to different sections within the same article should be considered different links. I.e. we’d show a suggestion if Article A#Foo was linked twice, but not show a suggestion if Article A and Article A#Foo were each linked once.
  • Edit Tag(s): See T413419
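
A minimal sketch of how the configuration options above might gate the check. The option meanings come from the list above; the `CheckConfig` class, the function names, and the use of `None` to stand for the lead section are assumptions made for illustration, not the EditCheck implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class CheckConfig:
    # Hypothetical container; field names mirror the options listed above.
    account: Union[str, bool] = False  # "loggedin", "loggedout", or False (all users)
    maximum_editcount: int = 1_000_000
    ignore_sections: List[str] = field(default_factory=list)
    ignore_lead_section: bool = False
    enabled: bool = True


def applies_to_user(config: CheckConfig, logged_in: bool, edit_count: int) -> bool:
    """Return True if the check may be shown to this user."""
    if not config.enabled:
        return False
    if config.account == "loggedin" and not logged_in:
        return False
    if config.account == "loggedout" and logged_in:
        return False
    # Shown to users with this many edits or fewer.
    return edit_count <= config.maximum_editcount


def section_is_checked(config: CheckConfig, heading: Optional[str]) -> bool:
    """heading is None for the lead section (an assumption of this sketch)."""
    if heading is None:
        return not config.ignore_lead_section
    # Section titles are compared case-insensitively to headings.
    ignored = {title.casefold() for title in config.ignore_sections}
    return heading.casefold() not in ignored
```

For example, with `ignore_sections=["See also"]`, a heading written as "SEE ALSO" would be skipped, while "History" would still be checked.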

User experience

  • Card design
    • Title: Duplicate links
    • Description (≤2 sentences): “This link appears more than once in this section. Help make Wikipedia easier for people to read by removing this link. Learn more.”
      • Note: the Learn more link ought to be configurable on a per project basis.
    • Link to learn more: Local language equivalent of WP:MOSLINK (For Enwiki, use MOS:REPEATLINK)
    • Calls to action
      • Remove link: When tapped, remove the link
      • Dismiss: When tapped, leave the link in question unchanged
    • Success toast: Thank you for helping to make this section easier for people to read.

Instrumentation
As with all Edit Checks and Suggestions, we will want to know...

  • Any time a Suggestion of this type is activated within an edit session
  • Any time someone views a Suggestion of this type within an edit session
  • Any time someone engages with a Suggestion and how they engage with it

References

Checklists

Milestones / Review stages
  • Code implemented and on patchdemo
  • Code merged
  • Code deployed and available as an experimental check on wikis. Expected in deployment on Thursday, January 29th.
  • Configuration agreed with community (if necessary)
  • Configuration deployed and available as an experimental check
Implementation checklist
  • User is shown a duplicate link edit check on the second and subsequent occurrence of an internal link within each section.
  • These should be covered, ensuring the links are in the same h2 at root level:
    • Links inside <ref> must not be counted
    • Links inside image captions, templates etc should not be counted
    • Links inside templates should be ignored
  • Links to the same document but different sections are different
  • Instrumentation is included
    • Covered by the controller - it doesn't look like we need to do anything special here
  • Check user experience against requirements
  • Toast when the action is resolved, not just when the user hits "fix"
    • Dropping from MVP: it's hard to avoid firing such a toast for a bunch of other stuff, like larger deletions, and it looks like a chunk of work to enable doing so if the user takes positive action from a context panel, which would also be cross-cutting with other edit checks
    • See T401941: Consider generic success / dismiss notifications for checks
  • "Learn more" opens in a new window
  • Define an initial configuration file for enwiki
For further discussion - file a general ticket / file a subtask / it's a checklist item / don't do
  • Deployment plan
    • Config change will be discussed in meetings or async, otherwise no obvious ambiguity
  • Check default configuration / initial config file against requirements
    • Rephrasing "Wikipedia" to make it easier for languages with cases
  • ignoreSections should be fleshed out with "external links", "related", "see also" etc. Should we start by copying from addReference in https://en.wikipedia.org/wiki/MediaWiki:Editcheck-config.json ? Community feedback?
    • We're doing this in configuration
  • Check strings are internationalised
  • Could there be reasons to match inside templates?
    • We'll discuss elsewhere
  • To catch lists, do we want to expand ignoreSections with prefixes/regexps? Match article titles?
    • Not MVP, and cross-cutting, and in any case bullet lists will not be at root level.
  • Highlight visibility / role discussion
    • not mvp, but will raise in meeting
  • Widgets-as-toast discussion
    • not mvp, will raise in meeting
Other notes
  • We might want to allow templates in general but exclude specific templates, eg {{term}} and {{defn}} to ignore duplicates in lists
  • List articles should be ignored, but we don't have an easy way to detect them.

Event Timeline

Looking at MOS:DUPLINK, there are various exceptions that it'd be good to build into the tool. If these rules are different in other languages, it might be good to ensure that Community Configuration can handle them. Specifically:

  • Ignore links in references, infoboxes, tables, image captions, footnotes, and hatnotes.
  • Ignore glossaries (articles in this category or a subcategory, or with the project banner on their talk page)
  • Ignore lists (articles with the project tag, or just any content in list semantic markup)

Oh, and one other nuance: When a link goes to a section of an article (or comes from a redirect that might be a subtopic), those should probably be treated as separate links. So e.g. "There are many types of Wikipedia editors, including [[List of Wikipedia editor types#Wikignomes|Wikignomes]] and [[List of Wikipedia editor types#Wikifairies|Wikifairies]]" should not trigger the suggestion.
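
To make that nuance concrete, here is a rough sketch that compares links by (title, section) rather than by title alone. It parses raw wikitext with a regular expression purely for illustration; the real check works on the VisualEditor document rather than wikitext, redirects are not resolved here, and `link_key` and `duplicate_targets` are hypothetical names.

```python
import re

# Matches [[target]] and [[target|label]]; group 1 is the target.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def link_key(target):
    """Split a link target into (title, section) so sections stay distinct."""
    title, _, fragment = target.partition("#")
    return (title.strip().replace("_", " "), fragment.strip().replace("_", " "))


def duplicate_targets(wikitext):
    """Return the key of every second-or-later occurrence of a link."""
    seen = set()
    duplicates = []
    for match in WIKILINK.finditer(wikitext):
        key = link_key(match.group(1))
        if key in seen:
            duplicates.append(key)
        else:
            seen.add(key)
    return duplicates
```

Under these assumptions the Wikignomes/Wikifairies sentence yields no duplicates, because the two keys differ in their section part, while two links to the same section of the same article would be reported.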

Ignore links in references, infoboxes, tables, image captions, footnotes, and hatnotes.

If we restrict ourselves to root level paragraphs, I think that covers all of these, as well as most glossary/list pages.

Two related points to consider:

Sometimes, the first linked-mention of another article is only incidental, but the second one is where it's really introduced. Perhaps this offers a way for us to emphasize the human judgment element. To handle it, maybe we could tweak the card-Description to say something like "by removing one of the links" instead of "by removing this link".

Usability-wise, it'd be nice to somehow highlight the earlier/duplicate link(s) (and maybe even instances of the link in other sections?) to make it easy to find and compare. Is this feasible to add now, or perhaps as a subtask?

Per yesterday's team meeting, we've converged on the following requirements for the MVP:

  1. Per what @Esanders shared in T413421#11500745 (and @Sdkb identified in ), we're going to restrict the suggestion to root level paragraphs so as to avoid the suggestion becoming activated on links in references, infoboxes, tables, image captions, footnotes, hatnotes, and most glossary/list pages.
  2. We'll make the Suggestion treat two (or more) links to the same article but to different sections within that article as different links, so it will not appear in that case. Per the case @Sdkb helpfully raised in T413421#11490086.
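
The two requirements above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: the `Node` structure is hypothetical, `links` is assumed to hold only the targets that sit directly in that node (links inside references are assumed to be excluded already), and sections are assumed to be delimited by H2 headings as in the task description.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for a direct child of the document root.
    kind: str                 # e.g. "heading", "paragraph", "table", "list"
    level: int = 0            # heading level; only meaningful for headings
    links: List[str] = field(default_factory=list)  # full targets, e.g. "Bear#Diet"


def find_duplicate_links(root_children: List[Node]) -> List[Tuple[int, str]]:
    """Return (node_index, target) for each second-or-later occurrence
    of a target within the same H2 section, root-level paragraphs only."""
    seen = set()
    flagged = []
    for index, node in enumerate(root_children):
        if node.kind == "heading":
            if node.level <= 2:
                seen = set()  # a new H2 section starts here
            continue
        if node.kind != "paragraph":
            continue  # tables, lists, infoboxes etc. are never inspected
        for target in node.links:
            # The full target, including "#Section", is the comparison key.
            if target in seen:
                flagged.append((index, target))
            else:
                seen.add(target)
    return flagged
```

Because only direct children of the root are visited, anything nested inside a table, list, or template node is skipped without needing a separate exclusion rule for each case.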

I've updated the task description to reflect the above.

I've got the sketch of the check working – currently it's more like a FirstInternalLinkEditCheck; now it's a matter of getting the algorithm right.

Most of the way there – I have https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/1229118 in progress.

  • Only root level
  • Divided by sections

I've spotted a silly bug which will hopefully be quick to fix

We'll land an MVP ticket without internationalisation and instrumentation, and add those in a follow up patch before closing this ticket.

Change #1229118 had a related patch set uploaded (by Zoe; author: Zoe):

[mediawiki/extensions/VisualEditor@master] DuplicateLinksEditCheck initial implementation

https://gerrit.wikimedia.org/r/1229118

Whew, big day! Initial reviewable version on Gerrit.

Patchdemo: https://318839e344.catalyst.wmcloud.org/w/index.php?title=Bear&veaction=edit&ecenable=experimental,suggestions

  • Default instrumentation common to all checks looks fine, no need for extra work.
  • I added the link for the MOS page, but we'll probably iterate on whether to do it like that.
  • Took a bit of extra time to make the algorithm more optimal: on a page with a lot of links we only check if they're at the root level at the last possible moment.
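
One possible reading of that last optimisation, sketched under assumptions: group links by section and target first, which is cheap, and only run the root-level test on links whose target actually repeats. The function name, the tuple shape, and the `is_root_level` callback are hypothetical and do not describe the code in the patch.

```python
from collections import defaultdict


def duplicates_with_lazy_root_check(links, is_root_level):
    """links: iterable of (section_id, target, link_id) in document order.
    is_root_level: callable(link_id) -> bool, assumed to be the costly step.
    Returns the link_ids that would receive a suggestion."""
    groups = defaultdict(list)
    for section_id, target, link_id in links:
        groups[(section_id, target)].append(link_id)

    flagged = []
    for link_ids in groups.values():
        if len(link_ids) < 2:
            continue  # a target that appears once never needs the costly check
        eligible = [link_id for link_id in link_ids if is_root_level(link_id)]
        # The first eligible occurrence is kept; later ones are flagged.
        flagged.extend(eligible[1:])
    return flagged
```

On a page where most link targets are unique within their section, the costly check then runs for only a small fraction of the links.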

Yay! Looking at the patchdemo, a few observations:

  • Could we ensure that clicking "learn more" opens a page in a new tab so that people don't lose their editing workflow?
  • I'm glad to see that the prior instance of the duplicate is highlighted, but it's pretty subtle and might be missed, especially for a longer section. Do we have a design option that'd make it more prominent?
  • The toast appears way in the upper left, far away from where the action took place. This seems consistent with other checks, but it does get me thinking, what if we instead transformed the card into the toast once it was resolved?
  • If I remove the duplicate link by clicking "remove link" on the suggestion card, I get a toast, but if I remove it (or I remove the earlier link, another way to resolve the duplication) through VisualEditor the normal way, I don't get a toast, and the card doesn't go away until I navigate the cursor elsewhere.

It seems like we've discussed the toast option before – it's a bit of a pain with our implementation. I'll discuss highlights. I think it wouldn't be too hard to fire the toast if the user fixes manually: I'll look into that.

Change #1229118 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] DuplicateLinksEditCheck initial implementation

https://gerrit.wikimedia.org/r/1229118

That's merged. Some small attention to detail to do, like making "learn more" point to the right place, but possibly that's unnecessary because we've yet to solve how to do links in the presence of internationalisation.

Should be available on larger wikis next Thursday, at which point I hope to have config ready to go and to be doing some checks that it doesn't impact performance greatly.

I did a quick performance check on a small article, but it's probably worth doing so again on a large article once this has hit production. My dev setup has issues with large articles; if I can't fix that in 15 minutes I'll revisit.

Testing on a larger article of around 150kb copied from enwiki "Bat" I found:

I wrote a patch which uses action.once( "discard" ) to spot when the check is resolved manually – however, it'll also fire if the user deletes a big chunk of text, which doesn't feel like it's quite right.

I think this might need more discussion.

edit: clumsy fingers >:(

Monday morning update:

  • The initial version is merged and will be on the train. We can see how it works in production on Thursday afternoon
  • I need to write an initial configuration for enwiki for Thursday
  • We need to internationalise and work out "learn more" links (outside of this ticket) before this can come out of experimental.

The check is working correctly when duplicate links are added in the same == H2 == within a root level paragraph.

Clarification needed on the following scenario:

If I add duplicate links first under "Heading" and then change the format of the article in such a way that they fall under a "sub-heading 1", then I don't see the Suggestion being activated. Is that expected?

Example: https://en.wikipedia.beta.wmcloud.org/wiki/Cat#Etymology_and_naming

@Ryasmeen We're currently configuring it to only recognize duplicates within the same paragraph. We may change that in the future to be within the same [section/sub-section], hence the wording currently used in the user-visible card itself.