Page MenuHomePhabricator

Warn when adding a URL that matches blocked external domains in the 2010 editor
Open, Needs TriagePublic

Description

This is the task tracking implementation of T276857, but specifically against blocked domains (i.e. spamblacklist) and only for the 2010 editor (WikiEditor).

Background

This was the the #10 most-voted wish in the Community-Wishlist-Survey-2023. That wish specifically asked for this functionality in VisualEditor, but we'd like non-VE users to benefit from the feature, too.

The core issue is some users will link to a blocked/spamblacklisted domain, and they won't get notified about it until they actually go to publish the edit. It would be more convenient for them to be informed the URL cannot be used as soon as they add it, i.e. so they can find a more reliable reference.

As part of T337431, we now have a new system called BlockedExternalDomains that is intended to later supersede SpamBlacklist. Thus, the work for this task will only be for the new system. It does not yet have full feature parity with Spamblacklist, but will in the future. For now, we're just going to check raw URLs against MediaWiki:BlockedExternalDomains.json.

Acceptance criteria

  • Enable the feature by setting the $wgAbuseFilterBlockedExternalDomainsNotifications flag to true.
  • In the 2010 editor, if I type a URL to a blocked domain, I should get a notification warning me of such.
  • The notification should state what domain is blocked, and have a "Review link" link that once clicked will highlight the offending URL in the wikitext.
    • If WikiEditor is installed (true for all WMF projects), it should open the link in the link insertion dialog.
  • If using the WikiEditor link insertion dialog, I should also be informed if the URL I entered is blocked.
    • Currently this only happens after the link is inserted, which I think is fine at least for the MVP...

Note this is very similar to what we did for T285508: Show notification when users type a link to a disambiguation page.

Event Timeline

MusikAnimal renamed this task from Warn when adding a URL that matches the SpamBlacklist in the 2010 editor to Warn when adding a URL that matches blocked external domains in the 2010 editor.Sep 26 2023, 10:20 PM
MusikAnimal added a project: AbuseFilter.

Change 961249 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/AbuseFilter@master] [WIP] Show notification when editor links to a blocked domain

https://gerrit.wikimedia.org/r/961249

Test wiki created on Patch demo by MusikAnimal using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/616d98c01b/w

Test wiki on Patch demo by MusikAnimal using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/616d98c01b/w/

Test wiki created on Patch demo by MusikAnimal using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/06685e6799/w

Test wiki on Patch demo by MusikAnimal using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/06685e6799/w/

PatchDemo up and running! https://patchdemo.wmflabs.org/wikis/4040d77204/wiki/Main_Page

You can test the feature by adding a working link to example.org, which is the only domain I have blocked at the time of writing. To be clear, a working external link here means the full URL including the protocol (https://example.org) or a wikitext external link ala [https://example.org foobar].

You can block other domains at https://patchdemo.wmflabs.org/wikis/4040d77204/wiki/Special:BlockedExternalDomains after logging in as the Patch Demo user.

I'm eager to hear feedback that anyone may have!

Change 961249 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Show notification when editor links to a blocked domain

https://gerrit.wikimedia.org/r/961249

Change 970412 had a related patch set uploaded (by Samtar; author: Samtar):

[operations/mediawiki-config@master] InitialiseSettings-labs: Enable AbuseFilterBlockedExternalDomainsNotifications on enwiki.beta

https://gerrit.wikimedia.org/r/970412

Change 970412 merged by jenkins-bot:

[operations/mediawiki-config@master] InitialiseSettings-labs: Enable AbuseFilterBlockedExternalDomainsNotifications on enwiki.beta

https://gerrit.wikimedia.org/r/970412

Moving to the Kanban so this can get a little QA. Let me know if the acceptance criteria or anything is unclear.

Once QA gives the sign-off, we can schedule deployment to turn this on the applicable wikis.

@MusikAnimal I did a bit of testing of this on Friday and found a few potential bugs. In order of importance (in my opinion):

  • There are circumstances where clicking on "Review link" will take you to the blocked url instead of opening the "Insert link" dialog.
    • This happens when you move or delete the link you just inserted before clicking "Review link".
      • e.g. type a link, see the popup, before the link type "[", click "Review link"
    • Having the blocked url as the href of the "Review link" seems vulnerable to me. A user might open it in a new tab. I also don't know if all browsers/devices will behave the same way when you click on it.
  • If the blocked URL is not the first thing on the line, after clicking "Review link", the URL automatically entered in the "Insert link" dialog is incorrect.
    • e.g. type foobar <blocked URL>, click "Review link", in the "Insert link" dialog the URL will be "foobar <blocked URL>"
  • The popup only appears if the URL starts with http:// or https:// (but not if you add it after).
    • You can make a legit link with just // (e.g. [//<blocked URL>]). If you try to publish this, AbuseFilter will stop you (at least on beta). Perhaps we should maintain functional parity.
  • If there is already a popup and you trigger another popup, it flashes in the top left.
    output.gif (768×1 px, 1 MB)
  • If copy and pasting, only warns if the blocked URL is just before point.
    • e.g. copy and paste foobar <blocked URL> foobaz, you will not see a popup
  • If the text of the link is a blocked domain but the actual link is not, still see popup.
    • e.g. [https://duckduckgo.com <blocked URL>]
    • Arguably a popup here is not so bad as the edit does not seem in good faith.
  • Does not always work well with find and replace.
    • Not triggered if you do "Replace all".
    • If you do "Find next" and "Replace" one by one, the "Review link" does not insert the link in the correct place.
      output.gif (768×1 px, 1 MB)

@MusikAnimal I did a bit of testing of this on Friday and found a few potential bugs. In order of importance (in my opinion):

Thank you for this thorough review!

  • There are circumstances where clicking on "Review link" will take you to the blocked url instead of opening the "Insert link" dialog.
    • This happens when you move or delete the link you just inserted before clicking "Review link".
      • e.g. type a link, see the popup, before the link type "[", click "Review link"
    • Having the blocked url as the href of the "Review link" seems vulnerable to me. A user might open it in a new tab. I also don't know if all browsers/devices will behave the same way when you click on it.

I intentionally put the URL as the fallback when the blocked URL can't be found in the wikitext due to changes made after the notification (akin to what we did for T285508), but I didn't really think about security! The editor has to type (or paste, if I fix that) the URL, and I was coming more from a EditCheck mindset, but you're of course totally right that BlockedExternalDomains is used to block malicious URLs too, so we should at the very least avoid direct links. I'll remove the href, but I'm not sure what to do for a fallback. I guess another message (except of white/info type) that reads "The blocked domain appears to have been removed". What do you think?

  • If the blocked URL is not the first thing on the line, after clicking "Review link", the URL automatically entered in the "Insert link" dialog is incorrect.
    • e.g. type foobar <blocked URL>, click "Review link", in the "Insert link" dialog the URL will be "foobar <blocked URL>"

Ack

  • The popup only appears if the URL starts with http:// or https:// (but not if you add it after).
    • You can make a legit link with just // (e.g. [//<blocked URL>]). If you try to publish this, AbuseFilter will stop you (at least on beta). Perhaps we should maintain functional parity.

Good point. I will change the regex to use $wgUrlProtocols.

  • If there is already a popup and you trigger another popup, it flashes in the top left.
  • If copy and pasting, only warns if the blocked URL is just before point.
    • e.g. copy and paste foobar <blocked URL> foobaz, you will not see a popup

Sounds like an issue with mw.notify, but I'll see what can be done.

  • If the text of the link is a blocked domain but the actual link is not, still see popup.
    • e.g. [https://duckduckgo.com <blocked URL>]
    • Arguably a popup here is not so bad as the edit does not seem in good faith.

Ideally if the backend allows saving the URL, we shouldn't warn about it. This might be hard to fix :/

  • Does not always work well with find and replace.
    • Not triggered if you do "Replace all".
    • If you do "Find next" and "Replace" one by one, the "Review link" does not insert the link in the correct place.

Ack. I'm not certain this can be fixed easily but I'll think of something.


Thanks again for this amazing review! As this is still behind a feature flag and I'm busy with other priorities, I think I might save polishing this off for the next wishathon or rainy day.

@dom_walden also uncovered a performance issue with this. I think this task a whole needs a bit more thought. It will be a while before I am able to revisit it, so we're going to disable the feature on Beta for the time being.

The patch seems to be making an action=raw call to fetch MediaWiki:BlockedExternalDomains.json. If I'm reading correctly, that would fetch the JSON every time any page is opened for editing. Ideally, ResourceLoaderWikiModule should be used which can effectively cache the JSON on the client localStorage and also invalidate it in about 5 minutes when the json page is edited onwiki.

I don't know when I'll be able to revisit this. The original wish didn't even ask for a 2010 editor solution, only for VisualEditor, which is slated to be part of the EditCheck project.

The regular expression approach doesn't work as well as it did for Disambiguation-Wish-2021 given one can enter plain links, as opposed to required syntax for wikilinks like [[foo]]. A better approach might be to only run the check against the disallow list when we know the user is trying to enter an external URL, such as via the "Insert link" tool in WikiEditor. The plain editor shipped with MediaWiki doesn't have such a feature, so I'd argue if we go this approach, the code should be moved from AbuseFilter to the WikiEditor repo.

At any rate, I don't think it's ideal to leave $wgAbuseFilterBlockedExternalDomainsNotifications lying around in an unstable state. I'm too busy with other projects, so I'm left with these options:

  1. Remove $wgAbuseFilterBlockedExternalDomainsNotifications and related code entirely (doesn't revert cleanly)
  2. Rework to only be for WikiEditor's link insertion dialog, moving the code to the WikIEditor repo (simple enough, but not very high impact)
  3. Go the extra mile and perfect the regex and logic so that this works in any wikitext editor, also addressing the performance concern pointed out at T347435#9395062. (needs volunteer)

Any opinions? If I don't hear anything in a week or two, I will probably just go with #1 and remove the code.

Change 1005127 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/AbuseFilter@master] Remove $wgAbuseFilterBlockedExternalDomainsNotification and related code

https://gerrit.wikimedia.org/r/1005127

Change 1005127 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Remove $wgAbuseFilterBlockedExternalDomainsNotification and related code

https://gerrit.wikimedia.org/r/1005127

I don't see the warning on beta anymore. I hope that means it has been removed correctly.

I think it's still a reasonable idea for future consideration, so it's worth keeping the ticket open.