
Spam blacklist exclusions on Wikidata and spill-over to client projects
Open, Needs Triage, Public

Description

Currently the global spam blacklist has many rules that block a subject's official domain from being used throughout our projects (and on external projects that use our spam blacklist extension). Examples include bit.ly (a heavily abused URL-shortening service), which is the official site of the subject of https://en.wikipedia.org/wiki/Bitly, and pornhub.com (a porn site used for 'shock abuse' on many wikis), which is the official site of the subject of https://en.wikipedia.org/wiki/Pornhub.

Many wikis use local whitelisting to allow an appropriate link on the pages where they want to use it: either an exclusion of the top domain only, or another neutral landing page, as is common practice on en.wikipedia (for https://en.wikipedia.org/wiki/Bitly, the link https://www.bitly.com/?main was whitelisted).

Wikidata could in principle also whitelist the top domain and use it on the items where it is the official link. However, that would 'spill over' to any wiki that uses that item. For example, on en.wikipedia the template {{official website}} (without parameters) uses the value stored on Wikidata. For [[Cloud mining]], the link used on Wikidata is blacklisted globally, so the template {{official website}} cannot be added to the en.wikipedia article (https://en.wikipedia.org/wiki/Cloud_mining; https://en.wikipedia.org/wiki/Special:Log?type=spamblacklist&page=Cloud+mining).

The same would happen if Wikidata whitelisted tinyurl.com for use on the one Wikidata item where it is the official website: an edit on en.wikipedia (or any of the 800+ other MediaWiki projects) that tries to use the Wikidata item would still be checked against the global blacklist and be disallowed. Whitelisting tinyurl.com on en.wikipedia would then allow that edit again, but would also allow tinyurl to be used on other pages, which is a problem where the top domain is the actual source of the abuse. Moreover, allowing the top domain can (through templates) be abused to enable deep-linking. The only way such links could be used on Wikidata is if every wiki that uses the item immediately follows suit, which in some cases simply negates the global spam blacklist and allows abuse/spam to continue.
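The spill-over can be illustrated with a toy model. This is only a sketch under stated assumptions: the function `is_link_allowed`, the wiki identifiers, and the regex entries are hypothetical and simplified, and this is not the SpamBlacklist extension's actual code. It assumes a link is accepted on a wiki if it matches that wiki's local whitelist, and is otherwise rejected if it matches the global blacklist.

```python
import re

# Toy model only; names and rules are illustrative assumptions,
# not the real SpamBlacklist extension logic or real list entries.
GLOBAL_BLACKLIST = [r"\btinyurl\.com\b"]

# Hypothetical per-wiki whitelists: only Wikidata has whitelisted the domain.
LOCAL_WHITELIST = {
    "wikidatawiki": [r"\btinyurl\.com\b"],
    "enwiki": [],
}


def is_link_allowed(wiki, url):
    """Return True if `url` could be saved on `wiki` in this toy model."""
    # A local whitelist match overrides the global blacklist for that wiki only.
    if any(re.search(p, url) for p in LOCAL_WHITELIST.get(wiki, [])):
        return True
    # Otherwise the global blacklist applies.
    return not any(re.search(p, url) for p in GLOBAL_BLACKLIST)


# Saving the official website on the Wikidata item works...
print(is_link_allowed("wikidatawiki", "https://tinyurl.com/"))
# ...but a client wiki that renders the same value is still blocked.
print(is_link_allowed("enwiki", "https://tinyurl.com/"))
```

In this model, appending the same pattern to `LOCAL_WHITELIST["enwiki"]` unblocks the client edit, but it also makes `is_link_allowed("enwiki", "https://tinyurl.com/abc123")` return `True`, which is the deep-linking problem described above.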

Note: allowing pornhub.com to be linked on its own Wikidata item through whitelisting also allows school students to change their school's official website on the Wikidata item to pornhub.com (the common abuse of this site on en.wikipedia), which would then show that website on any other project that has an article on that school. Examples of such attempts: https://www.wikidata.org/wiki/Special:Log?type=spamblacklist&user=2A01%3ACB08%3A377%3A2000%3A4CAB%3ACB84%3A2ED%3AE960; https://www.wikidata.org/wiki/Special:Log?type=spamblacklist&user=69.168.242.45; https://www.wikidata.org/wiki/Special:Log?type=spamblacklist&user=83.46.126.192&page=&wpdate=&tagfilter=; https://www.wikidata.org/wiki/Special:Log?type=spamblacklist&user=217.181.28.1&page=&wpdate=&tagfilter=

There is currently no proper way to allow these links on these Wikidata items without causing wide-scale 'disruption' throughout all other projects. This likely involves hundreds to thousands of globally blacklisted domains that should be linked as official websites on Wikidata, each of which can affect the linked pages on up to hundreds of other projects, or more, given that a single Wikidata item can be used on multiple pages of one wiki.

Event Timeline

Two points: