
Add bot generated spam reports on enwiki to robots.txt
Closed, Resolved (Public)

Description

There have been numerous complaints to OTRS about enwiki's bot-generated spam reports[1] showing up high in search results and associating the listed sites with spam, even though that isn't always the case: many things will cause a report to be generated (an account name that is "similar" to a domain, an IP "close" to the domain adding the link), and there is little differentiation between someone who adds one link and someone who adds 100.

Adding the following to robots.txt, in the section that applies to all user-agents, should fix this:
Disallow: /wiki/Wikipedia:WikiProject_Spam/LinkReports/
Disallow: /wiki/Wikipedia%3AWikiProject_Spam/LinkReports/
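
For context, a minimal sketch of how these rules would sit in the catch-all section of robots.txt (enwiki's real file has many other rules in that section, which are omitted here):

User-agent: *
Disallow: /wiki/Wikipedia:WikiProject_Spam/LinkReports/
Disallow: /wiki/Wikipedia%3AWikiProject_Spam/LinkReports/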

[1] http://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&from=WikiProject_Spam%2FLinkReports&namespace=4


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/robots.txt

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:04 PM
bzimport set Reference to bz13398.
bzimport added a subscriber: Unknown Object (MLST).
  • Bug 13529 has been marked as a duplicate of this bug.

ral315 wrote:

I would expand it to instead include the following:

Disallow: /wiki/Wikipedia:WikiProject_Spam/
Disallow: /wiki/Wikipedia%3AWikiProject_Spam/

That way, it includes everything under the main page, including a few pages that I think aren't currently covered by robots.txt but probably should be.

Basic subpages should not be ignored; the pages that cause the problems all start with Wikipedia:WikiProject Spam/Link, so Wikipedia:WikiProject Spam/Link* and Wikipedia talk:WikiProject Spam/Link* should be added.
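
Since standard robots.txt rules match by URL prefix (no trailing wildcard is needed), that narrower suggestion would presumably translate to lines like the following; this is a sketch, not necessarily what was deployed:

Disallow: /wiki/Wikipedia:WikiProject_Spam/Link
Disallow: /wiki/Wikipedia%3AWikiProject_Spam/Link
Disallow: /wiki/Wikipedia_talk:WikiProject_Spam/Link
Disallow: /wiki/Wikipedia_talk%3AWikiProject_Spam/Link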

Added:

https://bugzilla.wikimedia.org/show_bug.cgi?id=13398

Disallow: /wiki/Wikipedia:WikiProject_Spam/
Disallow: /wiki/Wikipedia%3AWikiProject_Spam/
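
In robots.txt, lines beginning with # are comments, so the bug link above would presumably appear as a comment directly above the new rules, roughly:

# https://bugzilla.wikimedia.org/show_bug.cgi?id=13398
Disallow: /wiki/Wikipedia:WikiProject_Spam/
Disallow: /wiki/Wikipedia%3AWikiProject_Spam/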