
Addition of __NOSPIDER__ token for pages
Closed, Duplicate (Public)



It sometimes happens that a user is discussed unfavorably due to disruption, sockpuppetry, edit warring, or vandalism.

When they leave (or are banned), the pages they were discussed on are still google-able, which is a major dilemma. In extreme cases we have had to rewrite all their signatures, and often we have to courtesy-blank the page, purely to prevent it being spidered.

In a number of cases these users then return (with or without permission), which leads to a further problem: administrators seeking to understand the history cannot easily do so.

Would a token NOSPIDER be possible? If the viewer were a robot or spider, a page containing this token would render as an error page or a blank page in some manner (I don't know the best way technically).

(Perhaps one easy way might be: if the viewer is an anonymous IP, they are sent to a page that says "This page is blocked from spiders. If you are a human, please enter this CAPTCHA to view." Most spiders/robots aren't logged in.)
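The behaviour requested above could look roughly like the following Python sketch. It is illustrative only and is not MediaWiki code: the `__NOSPIDER__` spelling is taken from the task title, while `render_page`, `looks_like_spider`, the `Response` type, and the User-Agent pattern are all hypothetical names and assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical token from this request; assumed here, not an existing magic word.
NOSPIDER_TOKEN = "__NOSPIDER__"

# Crude, illustrative crawler check. Real crawlers can send any User-Agent.
CRAWLER_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


@dataclass
class Response:
    status: int
    title: str
    body: str


def looks_like_spider(user_agent: str) -> bool:
    """Guess whether a request comes from a crawler, using the User-Agent only."""
    return bool(CRAWLER_PATTERN.search(user_agent or ""))


def render_page(title: str, wikitext: str, user_agent: str) -> Response:
    """Serve a page, hiding it from crawlers if it carries the token."""
    if NOSPIDER_TOKEN not in wikitext:
        return Response(200, title, wikitext)
    if looks_like_spider(user_agent):
        # Generic error: neither the body nor the page title is exposed.
        return Response(404, "Not found", "")
    # Human readers see the page with the token stripped out.
    return Response(200, title, wikitext.replace(NOSPIDER_TOKEN, "").strip())
```

Because a User-Agent string is easy to fake, a check like this would only stop well-behaved crawlers; the CAPTCHA idea for anonymous viewers would address the rest at the cost of extra friction for human readers.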

This would be a useful tool for reducing the conflict between our need for edit histories and pages to remain useful to administrators in future, and the fair need of a user not to be googled that way for the rest of their life. Rather than having to wholesale edit swathes of the wiki, we could tag certain pages as NOSPIDER; they would then rapidly drop off search engine caches (meeting the best interest of the party) and yet could more often be left intact (for us). It would also have the advantage that, being invisible in the rendered page for most users and very easy to apply, it could actually be used more widely when this problem comes along.

Version: unspecified
Severity: enhancement



Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:08 PM
bzimport set Reference to bz14209.
bzimport added a subscriber: Unknown Object (MLST).

wrote:

Defaulting to an error page if it's a spider would be best (if technically possible), since this means the page title completely drops out of search engine caches. However it's done, it's important that the page title isn't shown to a spider, since the title often links to the name involved.

wrote:

Sadly after writing this, problems arise:

1/ potential huge burden if added, unless "only used in specific narrow cases",
2/ would kill search engine access to many pages if widely used, and right now search engines are the only effective way to find things even a few weeks old in project space, and
3/ we already advise people not to use a readily searchable name on the signup page anyway, now.

Ah well, a nice idea.

brion added a comment.May 22 2008, 6:18 PM
*** This bug has been marked as a duplicate of bug 8068 ***