
Auto-noindex user/user talk pages for blocked users.
Closed, Resolved, Public


Author: gmaxwell

When a user is blocked, MediaWiki should automatically send a noindex directive to search engine spiders for all of their User and User talk pages.

Ideally this would include subpages, but since those are not formally identified as being connected with the user it may be that including them isn't realistic.

This will prevent searches on outside search engines for a person's name from bringing up user pages that carry "this user is blocked" notices. It will also curtail the ability of blocked users to use their user talk pages as public soapboxes, without the trouble of keeping the page protected and blank.

Version: unspecified
Severity: enhancement

Event Timeline

bzimport raised the priority of this task from to Medium. (Nov 21 2014, 9:59 PM)
bzimport set Reference to bz11443.
bzimport added a subscriber: Unknown Object (MLST).

Created attachment 4482
Adds noindex,nofollow for User and User_talk pages of blocked accounts

The attached patch fixes this bug, by adding noindex and nofollow to user and user talk pages (but not subpages) of the blocked accounts.
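The rule the patch is described as applying can be sketched as a small decision function. This is a language-agnostic illustration written in Python, not the attached PHP patch; the function name `robots_policy` and the set of blocked usernames are hypothetical, while the namespace numbers match MediaWiki's built-in User (2) and User talk (3) namespaces.

```python
from typing import Optional, Set

# MediaWiki's built-in namespace IDs for User and User talk pages.
NS_USER = 2
NS_USER_TALK = 3


def robots_policy(namespace: int, title: str, blocked_users: Set[str]) -> Optional[str]:
    """Return a robots meta value for a page, or None to keep the default.

    Hypothetical sketch: only the base User and User talk pages of a
    blocked account are marked; subpages are deliberately left alone.
    """
    if namespace not in (NS_USER, NS_USER_TALK):
        return None
    if "/" in title:
        # Subpages such as "Example/Sandbox" are not covered.
        return None
    if title in blocked_users:
        return "noindex,nofollow"
    return None
```

For example, `robots_policy(NS_USER, "Example", {"Example"})` yields `"noindex,nofollow"`, while the subpage `"Example/Sandbox"` keeps the default. As the next comment points out, a check like this only runs when the page is rendered, so cached copies are unaffected.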

The problem with this patch is that it doesn't affect the cached version of a page. Possible solutions are to touch the user and user talk pages, or to purge their cache, when a block happens. I can't think of a solution that doesn't depend on the page cache at the moment.


ayg wrote:

I don't think we want to have this hardcoded. Perhaps it would make more sense as a block option, but we possibly have too many of those already. Maybe a configuration option. Also, assuming some sort of automatic solution, it might not make sense to do this for someone who's only been blocked for a short period of time: perhaps this should be only for permanent blocks.

Alternatively, we can simply implement bug 9415 and allow this to be implemented manually, which gives more flexibility and avoids all problems except perhaps tedium.

gmaxwell wrote:

After seeing the above patch I was thinking "configuration option".

Making it a separate step for users just creates extra work, and extra things which can be forgotten.

Right now, if someone maliciously creates the user "John Q. Public of 1234 Pine Street", causes trouble, and then gets blocked, their bogus user page with its block notice can easily end up as a top hit on Google (if, for example, this happens on English Wikipedia). The real John Q. Public of 1234 Pine Street might rightly be annoyed that this page comes up when you google his name, but his pleas for a remedy may go unanswered because the project administrators are already flooded with inane attempts to game the blocking system. For this reason the behavior should really be the default. If there is a manual override, it could be used to remove the noindex in the rare case where there is a real need to do so.

I suppose it wouldn't cause any harm to make it compare block expiration time to a configured threshold, and only no-index when the block is longer than the threshold... but on the other hand, spiders by their very nature will come again, and no-indexing a page for a short time-span is also likely to not cause harm.
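The threshold variant, combined with the earlier suggestion to always cover permanent blocks, could look like the sketch below. It assumes a hypothetical configuration value `NOINDEX_MIN_BLOCK_DURATION` (the 30-day figure is arbitrary) and models an indefinite block as an expiry of `None`; none of these names come from MediaWiki itself.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical configuration value: blocks at least this long trigger noindex.
NOINDEX_MIN_BLOCK_DURATION = timedelta(days=30)


def should_noindex(block_start: datetime,
                   block_expiry: Optional[datetime],
                   threshold: timedelta = NOINDEX_MIN_BLOCK_DURATION) -> bool:
    """Decide whether a blocked user's pages should be noindexed.

    An expiry of None models an indefinite block, which always qualifies;
    otherwise the block's length is compared against the threshold.
    """
    if block_expiry is None:
        return True
    return block_expiry - block_start >= threshold
```

Setting the threshold to `timedelta(0)` reproduces the "noindex on every block" behavior argued for above, so a single setting covers both positions.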

Bug 9415 has its own problems, as documented on that bug. In any case, I really see it as an orthogonal feature: manual ability makes sense for cases that can't be handled automatically, and automatic action makes sense where there is a clearly right thing to do.

I'd agree that if we have this, it should automatically apply.

Maybe it should only be for indef blocks...

This sounds ok to me as written; I don't think there's a huge need for a config option. On the other hand it's easy to throw it in.

Change 971344 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] Remove weird special case from BlockUtils::parseBlockTarget

Change 971344 merged by jenkins-bot:

[mediawiki/core@master] Remove weird special case from BlockUtils::parseBlockTarget