Disable robot indexing for user pages on Wikidata
Closed, Resolved · Public

Description

Currently, user pages on Wikidata are indexed by robots and therefore appear in search engine results (this can be seen on the page information).

Usually on the Wikimedia projects, the user pages are not indexed for privacy reasons.
Is it possible to disable this feature on Wikidata?
Any other comments pro/con disabling the indexing?

Event Timeline

Restricted Application added a subscriber: Aklapper. · Nov 28 2017, 4:36 PM
Sjoerddebruin added a subscriber: Sjoerddebruin. · Edited · Nov 28 2017, 4:39 PM

It is possible to do this, even without adjusting code. It only requires adding a rule to https://www.wikidata.org/wiki/MediaWiki:Robots.txt. Only community consensus is required.

matej_suchanek added a subscriber: matej_suchanek.

For specific namespaces, this is usually configured via $wgNamespaceRobotPolicies.
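As a rough illustration of what such a setting can look like, here is a minimal sketch for a standalone MediaWiki install, where it would go in LocalSettings.php. The policy value 'noindex,follow' is an assumption chosen for the example; this task does not show the exact value used, and Wikimedia wikis set it per wiki in wmf-config/InitialiseSettings.php rather than in this form.

```php
// Sketch only: ask robots not to index pages in the User: namespace.
// NS_USER is MediaWiki's built-in constant for the User: namespace (ID 2).
// 'noindex,follow' is an assumed value: do not index the page, but still follow its links.
$wgNamespaceRobotPolicies = [
    NS_USER => 'noindex,follow',
];
```

Namespaces not listed in the array keep the wiki's default robot policy.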

Urbanecm changed the task status from Open to Stalled. · Nov 28 2017, 7:23 PM
Urbanecm triaged this task as Lowest priority.
Urbanecm added a subscriber: Urbanecm.

As said above, this needs a community decision. Technically, it can be done by adding a single line to the wiki configuration.

Change 393845 had a related patch set uploaded (by Framawiki; owner: Framawiki):
[operations/mediawiki-config@master] Set $wgNamespaceRobotPolicies for wikidata

https://gerrit.wikimedia.org/r/393845

@Sjoerddebruin @Urbanecm Good point :) I left a message on Project Chat; I think that is enough for a first overview. If there is a lot of discussion and differing points of view, we can move to an RfC.

The statement "Usually on the Wikimedia projects, the user pages are not indexed for privacy reasons." seems to have misled a lot of people into supporting this. Please provide a source for it.

enwiki recently introduced userspace noindexing, but it was not for privacy reasons. I am not aware of many other projects that do this: a dip sample of some major (commons and itwiki) and some minor (dewiktionary, dewikiquote) projects indicates that some index userspace, while others (dewiki, eswiki) don't.

Change 393845 merged by jenkins-bot:
[operations/mediawiki-config@master] Set $wgNamespaceRobotPolicies for wikidata

https://gerrit.wikimedia.org/r/393845

Mentioned in SAL (#wikimedia-operations) [2017-12-14T19:29:07Z] <thcipriani@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:393845|Set $wgNamespaceRobotPolicies for wikidata]] T181525 (duration: 01m 04s)

Framawiki closed this task as Resolved. · Dec 14 2017, 8:04 PM
Framawiki claimed this task.

Deployed on wikidata.

Note that, per the discussions about this change, it would be great to see whether this config can be set as the default for all wikis. If someone wants to start a discussion on meta...