Make __NOINDEX__ work on all namespaces on Meta-Wiki
Closed, Resolved (Public)

Description

Yesterday I tried to fulfil a request to have some content excluded from search engines, and noticed that __NOINDEX__ does not work in the main Meta-Wiki namespace.

Given the nature of Meta-Wiki, NS_MAIN works differently there than on the content projects. We don't store encyclopedic content there. All cross-wiki material, such as vandalism reports, RFCs, etc., gets indexed, and search engines sometimes show those "negative" remarks among the first results. This happens not only in NS_MAIN but in many other namespaces as well. We've received several requests in the past to hide some pages from search engines, and we've found that the only way to do it is by modifying robots.txt, which requires an admin.

On some other projects those pages get NOINDEXed by adding the magic word, by modifying the robots.txt file, or both. Meta-Wiki should operate a different policy and allow the magic word __NOINDEX__ to work on all namespaces, including NS_MAIN.

I wonder if we should set them to nofollow as well.

Thanks.

Details

Related Gerrit Patches:
operations/mediawiki-config (master): Allow __NOINDEX__ on all namespaces on meta.

Event Timeline

Restricted Application added subscribers: JEumerus, Matanya, Aklapper. Nov 8 2016, 11:04 AM
MarcoAurelio triaged this task as Low priority. Nov 8 2016, 11:05 AM
Dereckson added a subscriber: Dereckson. Edited Nov 15 2016, 6:48 PM

The correct setting here is $wgExemptFromUserRobotsControl.

Notice the "If set to null, default to $wgContentNamespaces." part. It describes the MediaWiki behavior, and not our configuration, as we do this:

$wgExemptFromUserRobotsControl = array_merge( $wgContentNamespaces, $wmgExemptFromUserRobotsControlExtra );
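To make the effect of that merge concrete, here is a minimal sketch of the exemption rule as described above, modelled in Python purely for illustration. The function names and the sample namespace lists are assumptions, not MediaWiki or wmf-config code; the only behaviour taken from the discussion is that a null setting falls back to the content namespaces and that our configuration merges the content namespaces with an extra list.

```python
from typing import List, Optional, Sequence

NS_MAIN = 0  # main namespace ID in MediaWiki
NS_TALK = 1  # talk namespace ID in MediaWiki


def exempt_namespaces(exempt: Optional[Sequence[int]],
                      content_namespaces: Sequence[int]) -> List[int]:
    """Hypothetical helper: None (PHP null) falls back to the content namespaces."""
    return list(content_namespaces) if exempt is None else list(exempt)


def noindex_is_honoured(namespace: int,
                        exempt: Optional[Sequence[int]],
                        content_namespaces: Sequence[int]) -> bool:
    """__NOINDEX__ has an effect only outside the exempt namespaces."""
    return namespace not in exempt_namespaces(exempt, content_namespaces)


# Assumed sample values, for illustration only.
content = [NS_MAIN]
extra: List[int] = []

# Equivalent of array_merge( $wgContentNamespaces, $wmg...Extra ):
merged = content + extra

# With the merge, NS_MAIN is always exempt, whatever the extra list contains.
main_honoured_with_merge = noindex_is_honoured(NS_MAIN, merged, content)

# An explicitly empty exempt list lets the magic word work everywhere.
main_honoured_with_empty = noindex_is_honoured(NS_MAIN, [], content)
```

Under these assumptions, the merged list can only grow beyond the content namespaces, never shrink, which is why allowing __NOINDEX__ in NS_MAIN needs a change to the configuration logic rather than just a different extra list.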

Change 321713 had a related patch set uploaded (by Dereckson):
Allow NOINDEX on all namespaces on meta.

https://gerrit.wikimedia.org/r/321713

Alternatively we could just declare NS0 to not be content for meta, which would be more accurate perhaps?

Content namespaces have a very large scope: Special:Random, visual editor

Modifying such a widely used setting for a single identified need doesn't seem worthwhile. Yes, we could add NS0 back in each configuration parameter, but it's more efficient to fix the namespaces.

See https://gerrit.wikimedia.org/r/321712 for the offered solution.

The reason __NOINDEX__ doesn't work on content pages by default is that we were concerned that this magic word could be misused or abused on certain wikis. For example, imagine a user adding a __NOINDEX__ tag to a prominent article such as https://en.wikipedia.org/wiki/Barack_Obama, intentionally via vandalism or unintentionally via an accidental template transclusion. We didn't and don't want these pages to go missing from search engine indices.

Meta-Wiki is obviously different than a Wikipedia. Is anyone concerned about vandalism or misuse of the magic word on Meta-Wiki? If not, this sounds fine to me.

@Dereckson It looks like there are no concerns for this. Can you arrange to have this deployed? Thank you.

Change 321713 merged by jenkins-bot:
Allow NOINDEX on all namespaces on meta.

https://gerrit.wikimedia.org/r/321713

MarcoAurelio closed this task as Resolved. Nov 29 2016, 8:04 PM

Mentioned in SAL (#wikimedia-operations) [2016-11-29T20:05:04Z] <thcipriani@tin> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:321713|Allow NOINDEX on all namespaces on meta]] (T150245) (duration: 00m 44s)

Restricted Application added a project: User-MarcoAurelio. Aug 21 2017, 1:09 PM