
Exclude some he wiki pages in robots.txt
Closed, Invalid · Public

Description

Please add the following to robots.txt for he wikis:

Disallow: /wiki/מיוחד:
Disallow: /wiki/מיוחד%3A
Disallow: /wiki/%D7%9E%D7%99%D7%95%D7%97%D7%93:
Disallow: /wiki/%D7%9E%D7%99%D7%95%D7%97%D7%93%3A

I'm not sure the %3A variant is required, but the existing robots.txt entries all include both forms.

The מיוחד namespace is the Hebrew translation of the Special namespace.
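For reference, the percent-encoded forms above are just the UTF-8 bytes of מיוחד. A minimal Python sketch (illustrative only) that reproduces both requested variants from the literal prefix:

from urllib.parse import quote

# "מיוחד:" is the Hebrew name of the Special: namespace plus the
# namespace separator.
prefix = "מיוחד:"

# Keep the colon literal, matching the first pair of requested rules:
print(quote(prefix, safe=":"))  # %D7%9E%D7%99%D7%95%D7%97%D7%93:

# Encode the colon too, matching the %3A variants:
print(quote(prefix))            # %D7%9E%D7%99%D7%95%D7%97%D7%93%3A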

Event Timeline

Aklapper renamed this task from "Changes to robots.txt" to "Exclude some he wiki pages in robots.txt". · Nov 30 2019, 2:37 PM

@Fuzzy: Hi, could you please explain why? Is this your personal preference, or are the wiki communities aware of and in support of this request? Please see and follow https://meta.wikimedia.org/wiki/Requesting_wiki_configuration_changes when requesting such configuration changes. Thanks!

Urbanecm subscribed.

Changes like this one can be done on-wiki via MediaWiki:Robots.txt.
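For illustration, and assuming (as on Wikimedia wikis) that the contents of MediaWiki:Robots.txt are appended to the generated robots.txt, the on-wiki page could simply carry the directives requested above:

# Appended to the site-wide robots.txt; mirrors the request in the description
Disallow: /wiki/מיוחד:
Disallow: /wiki/מיוחד%3A
Disallow: /wiki/%D7%9E%D7%99%D7%95%D7%97%D7%93:
Disallow: /wiki/%D7%9E%D7%99%D7%95%D7%97%D7%93%3A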

Fuzzy changed the task status from Invalid to Resolved. (Edited) · Nov 30 2019, 10:44 PM

Thanks, @Urbanecm provided a solution to the robots.txt issue.

Nevertheless, some explanation of the issue: it appears Google's crawler tries (and fails) to index pages of the form מיוחד:דפים המקושרים לכאן/[article name], the Hebrew equivalent of Special:WhatLinksHere/[article name]. The current robots.txt file has a Disallow: /wiki/Special: rule that prevents the crawler from visiting those pages on English-language wikis; the suggestion was to add the same rule for the Hebrew wikis.
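One way to sanity-check the proposed rules is Python's standard urllib.robotparser. This is only a sketch (the article names are made up), and note that this parser normalizes percent-encoding, so the literal and %3A variants collapse into one rule here even though real crawlers may treat them differently, which is why the request lists both:

from urllib.robotparser import RobotFileParser

# Hypothetical rules mirroring the request above (illustrative only).
rules = """\
User-agent: *
Disallow: /wiki/מיוחד:
Disallow: /wiki/מיוחד%3A
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# מיוחד:דפים המקושרים לכאן is the Hebrew Special:WhatLinksHere;
# both article names below are made up.
blocked = "https://he.wikipedia.org/wiki/מיוחד:דפים_המקושרים_לכאן/דוגמה"
allowed = "https://he.wikipedia.org/wiki/דוגמה"

print(rp.can_fetch("Googlebot", blocked))  # False: excluded by the new rules
print(rp.can_fetch("Googlebot", allowed))  # True: ordinary articles stay crawlable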