
robots.txt for no.wikipedia.org
Closed, Invalid (Public)

Description

Author: baard

Description:
Please add the following to robots.txt for nowiki:

Disallow: /wiki/Bruker
Disallow: /wiki/Brukerdiskusjon
Disallow: /wiki/Wikipedia:Administratorer
Disallow: /wiki/Wikipedia-diskusjon:Administratorer
Disallow: /wiki/Wikipedia:Sletting
Disallow: /wiki/Wikipedia-diskusjon:Sletting
Disallow: /wiki/Spesial

See http://no.wikipedia.org/wiki/Wikipedia:Tinget#S.C3.B8kemotorer_beh.C3.B8ver_ikke_se_alle_sider for discussion.
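
For illustration only, the intended effect of these prefixes can be checked with Python's standard robots.txt parser. The sketch below is not part of the request; it assumes the rules sit under the usual "User-agent: *" group, as in the live file, and the page titles in the test URLs are made up.

from urllib.robotparser import RobotFileParser

# Proposed rules, wrapped in a standard "User-agent: *" group (assumption).
rules = """\
User-agent: *
Disallow: /wiki/Bruker
Disallow: /wiki/Brukerdiskusjon
Disallow: /wiki/Wikipedia:Administratorer
Disallow: /wiki/Wikipedia-diskusjon:Administratorer
Disallow: /wiki/Wikipedia:Sletting
Disallow: /wiki/Wikipedia-diskusjon:Sletting
Disallow: /wiki/Spesial
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# User and special pages should be blocked (hypothetical titles) ...
print(rp.can_fetch("*", "http://no.wikipedia.org/wiki/Bruker:Ola"))    # False
print(rp.can_fetch("*", "http://no.wikipedia.org/wiki/Spesial:Logg"))  # False
# ... while ordinary articles stay crawlable.
print(rp.can_fetch("*", "http://no.wikipedia.org/wiki/Norge"))         # True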


Version: unspecified
Severity: enhancement
URL: http://no.wikipedia.org/robots.txt

Details

Reference
bz11432

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 9:59 PM
bzimport set Reference to bz11432.
bzimport added a subscriber: Unknown Object (MLST).

jeluf wrote:

Added:

11432

Disallow: /wiki/Bruker:
Disallow: /wiki/Bruker%3A
Disallow: /wiki/Brukerdiskusjon
Disallow: /wiki/Wikipedia:Administratorer
Disallow: /wiki/Wikipedia%3AAdministratorer
Disallow: /wiki/Wikipedia-diskusjon:Administratorer
Disallow: /wiki/Wikipedia-diskusjon%3AAdministratorer
Disallow: /wiki/Wikipedia:Sletting
Disallow: /wiki/Wikipedia%3ASletting
Disallow: /wiki/Wikipedia-diskusjon:Sletting
Disallow: /wiki/Wikipedia-diskusjon%3ASletting
Disallow: /wiki/Spesial:
Disallow: /wiki/Spesial%3A
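
Presumably each namespace is listed both with a literal colon and with %3A because robots.txt matching is a plain prefix comparison against the path exactly as the crawler requests it, and links can carry the colon in either spelling. A small Python illustration of that assumption:

from urllib.parse import quote, unquote

# The same page URL in its two common spellings.
print(quote("/wiki/Bruker:Ola", safe="/"))   # /wiki/Bruker%3AOla
print(unquote("/wiki/Bruker%3AOla"))         # /wiki/Bruker:Ola

# A literal prefix match only catches the spelling it was written for,
# hence both variants are listed above.
for rule in ("/wiki/Bruker:", "/wiki/Bruker%3A"):
    print(rule, "/wiki/Bruker%3AOla".startswith(rule))
# /wiki/Bruker:     False
# /wiki/Bruker%3A   True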

vacuum wrote:

The previous rules will not always work, and the snippet below should be somewhat better. It is not a complete solution to the problem of search engines failing to find articles without parents, but it allows an additional hack for some of the special pages.

Disallow: /wiki/Spesial:S%C3%B8k
Disallow: /wiki/Spesial%3AS%C3%B8k
Disallow: /wiki/Special:S%C3%B8k
Disallow: /wiki/Special%3AS%C3%B8k
Disallow: /wiki/Spesial:Tilfeldig_side
Disallow: /wiki/Spesial%3ATilfeldig_side
Disallow: /wiki/Special:Tilfeldig_side
Disallow: /wiki/Special%3ATilfeldig_side
Disallow: /wiki/Bruker:
Disallow: /wiki/Bruker%3A
Disallow: /wiki/Brukerdiskusjon:
Disallow: /wiki/Brukerdiskusjon%3A
Disallow: /wiki/User:
Disallow: /wiki/User%3A
Disallow: /wiki/User_talk:
Disallow: /wiki/User_talk%3A
Disallow: /wiki/WP:A
Disallow: /wiki/WP%3AA
Disallow: /wiki/Wikipedia:Administratorer
Disallow: /wiki/Wikipedia%3AAdministratorer
Disallow: /wiki/Wikipedia-diskusjon:Administratorer
Disallow: /wiki/Wikipedia-diskusjon%3AAdministratorer
Disallow: /wiki/Wikipedia_talk:Administratorer
Disallow: /wiki/Wikipedia_talk%3AAdministratorer
Disallow: /wiki/WP:S
Disallow: /wiki/WP%3AS
Disallow: /wiki/Wikipedia:Sletting
Disallow: /wiki/Wikipedia%3ASletting
Disallow: /wiki/Wikipedia-diskusjon:Sletting
Disallow: /wiki/Wikipedia-diskusjon%3ASletting
Disallow: /wiki/Wikipedia_talk:Sletting
Disallow: /wiki/Wikipedia_talk%3ASletting

This adds the English namespace names in combination with the Norwegian page names, covers shortcuts that would otherwise bypass the robots exclusion rules, and gives explicit entries for the search page. A few special pages that currently use $wgOut->setRobotpolicy( 'noindex,nofollow' ) should instead use $wgOut->setRobotpolicy( 'noindex' ). This should be done at least for Special:Newpages, and that page should be set up to show a longer list of articles when it is hit by a search engine, or a dedicated special page optimized as a search engine index could be created. It is probably sufficient for smaller projects to keep the pages as they are, while medium-sized projects could modify Special:Newpages. Very large projects could possibly create so many new pages between crawler visits that a special solution would be necessary.
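
To make the shortcut point concrete, here is a hedged sketch using Python's robots.txt parser: a shortcut such as /wiki/WP:S redirects to the Sletting page, but robots.txt only sees the URL prefix, so the shortcut slips through unless it has its own rule.

from urllib.robotparser import RobotFileParser

without_shortcut = ["User-agent: *", "Disallow: /wiki/Wikipedia:Sletting"]
with_shortcut = without_shortcut + ["Disallow: /wiki/WP:S"]

for label, lines in (("without WP:S rule:", without_shortcut),
                     ("with WP:S rule:   ", with_shortcut)):
    rp = RobotFileParser()
    rp.parse(lines)
    print(label, rp.can_fetch("*", "http://no.wikipedia.org/wiki/WP:S"))
# without WP:S rule: True   -> the shortcut URL is still crawlable
# with WP:S rule:    False  -> the shortcut is blocked as well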

It should be fairly safe to let the crawlers follow links while not allowing the indexers to use the pages themselves. Many crawlers will then mark the pages as especially interesting and check back regularly.

vacuum wrote:

The segment

Disallow: /wiki/User:
Disallow: /wiki/User%3A
Disallow: /wiki/User_talk:
Disallow: /wiki/User_talk%3A

should be skipped, as the file is common to all projects.
It should, however, be checked whether this can be fixed somehow.

baard wrote:

Please hold any changes till proper community support is reached.

jeluf wrote:

Please reopen when community support is reached.

Changing robots.txt is now possible locally, by editing [[MediaWiki:Robots.txt]].