
MediaWiki:Robots.txt and $wgCapitalLinks = true on pl.wiktionary
Closed, Invalid, Public

Description

Please add the following lines to robots.txt for pl.wiktionary:

Disallow: /wiki/Wikisłownik:Strony_do_skasowania
Disallow: /wiki/Wikis%C5%82ownik:Strony_do_skasowania
Disallow: /wiki/Wikis%C5%82ownik%3AStrony_do_skasowania
Disallow: /wiki/Wikisłownik:Bar
Disallow: /wiki/Wikis%C5%82ownik:Bar
Disallow: /wiki/Wikis%C5%82ownik%3ABar
Disallow: /wiki/Wikisłownik:Bar/
Disallow: /wiki/Wikis%C5%82ownik:Bar/
Disallow: /wiki/Wikis%C5%82ownik%3ABar/
Disallow: /wiki/Wikisłownik:Tablica_ogłoszeń
Disallow: /wiki/Wikis%C5%82ownik:Tablica_og%C5%82osze%C5%84
Disallow: /wiki/Wikis%C5%82ownik%3ATablica_og%C5%82osze%C5%84
Disallow: /wiki/Wikisłownik:Tablica_ogłoszeń/
Disallow: /wiki/Wikis%C5%82ownik:Tablica_og%C5%82osze%C5%84/
Disallow: /wiki/Wikis%C5%82ownik%3ATablica_og%C5%82osze%C5%84/
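Each page in the list above appears three times: once as raw UTF-8, once with the non-ASCII characters percent-encoded, and once with the colon encoded as well. As a rough illustration only, the variants could be generated like this in Python; the helper name `disallow_variants` is hypothetical and not part of MediaWiki or the request itself.

```python
from urllib.parse import quote


def disallow_variants(title: str) -> list[str]:
    """Return the three Disallow spellings used in the request for one page title.

    Hypothetical helper for illustration; the title is expected to use
    underscores instead of spaces, as in wiki URLs.
    """
    prefix = "Disallow: /wiki/"
    return [
        prefix + title,                    # raw UTF-8, as typed
        prefix + quote(title, safe=":/"),  # non-ASCII percent-encoded, ':' kept
        prefix + quote(title, safe="/"),   # ':' additionally encoded as %3A
    ]


for line in disallow_variants("Wikisłownik:Strony_do_skasowania"):
    print(line)
```

Whether a given crawler needs all three spellings is an assumption of the request; the sketch only shows how they relate to each other.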


Version: unspecified
Severity: enhancement

Details

Reference
bz15878

Event Timeline

bzimport raised the priority of this task to Medium. Nov 21 2014, 10:18 PM
bzimport set Reference to bz15878.
bzimport added a subscriber: Unknown Object (MLST).

As far as I know you should configure this in your local MediaWiki:Robots.txt per bug 15601. If true, close as INVALID.

I have modified http://pl.wiktionary.org/wiki/MediaWiki:Robots.txt, but since there was no effect, I write here.

Changed summary to be in line with the problem description. The issue is not a request to add the pl.wp pages to the generic robots.txt; it is a bug report about the robots.txt merge functionality.

mike.lifeguard+bugs wrote:

(In reply to comment #2)

I have modified http://pl.wiktionary.org/wiki/MediaWiki:Robots.txt, but since
there was no effect, I write here.

I just checked for Meta, and it seems to have no effect there either. I doubt this should still be in Site requests as there seems to be a real bug with this feature (though adding things to the global robots.txt until it's fixed will be a workaround, I think).

jeluf wrote:

Case matters. You have to edit MediaWiki:robots.txt, not MediaWiki:Robots.txt.

So why do, for example, MediaWiki:Common.js and MediaWiki:Deletereason-dropdown (not: MediaWiki:common.js or MediaWiki:deletereason-dropdown) work?

MediaWiki:robots.txt does not appear in Special:AllMessages, so how would you find out the correct spelling?

jeluf wrote:

robots.txt handling is not a MediaWiki feature. So there's no default message here, and thus the page is not listed in the Special:AllMessages view before it has been created.

http://pl.wiktionary.org/w/index.php?title=MediaWiki:Common.js
http://pl.wiktionary.org/w/index.php?title=MediaWiki:common.js

are different pages. One works, the other doesn't. Wiktionaries are case sensitive.

(In reply to comment #7)

robots.txt handling is not a MediaWiki feature. So there's no default message
here, and thus the page is not listed in the Special:AllMessages view before
it has been created.

robots.txt is defined in the WMF-specific extension WikimediaMessages and therefore it is shown in Special:AllMessages. But that is not the point.

Regardless of this, it is impossible to create a message [[MediaWiki:robots.txt]] on wikis with $wgCapitalLinks = true;
which is the default for almost all WMF wikis, with the exception of the Wiktionaries.

Please try to create [[de:MediaWiki:robots.txt]]. It switches immediately to [[de:MediaWiki:Robots.txt]].

(In reply to comment #8)

robots.txt is defined in the WMF-specific extension WikimediaMessages and
therefore it is shown in Special:AllMessages. But that is not the point.

Regardless of this, it is impossible to create a message
[[MediaWiki:robots.txt]] on wikis with $wgCapitalLinks = true;
which is the default for almost all WMF wikis, with the exception of the
Wiktionaries.

Please try to create [[de:MediaWiki:robots.txt]]. It switches immediately to
[[de:MediaWiki:Robots.txt]].

The point here is that the issue was reported on plwiktionary, which has $wgCapitalLinks = false. On wikis with $wgCapitalLinks = true, editing [[MediaWiki:Robots.txt]] will work.
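To make the case rule discussed in this thread concrete, here is a minimal Python sketch of it. It only mirrors the behaviour as described in the comments above (the first letter of the page name is forced to upper case when $wgCapitalLinks is true and left alone when it is false). It is not MediaWiki's actual title-normalization code, and the function name `normalize_title` is made up for illustration.

```python
def normalize_title(title: str, capital_links: bool) -> str:
    """Uppercase the first letter of the page name when capital_links is true.

    Hypothetical helper; `capital_links` stands in for $wgCapitalLinks.
    """
    namespace, sep, name = title.partition(":")
    if not sep:
        # No namespace prefix: the whole string is the page name.
        namespace, name = "", title
    if capital_links and name:
        name = name[0].upper() + name[1:]
    return f"{namespace}{sep}{name}"


# A wiki with capital links on cannot hold a lowercase-initial page name.
print(normalize_title("MediaWiki:robots.txt", capital_links=True))
# A wiki with capital links off keeps both spellings as distinct pages.
print(normalize_title("MediaWiki:robots.txt", capital_links=False))
```

Under this reading, MediaWiki:robots.txt and MediaWiki:Robots.txt are the same page on a wiki with capital links on, and two different pages on plwiktionary.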

mike.lifeguard+bugs wrote:

(In reply to comment #9)

(In reply to comment #8)

robots.txt is defined in the WMF-specific extension WikimediaMessages and
therefore it is shown in Special:AllMessages. But that is not the point.

Regardless of this, it is impossible to create a message
[[MediaWiki:robots.txt]] on wikis with $wgCapitalLinks = true;
which is the default for almost all WMF wikis, with the exception of the
Wiktionaries.

Please try to create [[de:MediaWiki:robots.txt]]. It switches immediately to
[[de:MediaWiki:Robots.txt]].

The point here is that the issue was reported on plwiktionary, which has
$wgCapitalLinks = false. On wikis with $wgCapitalLinks = true, editing
[[MediaWiki:Robots.txt]] will work.

Meta, which has $wgCapitalLinks = true; uses MediaWiki:Robots.txt, yet it doesn't seem to work.

On pages which use NOINDEX there is <meta name="robots" content="noindex,follow" />, however that is not so for pages which should have it because of MediaWiki:Robots.txt.

happy_melon wrote:

That's not how it works. There are two ways of blocking spider access to pages: when a spider first visits a site, it looks for a file called "robots.txt" in the root of the site, and follows the rules there to exclude certain tranches of pages. When it visits each individual page, it looks for the "robots" meta tag and, if one is present and tells it to go away, it does so, and 'forgets' that it was ever on the page. Modifying [[MediaWiki:Robots.txt]] appends entries to the site /robots.txt file (or is supposed to, anyway); it doesn't affect meta tags on pages.
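The two mechanisms described above can be sketched separately in Python using only the standard library. This is an illustration of how a well-behaved crawler might apply each check, not a description of any particular search engine; the helper names `may_crawl` and `page_is_noindex`, the sample rule, and the example page title `kot` are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Mechanism 1: site-wide rules, as they would appear in /robots.txt.
robots_lines = [
    "User-agent: *",
    "Disallow: /wiki/Wikis%C5%82ownik:Bar",
]
rules = RobotFileParser()
rules.parse(robots_lines)


def may_crawl(url: str) -> bool:
    """Return True if the robots.txt rules allow any crawler to fetch url."""
    return rules.can_fetch("*", url)


# Mechanism 2: a per-page <meta name="robots"> tag in the HTML itself.
class RobotsMetaParser(HTMLParser):
    """Collect the directives from every robots meta tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def page_is_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(may_crawl("http://pl.wiktionary.org/wiki/Wikis%C5%82ownik:Bar"))
print(may_crawl("http://pl.wiktionary.org/wiki/kot"))
print(page_is_noindex('<meta name="robots" content="noindex,follow" />'))
```

The two checks are independent: a page can be excluded by robots.txt without carrying any meta tag, which is why the absence of the tag says nothing about whether MediaWiki:Robots.txt took effect.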

(In reply to comment #4)

I just checked for Meta, and it seems to have no effect there either. I doubt
this should still be in Site requests as there seems to be a real bug with this
feature (though adding things to the global robots.txt until it's fixed will be
a workaround, I think).

http://es.wikipedia.org/robots.txt does work (look at the bottom).
Does it only work for Wikipedias?

alexsm333 wrote:

At this moment I see the content of
http://pl.wiktionary.org/wiki/MediaWiki:robots.txt
at the end of
http://pl.wiktionary.org/robots.txt
so marking this as INVALID,
plus changing summary to reflect the solution.