
Robots.txt exempt for browsershots
Closed, DeclinedPublic

Description

Hiya,

Whenever someone gets a chance, pretty please add an exemption for Browsershots to robots.txt on enwiki (based on http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_18#robots.txt_.2F_browsershots as well as http://en.wikipedia.org/wiki/User_talk:Slakr#Browsershots)

E.g.:

User-agent: Browsershots
Disallow:

... or something similar. That would help tremendously in examining cross-compatibility between browsers and platforms.

Thanks a million, and cheers =)
--slakr@enwiki


Version: unspecified
Severity: enhancement
URL: http://en.wikipedia.org/robots.txt

Details

Reference
bz13307

Event Timeline

bzimport raised the priority of this task from to Lowest.Nov 21 2014, 10:05 PM
bzimport set Reference to bz13307.
bzimport added a subscriber: Unknown Object (MLST).

There shouldn't be anything blocking Browsershots from accessing pages; certainly nothing apparent in our robots.txt.

From my previous testing, it appears that Browsershots itself is very badly configured:

  1. It first loads /robots.txt with a generic "Python-urllib" user-agent. This is blocked at the HTTP proxy level on Wikimedia sites due to past abuse, so they would be unable to load our robots.txt file.
  2. It then loads the requested page with an app-specific "Browsershots" user-agent. This is not blocked, and would be allowed with no problems.
  3. It then passes the page off to a bunch of browsers in turn.

I can only assume that the failure to read robots.txt is interpreted as a bad sign. :)

Please contact Browsershots and tell them that their bot is broken; they should load robots.txt with the same user-agent string they use to load the page itself.
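For illustration, a minimal sketch of the fix on the crawler side, using Python's standard library (the helper name and URLs are hypothetical; the point is sending the same User-Agent header for both the robots.txt fetch and the page fetch, instead of urllib's default "Python-urllib" string):

```python
import urllib.request
import urllib.robotparser

USER_AGENT = "Browsershots"  # the same agent string used for page requests

def robots_parser_for(base_url: str) -> urllib.robotparser.RobotFileParser:
    """Fetch robots.txt with an explicit User-Agent header and parse it."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/robots.txt",
        headers={"User-Agent": USER_AGENT},  # avoid the default "Python-urllib"
    )
    rp = urllib.robotparser.RobotFileParser()
    with urllib.request.urlopen(req) as resp:
        rp.parse(resp.read().decode("utf-8", errors="replace").splitlines())
    return rp

# With the exemption proposed in the description (empty Disallow),
# the parser would permit every path for this agent:
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: Browsershots", "Disallow:"])
print(rp.can_fetch(USER_AGENT, "http://en.wikipedia.org/wiki/Main_Page"))
```

A fetch done this way would not be rejected at the proxy, so the bot would see the real robots.txt rather than treating the blocked request as a disallow-everything failure.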

  • Bug 27986 has been marked as a duplicate of this bug.