weblinkchecker.py (core) contains this comment:
# we fake being Firefox because some webservers block unknown
# clients, e.g. https://images.google.de/images?q=Albit gives a 403
# when using the PyWikipediaBot user agent.
'User-agent': 'Mozilla/5.0 (X11; U; Linux i686; de; rv:1.8) Gecko/20051128 SUSE/1.5-0.1 Firefox/1.5',
This was added to compat in January 2007 (and later copied to core):
https://www.mediawiki.org/wiki/Special:Code/pywikipedia/3165
The mentioned https://images.google.de/images?q=Albit now returns HTTP 404, and the new URL https://www.google.de/search?tbm=isch&q=Albit returns HTTP 200 when retrieved using core master (requests) and 2.0 (httplib2), so the justification for this fake user agent no longer applies.
This is likely because the user-agent is now more 'normal', e.g. in 2.0:
$ python pwb.py shell
Welcome to the Pywikibot interactive shell!
>>> from pywikibot.comms.http import user_agent
>>> user_agent()
'shell Pywikibot/2.0rc4 (g5802) httplib2/0.9.1 Python/2.7.10.final.0'
Faking the user-agent should be an option that is disabled by default, or it should only be used for servers known to be problematic.
Also, the fake user-agent should be semi-auto-updating. The user-agent in weblinkchecker.py is so old (2005) that it is likely to cause problems: browser sniffers may assume the browser is too old to render the page correctly, and fall back to a junky version or redirect to a 'not supported' message.
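
A rough sketch of how the opt-in behaviour proposed above might look. Everything here is hypothetical: choose_user_agent, the fake_user_agent flag, PROBLEMATIC_HOSTS and both user-agent strings are placeholders for illustration, not existing Pywikibot names or settings, and the fake string would still need some way of being kept current.

```python
from urllib.parse import urlsplit

# All names and values below are assumptions for illustration only.
# Stand-in for whatever pywikibot.comms.http.user_agent() returns.
DEFAULT_USER_AGENT = 'weblinkchecker Pywikibot/2.0rc4'
# Placeholder browser string; a real one should be refreshed regularly.
FAKE_USER_AGENT = ('Mozilla/5.0 (X11; Linux x86_64; rv:38.0) '
                   'Gecko/20100101 Firefox/38.0')
# Hosts known to reject unknown clients (example entry only).
PROBLEMATIC_HOSTS = frozenset({'blocked.example.org'})


def choose_user_agent(url, fake_user_agent=False,
                      problematic_hosts=PROBLEMATIC_HOSTS):
    """Return the user-agent string to send when checking url.

    The fake browser string is used only if the caller opts in via
    fake_user_agent, or if the host is listed as problematic.
    """
    # urlsplit().hostname is already lowercased; it is None for
    # URLs without a network location.
    host = urlsplit(url).hostname or ''
    if fake_user_agent or host in problematic_hosts:
        return FAKE_USER_AGENT
    return DEFAULT_USER_AGENT
```

With the default (flag off), https://www.google.de/search?tbm=isch&q=Albit would be requested with the normal Pywikibot user-agent, and only hosts in the list would get the browser string.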