
weblinkchecker.py uses a fake user-agent
Closed, Resolved (Public)

Description

weblinkchecker.py (core) contains this comment:

# we fake being Firefox because some webservers block unknown
# clients, e.g. https://images.google.de/images?q=Albit gives a 403
# when using the PyWikipediaBot user agent.
'User-agent': 'Mozilla/5.0 (X11; U; Linux i686; de; rv:1.8) Gecko/20051128 SUSE/1.5-0.1 Firefox/1.5',

This was added to compat in Jan 2007 (and later copied to core):
https://www.mediawiki.org/wiki/Special:Code/pywikipedia/3165

The mentioned https://images.google.de/images?q=Albit now returns an HTTP 404, and the new URL https://www.google.de/search?tbm=isch&q=Albit returns an HTTP 200 when retrieved using core master (requests) and 2.0 (httplib2), so the justification for this fake user agent no longer applies.

This is likely because the user-agent is now more 'normal', e.g. in 2.0:

$ python pwb.py shell
Welcome to the Pywikibot interactive shell!
>>> from pywikibot.comms.http import user_agent
>>> user_agent()
'shell Pywikibot/2.0rc4 (g5802) httplib2/0.9.1 Python/2.7.10.final.0'

Faking the user-agent should be an option, default disabled, or only used for servers known to be problematic.
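
A minimal sketch of what this could look like, combining both suggestions: the fake user-agent is sent only when explicitly enabled or when the host is on a list of known-problematic servers. All names here (choose_user_agent, fake_enabled, PROBLEMATIC_HOSTS) and the user-agent strings are hypothetical and are not taken from Pywikibot.

```python
from urllib.parse import urlsplit

# Hypothetical values, for illustration only.
DEFAULT_USER_AGENT = 'weblinkchecker Pywikibot/2.0 (example)'
FAKE_USER_AGENT = ('Mozilla/5.0 (X11; Linux x86_64; rv:43.0) '
                   'Gecko/20100101 Firefox/43.0')
# Hosts assumed to reject unknown clients; this entry is a placeholder.
PROBLEMATIC_HOSTS = frozenset({'blocked.example.org'})


def choose_user_agent(url, fake_enabled=False,
                      problematic_hosts=PROBLEMATIC_HOSTS):
    """Return the User-agent string to send for the given URL.

    The fake browser string is used only if the caller opted in
    (fake_enabled) or the host is listed as problematic.
    """
    host = urlsplit(url).hostname or ''
    if fake_enabled or host in problematic_hosts:
        return FAKE_USER_AGENT
    return DEFAULT_USER_AGENT


headers = {
    'User-agent': choose_user_agent(
        'https://www.google.de/search?tbm=isch&q=Albit'),
}
```

With the default settings, the request above would carry the normal user-agent; only a URL on a listed host, or a call with fake_enabled=True, would get the browser string.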

The fake user-agent should also be semi-auto-updating. The one in weblinkchecker.py is so old (2005) that it is likely to cause problems: browser sniffers will assume the browser is too old to render the page correctly, and will fall back to a junky version or redirect to a 'not supported' message.
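
One possible way to make the string easier to keep current is to build it from a browser version number, so that only the version needs refreshing (from configuration or a periodic update) and not the whole string. This is only a sketch: firefox_user_agent is a hypothetical helper, and the template assumes the desktop Firefox user-agent layout, where the Gecko date token is fixed at 20100101.

```python
def firefox_user_agent(version, platform='X11; Linux x86_64'):
    """Build a desktop Firefox-style User-agent for a major version.

    Hypothetical helper: 'version' would come from a config value or
    some periodically refreshed source, not from a hard-coded string.
    """
    major = int(version)
    return ('Mozilla/5.0 ({0}; rv:{1}.0) '
            'Gecko/20100101 Firefox/{1}.0'.format(platform, major))
```

Bumping the version argument then yields an up-to-date string without editing the script itself.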

See Also: T68102: use one library for all http requests

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 3:40 AM
bzimport set Reference to bz69204.
bzimport added a subscriber: Unknown Object (????).
jayvdb set Security to None.
jayvdb added a subscriber: MtDu.

https://codein.withgoogle.com/dashboard/tasks/6313164748619776/
@MtDu, you might want to try this one, as it should be very easy for you to code as you've built the fake user agent function.

I'll go ahead and claim this, as I built the fake user agent function. I'll do this after I finish my current task.
@jayvdb,
Thanks for making this a GCI task for me!
MtDu

@jayvdb,
Don't worry. I'll try to do as many pywikibot tasks as I can. Even after GCI ends. :)
Thanks,
MtDu

Change 264928 had a related patch set uploaded (by MtDu):
Use new get_fake_user_agent function for User-agent

https://gerrit.wikimedia.org/r/264928

Change 264928 merged by jenkins-bot:
Use new get_fake_user_agent function for User-agent

https://gerrit.wikimedia.org/r/264928