Convert reflinks to requests
Closed, ResolvedPublic

Description

reflinks.py is one of the few scripts still using urllib2. It should be possible to achieve the same functionality using [[http://docs.python-requests.org/en/latest/|requests]].

Before attempting the switch, the current usage of urllib2 should be analysed and mapped to the equivalent requests functionality (assuming requests has the same functionality; it might not!).
For example, reflinks currently fetches only the first 1000000 bytes of each webpage. requests does have the ability to read only part of a response body, but this needs further investigation.
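
As a rough illustration of the byte limit discussed above, the following is a minimal sketch, not the reflinks.py implementation. It relies on `stream=True` and `Response.iter_content()`, which requests provides for reading a body incrementally. The helper names `read_limited` and `fetch_head`, the 8192-byte chunk size, and the 30-second timeout are assumptions made for this example.

```python
from typing import Iterable

MAX_BYTES = 1000000  # the per-page limit mentioned in the task description


def read_limited(chunks: Iterable[bytes], limit: int = MAX_BYTES) -> bytes:
    """Concatenate chunks, stopping once `limit` bytes have been collected."""
    if limit <= 0:
        return b""
    buf = bytearray()
    for chunk in chunks:
        # Take only as much of this chunk as still fits under the limit.
        buf.extend(chunk[:limit - len(buf)])
        if len(buf) >= limit:
            break
    return bytes(buf)


def fetch_head(url: str, limit: int = MAX_BYTES, timeout: float = 30.0) -> bytes:
    """Fetch at most `limit` bytes of the body at `url` (hypothetical helper)."""
    import requests  # imported lazily so read_limited works without requests

    # stream=True defers the body download until iter_content() is consumed.
    response = requests.get(url, stream=True, timeout=timeout)
    try:
        response.raise_for_status()
        return read_limited(response.iter_content(chunk_size=8192), limit)
    finally:
        # Close the connection so the unread remainder is not downloaded.
        response.close()
```

Note that `iter_content()` yields the body after any gzip or deflate transfer decoding, so in this sketch the limit applies to decoded bytes rather than bytes on the wire.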

Event Timeline

jayvdb claimed this task.
jayvdb raised the priority of this task from to High.
jayvdb updated the task description.
jayvdb added subscribers: jayvdb, pywikibot-bugs-list, Beta16.

I haven't looked closely, but I expect that the requests auto-decoding will not have all the same features as the existing reflinks.py decoding. More analysis is needed.
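
To make the decoding concern concrete, here is a hedged sketch of a manual fallback chain that could run on the raw bytes instead of relying on auto-decoding. It is not the existing reflinks.py logic. The function name `decode_page`, the regular expression, and the order of candidates (header charset, then `<meta>` charset, then UTF-8, then Latin-1) are all assumptions chosen for illustration.

```python
import re
from typing import Optional

# Matches charset declarations such as <meta charset="utf-8"> or
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def decode_page(raw: bytes, header_charset: Optional[str] = None) -> str:
    """Decode `raw` by trying several candidate encodings in order."""
    candidates = []
    if header_charset:
        candidates.append(header_charset)
    match = META_CHARSET.search(raw)
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.append("utf-8")

    for name in candidates:
        try:
            return raw.decode(name)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name or bytes invalid in that codec: try the next.
            continue
    # Latin-1 maps every byte to a code point, so this last resort cannot fail.
    return raw.decode("latin-1")
```

With requests, the raw bytes are available as `response.content` and the header-derived guess as `response.encoding`, so a call might look like `decode_page(response.content, response.encoding)`.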

On https://gerrit.wikimedia.org/r/#/c/264251/11/, @MtDu wrote:

I looked around and found this: https://toolbelt.readthedocs.org/en/latest/downloadutils.html But when I tried it, it said no module named requests_toolbelt. Do I need to do this for each of the three lines I commented on, or is it different for each? I need to sleep though, so leave me a message here or on IRC. Thanks, MtDu

toolbelt is another package that uses requests, simplifying some tasks. I have quickly looked at its stream capability, and I believe that it will not simplify the reflinks code.

I am also worried that it adds another dependency, and it requires requests 2.0.1 or later (mentioned at https://toolbelt.readthedocs.org/en/latest/user.html).

Change 264251 had a related patch set uploaded (by MtDu):
Set user-agent in reflinks.py; convert reflinks.py to use requests module

https://gerrit.wikimedia.org/r/264251

I did this while completing T113596.
Thanks,
MtDu

After the switch to requests, FTP will no longer be supported.

https://github.com/kennethreitz/requests/issues/1237

We may be able to regain FTP support by testing and adopting https://github.com/Lukasa/requests-ftp
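
Until FTP support is restored, one possible stopgap is to skip links whose scheme requests cannot fetch. This is only a sketch of that idea and an assumption on my part, not something the merged change is known to do. The names `SUPPORTED_SCHEMES` and `is_fetchable` are hypothetical.

```python
from urllib.parse import urlsplit

# Schemes that requests handles out of the box.
SUPPORTED_SCHEMES = {"http", "https"}


def is_fetchable(url: str) -> bool:
    """Return True if `url` uses a scheme that requests can fetch."""
    # urlsplit() lowercases the scheme, so "HTTP://..." is accepted too.
    return urlsplit(url).scheme in SUPPORTED_SCHEMES
```

A caller could check `is_fetchable(link)` before fetching and leave ftp:// references untouched rather than failing on them.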

Change 264251 merged by jenkins-bot:
Set user-agent and convert reflinks.py to use requests

https://gerrit.wikimedia.org/r/264251