Convert reflinks to requests
Closed, Resolved · Public

Description

reflinks.py is one of the few scripts still using urllib2. It should be possible to achieve the same functionality using [[http://docs.python-requests.org/en/latest/|requests]].

Before attempting the switch, the current usage of urllib2 should be analysed and mapped to the equivalent requests functionality (assuming requests offers it; it might not).
For example, reflinks currently fetches only the first 1000000 bytes of each webpage. requests can read just part of a response, but how well that fits needs further investigation.
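
As a starting point for that investigation, here is a minimal sketch of a size-limited fetch using the streaming mode of requests (`stream=True` plus `iter_content`). The helper names `read_limited` and `fetch_head`, the chunk size, and the timeout are hypothetical and not taken from reflinks.py; only the 1000000 byte limit comes from the description above.

```python
MAX_BYTES = 1000000  # limit mentioned in the task description


def read_limited(response, limit=MAX_BYTES, chunk_size=8192):
    """Return at most `limit` bytes from a streamed response body.

    `response` only needs `iter_content()` and `close()`, which a
    requests.Response opened with stream=True provides.
    """
    pieces = []
    remaining = limit
    try:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if not chunk:
                continue  # skip empty keep-alive chunks
            piece = chunk[:remaining]
            pieces.append(piece)
            remaining -= len(piece)
            if remaining <= 0:
                break  # stop downloading once the limit is reached
    finally:
        response.close()  # release the connection even on early exit
    return b"".join(pieces)


def fetch_head(url, limit=MAX_BYTES, timeout=30):
    """Fetch only the first `limit` bytes of `url` (hypothetical helper)."""
    import requests  # imported here so read_limited works without requests

    response = requests.get(url, stream=True, timeout=timeout)
    return read_limited(response, limit)
```

With `stream=True` the body is not downloaded until it is iterated, so breaking out of the loop should avoid pulling the rest of a large page. Whether this matches the exact urllib2 behaviour in reflinks.py would still need to be checked.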

Event Timeline

jayvdb created this task. · Sep 3 2015, 1:59 AM
jayvdb claimed this task.
jayvdb raised the priority of this task to High.
jayvdb updated the task description.
jayvdb added subscribers: jayvdb, pywikibot-bugs-list, Beta16.
Restricted Application added a subscriber: Aklapper. · Sep 3 2015, 1:59 AM
jayvdb moved this task from Backlog to references on the Pywikibot-Scripts board.
Mpaa added a subscriber: Mpaa. · Sep 3 2015, 5:59 PM
Mpaa removed a subscriber: Mpaa. · Sep 27 2015, 11:02 AM
jayvdb updated the task description. · Jan 15 2016, 11:51 PM

I haven't looked closely, but I expect that requests' auto-decoding will not have all the features of the existing reflinks.py decoding. More analysis is needed.
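
To make the concern concrete: requests decodes `response.text` using `response.encoding` (taken from the HTTP headers) and offers `response.apparent_encoding` as a guess from the body. If that proves insufficient, the raw bytes in `response.content` could be decoded manually. The sketch below is only an illustration of that idea, not the logic in reflinks.py; the function names and the two regular expressions are hypothetical.

```python
import re

# Hypothetical patterns: charset in a Content-Type header and in a <meta> tag.
HEADER_CHARSET = re.compile(r'charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)
META_CHARSET = re.compile(
    br'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def guess_encoding(content_type, body, default="utf-8"):
    """Pick an encoding: HTTP header first, then <meta> tag, then default."""
    match = HEADER_CHARSET.search(content_type or "")
    if match:
        return match.group(1).lower()
    # Only inspect the start of the document, where <meta> tags live.
    match = META_CHARSET.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def decode_body(content_type, body):
    """Decode raw bytes, falling back to UTF-8 for unknown codec names."""
    encoding = guess_encoding(content_type, body)
    try:
        return body.decode(encoding, errors="replace")
    except LookupError:
        # The page declared a charset Python does not know.
        return body.decode("utf-8", errors="replace")
```

With requests, the inputs would be `response.headers.get("Content-Type")` and `response.content`.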

jayvdb added a subscriber: MtDu. · Jan 16 2016, 11:34 AM

On https://gerrit.wikimedia.org/r/#/c/264251/11/, @MtDu wrote:

I looked around and found this: https://toolbelt.readthedocs.org/en/latest/downloadutils.html. But when I tried it, it said "no module named requests_toolbelt". Do I need to do this for each of the three lines I commented on, or is it different for each? I need to sleep though, so leave me a message here or on IRC. Thanks, MtDu

toolbelt is another package that uses requests, simplifying some tasks. I have quickly looked at its stream capability, and I believe that it will not simplify the reflinks code.

I am also worried that it adds another dependency, and that it has a minimum version requirement of requests 2.0.1 (mentioned at https://toolbelt.readthedocs.org/en/latest/user.html).

Change 264251 had a related patch set uploaded (by MtDu):
Set user-agent in reflinks.py Convert reflinks.py to use requests module

https://gerrit.wikimedia.org/r/264251

MtDu claimed this task. · Jan 17 2016, 8:44 PM

I did this while completing T113596.
Thanks,
MtDu

After the switch to requests, ftp will no longer be supported.

https://github.com/kennethreitz/requests/issues/1237

We may be able to regain ftp support by testing and adopting https://github.com/Lukasa/requests-ftp
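
Until something like that is tested, one option is to detect the links that requests cannot fetch and handle them separately. The following is a minimal sketch using only the standard library; the function names and the idea of partitioning the links are assumptions for illustration, not what the merged patch does.

```python
from urllib.parse import urlsplit

# Schemes that requests handles out of the box.
REQUESTS_SCHEMES = {"http", "https"}


def can_fetch_with_requests(url):
    """Return True if the URL scheme is one requests supports natively."""
    return urlsplit(url).scheme.lower() in REQUESTS_SCHEMES


def partition_links(urls):
    """Split URLs into (supported, unsupported) lists, preserving order."""
    supported, unsupported = [], []
    for url in urls:
        if can_fetch_with_requests(url):
            supported.append(url)
        else:
            unsupported.append(url)  # e.g. ftp:// links
    return supported, unsupported
```

The unsupported list could then be skipped with a warning or passed to a different fetcher.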

Change 264251 merged by jenkins-bot:
Set user-agent and convert reflinks.py to use requests

https://gerrit.wikimedia.org/r/264251

jayvdb closed this task as Resolved. · Jan 19 2016, 5:10 AM