Convert reflinks to requests
Closed, Resolved · Public

Description

reflinks.py is one of the few scripts still using urllib2. It should be possible to achieve the same functionality using requests.

Before attempting to switch to requests, the current usage of urllib2 should be analysed to map it to requests functionality (assuming requests has the same functionality -- it might not!).
For example, reflinks currently fetches only the first 1000000 bytes of each webpage. requests is also able to read just part of a response body (to be investigated further).
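One way the byte limit could be mapped onto requests is streaming: `stream=True` defers the body download and `iter_content()` yields it in chunks, so reading can stop once enough bytes have arrived. The following is only a minimal sketch under that assumption, not the code that was eventually merged; `read_limited`, `fetch_head`, the chunk size and the timeout are hypothetical choices made for illustration.

```python
def read_limited(chunks, limit=1000000):
    """Collect at most `limit` bytes from an iterable of byte chunks.

    Hypothetical helper; assumes `limit` is a non-negative integer.
    """
    buf = bytearray()
    for chunk in chunks:
        # Take only as many bytes as are still needed to reach the limit.
        buf.extend(chunk[:limit - len(buf)])
        if len(buf) >= limit:
            # Stop iterating so no further chunks are pulled from the source.
            break
    return bytes(buf)


def fetch_head(url, limit=1000000, timeout=30):
    """Hypothetical helper: download at most `limit` bytes of `url`."""
    # Imported here so read_limited() can be used without requests installed.
    import requests

    # stream=True defers downloading the body until it is iterated.
    response = requests.get(url, stream=True, timeout=timeout)
    try:
        return read_limited(response.iter_content(chunk_size=8192), limit)
    finally:
        # Release the connection even though the body was not fully read.
        response.close()
```

Calling `fetch_head(url)` returns raw bytes, so decoding still has to happen afterwards. Closing the response matters here because the body is deliberately left partly unread.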

Event Timeline

jayvdb created this task. Sep 3 2015, 1:59 AM
jayvdb claimed this task.
jayvdb raised the priority of this task from to High.
jayvdb updated the task description.
jayvdb added subscribers: jayvdb, pywikibot-bugs-list, Beta16.
Restricted Application added a subscriber: Aklapper. Sep 3 2015, 1:59 AM
jayvdb moved this task from Backlog to references on the Pywikibot-Scripts board.
Mpaa added a subscriber: Mpaa. Sep 3 2015, 5:59 PM
Mpaa removed a subscriber: Mpaa. Sep 27 2015, 11:02 AM
jayvdb updated the task description. Jan 15 2016, 11:51 PM

I haven't looked closely, but I expect that the requests auto-decoding will not have all the same features as the existing reflinks.py decoding. More analysis is needed.

jayvdb added a subscriber: MtDu. Jan 16 2016, 11:34 AM

On https://gerrit.wikimedia.org/r/#/c/264251/11/, @MtDu wrote:

I looked around and found this. https://toolbelt.readthedocs.org/en/latest/downloadutils.html But when I did that, it said no module named requests_toolbelt. I need to do this for each of the three lines I commented on, or is it different for each? I need to sleep though, so leave me a message here or on IRC. Thanks, MtDu

toolbelt is another package that uses requests, simplifying some tasks. I have quickly looked at its stream capability, and I believe that it will not simplify the reflinks code.

I am also worried that it adds another dependency, and it has a minimum version requirement of requests 2.0.1 (mentioned at https://toolbelt.readthedocs.org/en/latest/user.html).

Change 264251 had a related patch set uploaded (by MtDu):
Set user-agent in reflinks.py Convert reflinks.py to use requests module

https://gerrit.wikimedia.org/r/264251

MtDu claimed this task. Jan 17 2016, 8:44 PM

I did this while completing T113596.
Thanks,
MtDu

After the switch to requests, ftp will no longer be supported.

https://github.com/kennethreitz/requests/issues/1237

We may be able to regain ftp support by testing and adopting https://github.com/Lukasa/requests-ftp

Change 264251 merged by jenkins-bot:
Set user-agent and convert reflinks.py to use requests

https://gerrit.wikimedia.org/r/264251

jayvdb closed this task as Resolved. Jan 19 2016, 5:10 AM