
download_dump.py: Use response.iter_content
Closed, Resolved (Public)

Description

Pywikibot is a Python-based framework for writing bots for MediaWiki.

Thanks to work in Google Code-in, Pywikibot now has a script called download_dump.py. It downloads a Wikimedia database dump from http://dumps.wikimedia.org/ and places it in a predictable directory for semi-automated use by other scripts and tests.

As @zhuyifei1999 wrote in https://gerrit.wikimedia.org/r/#/c/399179/14/scripts/maintenance/download_dump.py@84 , the script should use response.iter_content instead of response.raw, and it should pass stream=True when fetching the content.

Reference: https://github.com/wikimedia/pywikibot/blob/master/pywikibot/page.py#L2686-L2691
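For illustration, the suggested pattern looks roughly like this (a minimal sketch assuming the requests library; the helper name and chunk size are placeholders, not the script's actual code):

```python
import requests

# Hypothetical constant; see the chunk-size discussion below.
CHUNK_SIZE = 1024 * 1024  # 1 MiB

def download_file(url, path):
    """Stream url to path without holding the whole body in memory."""
    # stream=True defers reading the body until iter_content() consumes it.
    response = requests.get(url, stream=True)
    response.raise_for_status()
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            f.write(chunk)
```

Unlike response.raw, iter_content decodes the transfer encoding (e.g. gzip) for you and keeps memory use bounded by the chunk size.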

You are expected to provide a patch in Wikimedia Gerrit. See https://www.mediawiki.org/wiki/Gerrit/Tutorial for how to set up Git and Gerrit.

Event Timeline

Framawiki triaged this task as Medium priority. Dec 24 2017, 4:48 PM
Framawiki created this task.

Change 400205 had a related patch set uploaded (by Rafidaslam; owner: rafid):
[pywikibot/core@master] download_dump: Use response.iter_content

https://gerrit.wikimedia.org/r/400205

Submitted the patch; suggestions are welcome. I'm a bit doubtful about the chunk size, though. We can make it a constant for convenience, I think.

> We can make it a constant for convenience, I think.

Yeah, it doesn't matter in most cases. When copying or moving files it's usually set to the block size of the filesystem, but for downloading I don't know of a convention; anything works as long as it's not too small (smaller than a KiB) or too large (hundreds of MiB).
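To make that trade-off concrete, a small sketch (Unix-only because of os.statvfs; the constant name is made up):

```python
import os

# File copies often use the filesystem block size as the chunk size.
fs_block_size = os.statvfs('/tmp').f_bsize  # e.g. 4096 on many Linux systems

# For HTTP downloads there is no firm convention; anything between a few
# KiB and a few MiB keeps memory use low without making too many writes.
DOWNLOAD_CHUNK_SIZE = 100 * 1024  # 100 KiB, an arbitrary middle ground
```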

Change 400205 merged by jenkins-bot:
[pywikibot/core@master] download_dump: Use response.iter_content

https://gerrit.wikimedia.org/r/400205