
download_dump.py: Use response.iter_content
Closed, Resolved (Public)

Description

Pywikibot is a Python-based framework for writing bots for MediaWiki.

Thanks to work in Google Code-in, Pywikibot now has a script called download_dump.py. It downloads a Wikimedia database dump from http://dumps.wikimedia.org/ and places it in a predictable directory for semi-automated use by other scripts and tests.

As @zhuyifei1999 wrote in https://gerrit.wikimedia.org/r/#/c/399179/14/scripts/maintenance/download_dump.py@84 , the script should use response.iter_content instead of response.raw, and it should pass stream=True when fetching the content.

Reference: https://github.com/wikimedia/pywikibot/blob/master/pywikibot/page.py#L2686-L2691
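For illustration, the suggested pattern looks roughly like this (a minimal sketch assuming the requests library; the helper name and chunk size are placeholders, not the script's actual code):

```python
import requests

# Hypothetical constant; see the chunk-size discussion below.
CHUNK_SIZE = 1024 * 1024  # 1 MiB

def download_file(url, path):
    """Stream url to path without holding the whole body in memory."""
    # stream=True defers reading the body until iter_content() consumes it.
    response = requests.get(url, stream=True)
    response.raise_for_status()
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            f.write(chunk)
```

Unlike response.raw, iter_content decodes the transfer encoding (e.g. gzip) for you and keeps memory use bounded by the chunk size.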

You are expected to provide a patch in Wikimedia Gerrit. See https://www.mediawiki.org/wiki/Gerrit/Tutorial for how to set up Git and Gerrit.

Event Timeline

Framawiki triaged this task as Medium priority. Dec 24 2017, 4:48 PM
Framawiki created this task.

Change 400205 had a related patch set uploaded (by Rafidaslam; owner: rafid):
[pywikibot/core@master] download_dump: Use response.iter_content

https://gerrit.wikimedia.org/r/400205

Submitted the patch; suggestions are welcome. I'm a bit doubtful about the chunk size, though. We can make it a constant for convenience, I think.

> We can make it a constant for convenience, I think.

Yeah, it doesn't matter in most cases. When copying or moving files it's usually set to the block size of the filesystem, but for downloading I don't know of a convention; anything works as long as it's not too small (smaller than a KiB) or too large (hundreds of MiB).
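To make that trade-off concrete, a small sketch (Unix-only because of os.statvfs; the constant name is made up):

```python
import os

# File copies often use the filesystem block size as the chunk size.
fs_block_size = os.statvfs('/tmp').f_bsize  # e.g. 4096 on many Linux systems

# For HTTP downloads there is no firm convention; anything between a few
# KiB and a few MiB keeps memory use low without making too many writes.
DOWNLOAD_CHUNK_SIZE = 100 * 1024  # 100 KiB, an arbitrary middle ground
```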

Change 400205 merged by jenkins-bot:
[pywikibot/core@master] download_dump: Use response.iter_content

https://gerrit.wikimedia.org/r/400205