download_dump.py: Support for "date specified" dumps
Closed, Resolved · Public

Description

Currently the download_dump.py script can only download files from the latest revision (e.g. https://dumps.wikimedia.org/idwiki/latest/ )[1]. We can add support for downloading from a "date specified" revision, e.g. https://dumps.wikimedia.org/idwiki/20171001/.

For the implementation, we could add a new -revision parameter to the script; if the user doesn't specify -revision, we assume the revision is latest.
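A minimal sketch of how the proposed default could work, assuming the file naming seen in the dump directories (`<wiki>-<revision>-<filename>`). The function name `dump_url` and its arguments are hypothetical and are not taken from download_dump.py:

```python
import re

# Base URL taken from the examples in the description.
DUMPS_BASE = 'https://dumps.wikimedia.org'


def dump_url(wikiname, filename, revision='latest'):
    """Build a dump URL; hypothetical helper, not the script's real code.

    `revision` is either 'latest' (the default, used when the user gives
    no -revision) or a date in YYYYMMDD form such as '20171001'.
    """
    if revision != 'latest' and not re.fullmatch(r'\d{8}', revision):
        raise ValueError('revision must be "latest" or YYYYMMDD')
    return '{base}/{wiki}/{rev}/{wiki}-{rev}-{name}'.format(
        base=DUMPS_BASE, wiki=wikiname, rev=revision, name=filename)


print(dump_url('idwiki', 'pagelinks.sql.gz'))
print(dump_url('idwiki', 'pagelinks.sql.gz', '20171001'))
```

Keeping 'latest' as the default value means existing invocations without -revision keep their current behaviour.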

^1. https://github.com/wikimedia/pywikibot/blob/ca7c0ce89f2b2e96ebc5bb7b5b8aef2ccd04c2c3/scripts/maintenance/download_dump.py#L65

Event Timeline

I think that if the user doesn't specify -revision, we should assume the revision is latest and look up the latest date from the site. See T183667#3864150 for details.

Change 401091 had a related patch set uploaded (by Rafidaslam; owner: rafid):
[pywikibot/core@master] download_dump: Add -revision parameter

https://gerrit.wikimedia.org/r/401091

Change 401091 merged by jenkins-bot:
[pywikibot/core@master] download_dump: Add -revision parameter

https://gerrit.wikimedia.org/r/401091

zhuyifei1999 subscribed.

I just realized, does this not work in the case of toolforge?

Hmm, I also have some doubts about that, since I can't test it on Toolforge. But I'll improve the script again based on my comment at https://phabricator.wikimedia.org/T183667#3865397 ; I think that will work well on Toolforge.

Change 401377 had a related patch set uploaded (by Rafidaslam; owner: rafid):
[pywikibot/core@master] download_dump: Resolve latest revision pointer to a date revision

https://gerrit.wikimedia.org/r/401377

Hmm, I also have some doubts about that, since I can't test it on Toolforge.

Why can't you test it on Toolforge?
If you don't have an account, you should create one.

Oh, I just learned that I can create a Toolforge account. I thought it was invite-only; I'll submit a membership request then.

rafidaslam closed this task as Resolved. (Edited Oct 28 2020, 2:58 AM)

I think this ticket can be marked as resolved, as the main issue has been solved (the feature commit was merged long ago).

As for the Toolforge issue, I just checked it, and it turns out to behave incorrectly:

rafid@tools-sgebastion-07:~/pywikibot-core$ python3 pwb.py scripts/download_dump.py -filename:pagelinks.sql.gz
Downloading dump from idwiki
Symlinking file from /public/dumps/public/idwiki/20201020/idwiki-20201020-pagelinks.sql.gz
Done! File stored as ./idwiki-latest-pagelinks.sql.gz
rafid@tools-sgebastion-07:~/pywikibot-core$ python3 pwb.py scripts/download_dump.py -filename:pagelinks.sql.gz -dumpdate:20200601
Downloading dump from idwiki
Symlinking file from /public/dumps/public/idwiki/20201020/idwiki-20201020-pagelinks.sql.gz
Done! File stored as ./idwiki-20200601-pagelinks.sql.gz

I'll create another ticket for this specific issue to avoid confusion, as there have been big changes to the script; even the parameter has been renamed from -revision to -dumpdate.
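For reference, a minimal sketch of the local path one would expect the second command in the session to symlink from, assuming the Toolforge layout visible in the log output (`/public/dumps/public/<wiki>/<date>/<wiki>-<date>-<filename>`). The helper name `toolforge_dump_path` is hypothetical and this is not the script's actual code:

```python
import posixpath

# Root directory as it appears in the "Symlinking file from ..." log lines.
TOOLFORGE_DUMPS_ROOT = '/public/dumps/public'


def toolforge_dump_path(wikiname, filename, dumpdate):
    """Return the expected local dump path for a given dump date.

    Hypothetical helper: it only illustrates that the requested
    `dumpdate` should appear both in the directory and in the file name.
    """
    basename = '{}-{}-{}'.format(wikiname, dumpdate, filename)
    return posixpath.join(TOOLFORGE_DUMPS_ROOT, wikiname, dumpdate, basename)


print(toolforge_dump_path('idwiki', 'pagelinks.sql.gz', '20200601'))
```

In the session above, the run with -dumpdate:20200601 instead symlinks from the 20201020 directory, which is the mismatch to be reported.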