[Story] Improve WikibaseQualityExternalValidation dump download scripts
Closed, Declined (Public)

Description

Currently, the scripts for downloading and converting the GND data are painful and slow to use.

The script downloads and processes 3 different files, in sequential order. If the third download fails, or I have to cancel it to continue later (because it takes so long), then the only choice is to restart the entire script (i.e. re-download and re-process the first and second files).

It would be nice if I could have it skip re-downloading and continue with the third step, without repeating steps 1 and 2.
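
A rough sketch of how resuming could work, assuming each download-and-convert step can be wrapped in a function. The names here (`run_steps`, the state file, the step names) are hypothetical and are not taken from the existing scripts. Completed steps are recorded in a small JSON file, so a rerun skips them:

```python
import json
from pathlib import Path


def run_steps(steps, state_file):
    """Run (name, func) steps in order, skipping steps already recorded as done.

    `steps` and `state_file` are illustrative; the real scripts may be
    structured differently.
    """
    state_path = Path(state_file)
    # Load the names of steps that finished in an earlier run, if any.
    done = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in a previous run, do not repeat it
        func()  # may raise; earlier steps stay recorded as done
        done.add(name)
        # Persist progress after every step so an interrupt loses at most one step.
        state_path.write_text(json.dumps(sorted(done)))
        executed.append(name)
    return executed
```

With something like this, a failure or Ctrl-C during the third step would leave steps 1 and 2 marked as done, and the next run would start directly at step 3. Deleting the state file forces a full rerun.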

It would also be nice if the script could be made to run faster.

Finally, in my last attempt just now, I encountered a DownloadError timeout for the third dump file, and then the script died. (So I have to restart the whole process and somehow increase the timeout.)
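
One possible way to handle the timeout is sketched below, assuming the download can go through a single function that takes a URL and a timeout. I have not checked how the scripts actually raise DownloadError, so this uses only the Python standard library; `download_with_retry` and its parameters are hypothetical. It retries a few times and doubles the timeout on each attempt instead of dying on the first failure:

```python
import time
import urllib.request


def fetch(url, timeout):
    """Download a URL with an explicit timeout (in seconds)."""
    # Reads the whole body into memory; a real dump download would
    # probably stream to disk instead.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_with_retry(url, timeout=60.0, attempts=3, backoff=2.0,
                        fetch_fn=fetch, sleep=time.sleep):
    """Try the download up to `attempts` times, doubling the timeout each time."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch_fn(url, timeout * (2 ** attempt))
        except OSError as err:  # covers URLError and socket timeouts
            last_error = err
            if attempt < attempts - 1:
                # Wait a little longer before each new attempt.
                sleep(backoff * (attempt + 1))
    raise last_error
```

The timeout and the number of attempts would ideally be command-line options, so nobody has to edit the script to raise them.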

PS - it would also be nice if these were not maintained on GitHub (https://github.com/WikidataQuality/DumpConverter)

I consider this a blocker for deployment, as I am struggling to produce any CSVs, which we would need for Wikidata.

Event Timeline

aude raised the priority of this task from to Medium.
aude updated the task description.
aude added a subscriber: aude.
JanZerebecki renamed this task from Improve WikibaseQualityExternalValidation dump download scripts to [Story] Improve WikibaseQualityExternalValidation dump download scripts. Sep 18 2015, 12:47 PM
JanZerebecki set Security to None.
JanZerebecki moved this task from incoming to needs discussion or investigation on the Wikidata board.