Upload Wikipedia corpora to download.wmcloud.org
Closed, Resolved, Public

Description

As per https://wikitech.wikimedia.org/wiki/Help:Toolforge/Web#Hosting_large_files and after discussing it on IRC, please provide the files from the following directory via download.wmcloud.org

https://drive.google.com/drive/folders/1HfL138UCqr69w0XfAhlAEUh6VVOnzwBE

This should be about 42 files with a total of about 16 GB.

Thank you!

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 662805 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] profile::labs::downloadserver: remove lvm class

https://gerrit.wikimedia.org/r/662805

Change 662805 merged by Andrew Bogott:
[operations/puppet@production] profile::labs::downloadserver: remove lvm class

https://gerrit.wikimedia.org/r/662805

Hello @DVrandecic! Can you advise how I might download those files via the command line? I'm not excited about grabbing them locally on my laptop and then uploading (it's a lot of clicks!) and Google Drive doesn't seem to like wget.

If you want to paste a bunch of commands to run here I can do that, otherwise I can give you temporary access to the server and you can upload them directly.

Sorry, no, I don't know how to download them via the command line. But there is a UI option to download the whole directory at once.

Screen Shot 2021-02-08 at 8.01.20 PM.png (1×1 px, 212 KB)

Or when not logged in, the "Download all" link (but I have not tried that out).

If preferred, I am happy to upload the files myself via temporary access, that's perfectly fine as well. Happy either way.

Change 663034 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] profile::labs::downloadserver: remove lvm class

https://gerrit.wikimedia.org/r/663034

Change 663034 merged by Andrew Bogott:
[operations/puppet@production] profile::labs::downloadserver: remove lvm class

https://gerrit.wikimedia.org/r/663034

Change 663036 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] profile::labs::downloadserver: remove lvm class

https://gerrit.wikimedia.org/r/663036

Change 663036 merged by Andrew Bogott:
[operations/puppet@production] profile::labs::downloadserver: remove lvm class

https://gerrit.wikimedia.org/r/663036

Is this what you're looking for?

https://download.wmcloud.org/corpora/

let me know if you want them named something else!

Thank you, perfect!

Sorry, yes, just like that! Are the rest of the files uploading?

hm, for some reason the en file didn't show up when I did the back download. Grabbing that now -- everything else should be up already. 24 total files, right?

Closing this, feel free to reopen if you need other things.

Sorry, it should be 42 files, not 24 - languages like ar, hr, etc. are missing.

And it's because I made a mistake uploading them, I didn't realize that the upload was stalled! So sorry.

I uploaded the missing files, and have now double-checked that they are actually there.

Thank you for your understanding. If you'd rather I upload them, feel free to give me temporary rights to the server.

https://drive.google.com/drive/folders/1HfL138UCqr69w0XfAhlAEUh6VVOnzwBE

Thank you, and sorry for the double work!

aborrero triaged this task as Medium priority. Feb 11 2021, 5:15 PM
aborrero moved this task from Inbox to Doing on the cloud-services-team (Kanban) board.

ok, now I see 42.
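
For anyone repeating this kind of check, the 24-versus-42 mismatch above could also be caught with a small script rather than by eye. The following is only a minimal sketch: it assumes the `/corpora/` page is a plain HTML directory index with one anchor tag per file, which this task does not confirm, and the helper names (`LinkCollector`, `list_files`, `check_count`, `fetch_index`) are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect href targets from anchor tags in a directory index page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_files(index_html):
    """Return relative file links, skipping sort links, parent links,
    absolute paths, fragments, and subdirectories."""
    parser = LinkCollector()
    parser.feed(index_html)
    return [
        h for h in parser.hrefs
        if not h.startswith(("?", "/", "#", "..")) and not h.endswith("/")
    ]


def check_count(index_html, expected=42):
    """Return (matches, files): whether the listing has the expected
    number of files, plus the file links that were found."""
    files = list_files(index_html)
    return len(files) == expected, files


def fetch_index(url):
    """Download the index page (assumes a UTF-8 HTML listing)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

Calling `check_count(fetch_index("https://download.wmcloud.org/corpora/"))` would then report whether all 42 files are listed, and the returned file list makes it easier to spot which languages (such as ar or hr) are missing.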