
Mirror more Kiwix downloads directories
Open, NormalPublic

Description

The WMF hosts a mirror of the ZIM files we generate at Kiwix at
http://dumps.wikimedia.org/kiwix/. This is of great value to us.

For the past 6 months, we have been giving priority to "portable packages"
on our web site. They are big zip files containing Kiwix, a ZIM file, and a
full-text index. They are much easier to use and are well appreciated by
Windows and Linux users.

Our problem is that our traffic is growing and a significant part of
it is generated by these portable packages. These zip files are available at http://download.kiwix.org/portable/ and it would be great if this directory could also be mirrored.

You can assume that this directory is around 2 times bigger than the zim directory. Currently, the "zim" directory is 239G and the "portable" directory is 329G.

We are also reorganizing the directory structure to create thematic sub-directories, so the directories that should be mirrored in addition to the current "zim/0.9/" one would be:
zim/wikipedia
zim/wikisource
zim/wiktionary
zim/wikivoyage
portable/wikipedia
portable/wikisource
portable/wiktionary
portable/wikivoyage
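
The directory list above could be fed to a mirror job roughly as follows. This is only a minimal sketch: the rsync source URL and the local destination path are hypothetical placeholders, not the actual configuration on either side, and the function name is made up for illustration.

```python
# Hypothetical sketch: build one rsync invocation per directory to mirror.
# SOURCE and DEST are placeholders, not values taken from this task.
from typing import List

SOURCE = "rsync://download.example.org/kiwix"  # assumed rsync module URL
DEST = "/srv/mirror/kiwix"                     # assumed local mirror root

# The current directory plus the thematic sub-directories listed above.
MIRRORED_DIRS = [
    "zim/0.9",
    "zim/wikipedia",
    "zim/wikisource",
    "zim/wiktionary",
    "zim/wikivoyage",
    "portable/wikipedia",
    "portable/wikisource",
    "portable/wiktionary",
    "portable/wikivoyage",
]


def rsync_command(subdir: str, source: str = SOURCE, dest: str = DEST) -> List[str]:
    """Return the argv list for mirroring one sub-directory.

    Assumes the parent directories under `dest` already exist.
    """
    subdir = subdir.strip("/")
    return [
        "rsync",
        "--archive",       # preserve permissions, times, and symlinks
        "--delete-after",  # drop files removed upstream once the transfer is done
        f"{source}/{subdir}/",
        f"{dest}/{subdir}/",
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for directory in MIRRORED_DIRS:
        print(" ".join(rsync_command(directory)))
```

Running the script only prints the commands, so the list can be reviewed before anything is transferred.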


Version: unspecified
Severity: enhancement

Details

Reference
bz55503

Event Timeline

bzimport raised the priority of this task from to Low.Nov 22 2014, 2:19 AM
bzimport set Reference to bz55503.
Kelson created this task.Oct 9 2013, 8:47 AM
Kelson added a comment.Dec 3 2013, 9:33 AM

No feedback on this. This is not really urgent, but would it at least be possible to also mirror "zim/wikipedia" in addition to "zim/0.9/"? New Wikipedia ZIM files are stored there instead of in "zim/0.9" (which is still necessary for legacy purposes).

Ariel: Could you answer comment 1?

@Ariel, it would be really great if the rsync configuration could be adapted now. More and more ZIM files are in the new hierarchy.

So after a short chat on IRC with Kelson, it turns out that right now they are looking at about 2.5TB, which we don't have to spare. This is a fine time to get more storage in any case, as we have been close to the edge on dataset1001 for some time now. I'll be adding a ticket for that shortly.

In the meantime I've updated the kiwix rsync job to pull from 'wikipedia' instead of the obsolete '0.9' directory for now.

This was held up due to directory permissions but is running now.

New mirrored files are now available in the mirror manager, see for example:
http://download.kiwix.org/zim/wikipedia/wikipedia_ar_all_2015-01.zim.mirrorlist

excerpt from discussion on irc:

(06:01:59 μμ) Kelson: andre: apergos: we currently working on a solution (with wmflabs) to create, each month, new version of all our project. This should be ready in the next month. So next step is to prepare a nice page summarize everything (for example @http://dumps.wikimedia.org/kiwix/)
(06:02:27 μμ) Kelson: andre: apergos: but to do that we need to have the snapshots available
(06:03:48 μμ) apergos: ok. I just need an estimate of the total space, so I can make sure we have or can get capacity in a timely fashion
(06:05:12 μμ) apergos: how many snapshots would you want us to keep?
(06:05:17 μμ) Kelson: apergos: it's moving but we talk about ~2.5 TB
(06:06:54 μμ) Kelson: apergos: I don't plan to keep trace of old snapshots - so don't keep more than one "old" snapshot to let the started downloads finish correctly and then delete them
(06:07:57 μμ) apergos: well we won't get 3t of more storage, we'll get a chunk, if I have anything to say about it that is
(06:08:35 μμ) Kelson: apergos: for now, http://dumps.wikimedia.org/kiwix is a "slave" of http://download.kiwix.org... but if everything works well... we might think to change the way it works. The master being then download.wikimedia.org

https://phabricator.wikimedia.org/T93118 not a blocker but this is needed for the final configuration; for short-term we can proxy through dataset1001 to the kiwix dump creation box

Are T91853: Hardware for HTML / zim dumps and T93113: deploy francium for html/zim dumps actually related? They don't mention giving 3 TB for Kiwix ZIM files. If there is no space for mirroring any further directory, this seems blocked on T93118#1149493

The current approach is to share francium as both a processing AND a storage solution. I have a few concerns about whether francium's RAID system can deliver adequate performance for both at the same time, but let's try it like this and then see whether it needs any improvement.

The current approach (serving off of francium) is meant to be temporary.

Restricted Application added subscribers: Matanya, Aklapper.Sep 19 2015, 2:36 PM
ArielGlenn raised the priority of this task from Low to Normal.Sep 29 2015, 11:10 AM
ArielGlenn set Security to None.

Since we have the new array in place for some time now, let's revisit this and see how much more we can serve from WMF servers.

Kelson added a comment.Oct 6 2015, 7:17 PM

@Ariel
Great, please let me know if you need something from my side.