
Read-only access to Wikimedia mirror of Kiwix data in dumps.wikimedia.org/kiwix/
Open, Needs Triage, Public

Description

Hello,

I'm a newcomer to the Kiwix team and this is my first request to the Wikimedia team, so please forgive any errors or imprecision on my part; this is part of my learning curve. Of course, this ticket is more @Rgaudin's idea than a pure product of my own brain.

Wikimedia is serving a mirror of Kiwix data at https://dumps.wikimedia.org/kiwix/

Currently we only have HTTP access to the mirror (the regular access any user has), and this is causing us some issues with mirror management on the Kiwix side. We need to regularly scan mirrors to check which files are present on each one, so that we know which files can be served to our users from which mirror. Since this can only be done through HTTP, we have to issue many HTTP requests to update the mirror status, which consumes time and resources on our side, and probably also on the Wikimedia side (though probably to a lesser extent). Since we want up-to-date information, we would like to scan mirrors every hour, but this is becoming more and more difficult as our dataset grows.

We would like to request read-only rsync access (or FTP, but rsync is preferred) to the mirror content, to be used instead of HTTP, only by us and only for our mirror management operations. It would not be exposed to end users. We are open to any security requirements you may set for granting this access, although of course we have our own limitations imposed by our software stack.
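
To illustrate the kind of scan we have in mind, here is a minimal Python sketch, not our actual tooling. It assumes a read-only rsync endpoint exists (the URL in the usage note is hypothetical, since no such endpoint is available today) and that the listing uses rsync's usual `--list-only` line format (permissions, size, date, time, path). The function names are ours, for illustration only.

```python
import subprocess


def parse_rsync_listing(text):
    """Map relative path -> size in bytes for regular files in `rsync --list-only` output."""
    files = {}
    for line in text.splitlines():
        # Expected fields: permissions, size, date, time, path (path may contain spaces).
        parts = line.split(None, 4)
        if len(parts) != 5 or not parts[0].startswith("-"):
            continue  # skip directories, symlinks, banners and blank lines
        _perms, size, _date, _time, path = parts
        try:
            # Recent rsync versions print sizes with thousands separators.
            files[path] = int(size.replace(",", ""))
        except ValueError:
            continue  # not a listing line after all
    return files


def list_mirror(rsync_url):
    """List a mirror recursively in a single rsync session (assumes `rsync` is on PATH)."""
    result = subprocess.run(
        ["rsync", "--recursive", "--list-only", rsync_url],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_rsync_listing(result.stdout)
```

With such access, one call like `list_mirror("rsync://dumps.wikimedia.org/kiwix/")` (hypothetical URL) would return the full file list in a single session, instead of one HTTP request per directory or file.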

Thank you in advance