
Provide a good download service of dumps from Wikimedia
Closed, Duplicate · Public

Description

Currently we cap the number of downloads from the same IP and severely limit bandwidth. Loosening these limits would negatively impact dump production.

Fix this: add ms1001 to the download server pool, get Ethernet bonding working on that host, see whether increasing memory on dataset1001 can alleviate the I/O wait issues, etc.
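The description mentions capping downloads per IP and limiting bandwidth without saying how. As a minimal sketch, assuming the two common mechanisms (a concurrent-download cap per client IP plus a token-bucket rate limit per download), the Python below shows how such limits fit together. The class names, parameters and numbers are hypothetical; the real limits on the dumps servers live in web server configuration that this task does not show.

```python
import time
from collections import defaultdict
from typing import Callable, Optional


class TokenBucket:
    """Limits one download to `rate` bytes/s on average, allowing bursts up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_consume(self, nbytes: int) -> bool:
        """Return True (and spend tokens) if `nbytes` may be sent now; False means wait and retry."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


class PerIPLimiter:
    """Caps concurrent downloads per client IP and gives each download its own TokenBucket."""

    def __init__(self, max_conns_per_ip: int, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_conns = max_conns_per_ip
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.active = defaultdict(int)  # ip -> number of open downloads

    def open(self, ip: str) -> Optional[TokenBucket]:
        """Start a download for `ip`, or return None if that IP is already at its cap."""
        if self.active[ip] >= self.max_conns:
            return None
        self.active[ip] += 1
        return TokenBucket(self.rate, self.capacity, self.clock)

    def close(self, ip: str) -> None:
        """Finish a download so the IP can open another one."""
        self.active[ip] -= 1
        if self.active[ip] <= 0:
            del self.active[ip]
```

A server using this would call `try_consume` before writing each chunk and sleep briefly when it returns False. Raising `max_conns_per_ip` or `rate` is the kind of loosening the description warns about, presumably because more bytes per second then leave the same disks that dump production uses (the I/O wait issues on dataset1001 point that way).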

Event Timeline

ArielGlenn raised the priority of this task to Needs Triage.
ArielGlenn updated the task description.
ArielGlenn subscribed.

See also T123094 on replacing/upgrading the dataset servers, as they are out of warranty.

All hardware refresh tickets for dumps are now at T118154.


Does that include a request for a mirror webserver in AMS? If not, I think a proxy would still be helpful.

No, and network-wise we shouldn't need it, but this ticket will not be resolved until we know we are providing good service to everyone. If that later requires more hardware or network tweaking, then we'll do that.

Is there any example of a mirror for large datasets that manages to provide good service to the whole world from a single location? If so, we should copy it. If not, we should assume we'll need a proxy/mirror in Europe, as most others do.

So, can we set up this proxy in Europe please? If it's too hard to do in a WMF datacentre, can you direct me to the most appropriate way to get funding from WMF ops to pay for a VPS with a decent amount of disk space?

How does http://dumps.wikimedia.your.org/ perform? I can ask them about their routing, but I know all requests go to, and are served from, a host in the US.


Usually they're much faster than WMF (1-2 orders of magnitude faster): it's not uncommon to reach download speeds of 10 or 20 MiB/s from Europe.
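For a rough sense of scale: reading "1-2 orders of magnitude" as a factor of 10 to 100, the quoted 10-20 MiB/s from the mirror implies WMF download speeds of roughly 0.1 to 2 MiB/s. The sketch below does that arithmetic and estimates fetch times for a hypothetical 10 GiB file; the file size is an assumption for illustration, not a figure from this task.

```python
from typing import Tuple


def implied_wmf_range(mirror_low: float, mirror_high: float,
                      min_factor: float = 10, max_factor: float = 100) -> Tuple[float, float]:
    """If the mirror is min_factor..max_factor times faster, bound the WMF rate (same units)."""
    # Worst case: slowest mirror rate over the largest speedup; best case: fastest over the smallest.
    return mirror_low / max_factor, mirror_high / min_factor


def download_seconds(size_mib: float, rate_mib_s: float) -> float:
    """Time to transfer size_mib MiB at a steady rate_mib_s MiB/s."""
    return size_mib / rate_mib_s


low, high = implied_wmf_range(10, 20)
print(low, high)  # 0.1 2.0

SIZE_MIB = 10 * 1024  # hypothetical 10 GiB dump file
for label, rate in [("WMF, worst", low), ("WMF, best", high),
                    ("mirror, 10 MiB/s", 10.0), ("mirror, 20 MiB/s", 20.0)]:
    print(f"{label}: {download_seconds(SIZE_MIB, rate) / 60:.0f} min")
```

Under these assumptions the same file takes about 28 hours at the worst implied WMF rate and under 10 minutes from the mirror at 20 MiB/s, which is the gap the comment is pointing at.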

Still, I can understand if WMF can't pay for equally good bandwidth; a local proxy seems a much cheaper solution, and it can be implemented by a single person in a few hours for a few thousand dollars.

This is now dependent on the bandwidth caps for labstore1006 and labstore1007. There's a task for that: T191491

In the meantime, the labstore hosts now handle this, and there is a superseding ticket about bandwidth and access for them; see T191491. Closing this as a duplicate.