
Move dumps.wikimedia.org HTTP service behind CDN edge
Open, Needs Triage, Public

Description

The https://dumps.wikimedia.org/ service is a web interface for downloading various data sets produced by the Wikimedia community.

This service has been around for many years. I assume that this longevity is why the service is still using its own public IPv4 address and local TLS termination to serve files to the world.

Event Timeline

Andrew subscribed.

Claiming not because I'm going to implement it but because I want to find this a proper home before it's forgotten.

Hey @bd808, to have the dumps folks (plural!) watching the task, you can either tag it with the project tag or also add @Hokwelum so we both get notifications.

Note that download.wikimedia.org is mapped to the text-lb lvs's and it redirects to dumps.wikimedia.org, see T107575 for the back story which I no longer remember.

Change 793525 had a related patch set uploaded (by BBlack; author: BBlack):

[operations/puppet@production] Add dumps mapping to cache_upload

https://gerrit.wikimedia.org/r/793525

Thanks for this task, that's great!
Once this is done, could those servers live with private IPs to not "waste" public ones?
I see a mention of rsync on CR793525 for example, is that a blocker? What are the other flows on those boxes?

Once this is done, could those servers live with private IPs to not "waste" public ones?
I see a mention of rsync on CR793525 for example, is that a blocker? What are the other flows on those boxes?

This is a good question. I'm actually now wondering if the rsync workflow will be broken by this change and if we need to make some intermediate change before we can put the http workflow behind the CDN.

There are 3 content serving workflows for these boxes. Two of the three, http and rsync, are public internet facing. The third, NFS exports, faces both the WMCS tenant network and the analytics network. There are two physical hosts currently. In normal operation one handles http + rsync and the other handles nfs.

The rsync use case is documented at https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps. Because "dumps.wikimedia.org" is used as the rsync source, pointing that hostname at dyna.wikimedia.org will be problematic.
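
To make the hostname concern concrete, here is a rough sketch of the reasoning, not a description of the real configuration. It assumes the CDN edge only accepts HTTP and HTTPS (ports 80 and 443) and that rsync clients use the default rsync daemon port 873. The `Workflow` class, the `broken_by_cdn_move` helper, and the NFS hostname are all made up for illustration.

```python
from dataclasses import dataclass

# Assumption: the CDN edge only proxies plain HTTP and HTTPS.
CDN_PROXIED_PORTS = {80, 443}


@dataclass(frozen=True)
class Workflow:
    """One content serving workflow, identified by hostname and port."""
    name: str
    port: int
    hostname: str


def broken_by_cdn_move(workflows, moved_hostname):
    """Names of workflows that reach moved_hostname on a port the CDN does not proxy."""
    return [
        w.name
        for w in workflows
        if w.hostname == moved_hostname and w.port not in CDN_PROXIED_PORTS
    ]


WORKFLOWS = [
    Workflow("https", 443, "dumps.wikimedia.org"),
    # 873 is the default rsync daemon port; mirrors use this hostname as the source.
    Workflow("rsync", 873, "dumps.wikimedia.org"),
    # Hypothetical internal name: NFS is not served via the public hostname.
    Workflow("nfs", 2049, "nfs.internal.example"),
]

if __name__ == "__main__":
    print(broken_by_cdn_move(WORKFLOWS, "dumps.wikimedia.org"))
```

Under those assumptions only the rsync workflow is affected, which suggests giving rsync its own hostname (or some other intermediate change) before repointing "dumps.wikimedia.org".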

Unassigning myself because I'm not actively working on this. This remains a somewhat difficult networking gray area.

Legoktm subscribed.

Unassigning myself because I'm not actively working on this.

Assuming you just forgot to hit the button, done now.

Change 793525 abandoned by BBlack:

[operations/puppet@production] Add dumps mapping to cache_upload

Reason:

Ticket's stale, and this particular commit isn't the answer

https://gerrit.wikimedia.org/r/793525