
Q1:(Need By: TBD) rack/setup/install cloudswift100[12]
Open, Stalled, Medium, Public

Description

This task will track the racking, setup, and OS installation of cloudswift100[12]

Hostname / Racking / Installation Details

Hostnames: cloudswift1001, cloudswift1002
Racking Proposal: Use WMCS dedicated 10G racks. These can be placed in any WMCS rack and co-exist with any WMCS service. However, please place each host in a separate row from the other.
Networking/Subnet/VLAN/IP: two 10G connections. The networking is not yet certain; possibly cloud-hosts1-eqiad for the primary connection and another vlan (non-WMCS?) for the secondary. This will need to be determined before the systems arrive on-site; see the networking section below.
Partitioning/Raid: standard, raid1-2dev
OS Distro: Bullseye

Networking Details

Discussion on purchase task T286586 notes this is a new service, and the networking requirements were not entirely known when the system order was placed. The networking will need to be determined before the hosts arrive (approximately 20 days), so both cloud-services-team (Hardware) and netops have been added as project tags, and the relevant users were subscribed at task creation. The discussion on the purchase task assumes the hosts will need both of their 10G ports, with one likely in the cloud hosts vlan (primary port) and one likely in another vlan (unknown at this time). This service will consume ceph/rbd and present it to the public internet.

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

cloudswift1001:

  • - receive in system on procurement task T286586 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup); cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

cloudswift1002:

  • - receive in system on procurement task T286586 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - add mgmt dns (asset tag and hostname) and production dns entries in netbox, run cookbook sre.dns.netbox.
  • - network port setup via netbox, run homer to commit
  • - firmware update (idrac, bios, network, raid controller)
  • - operations/puppet update - this should include updates to install_server dhcp and netboot, and site.pp role(insetup); cp systems use role(insetup::nofirm).
  • - OS installation & initial puppet run via wmf-auto-reimage or wmf-auto-reimage-host
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

RobH removed a subscriber: Ayoub.
RobH added a parent task: Unknown Object (Task).Aug 27 2021, 7:01 PM
RobH mentioned this in Unknown Object (Task).
wiki_willy renamed this task from (Need By: TBD) rack/setup/install cloudswift100[12] to Q1:(Need By: TBD) rack/setup/install cloudswift100[12].Aug 27 2021, 7:05 PM

(So row B in 10G, C8, and/or D5.)

New cloud hosts only go in cloud racks, so no row B.

Racking details updated to reflect:

Racking Proposal: Use WMCS dedicated 10G racks. These can be placed in any WMCS rack and co-exist with any WMCS service. However, please place each host in a separate row from the other.

@nskaggs: Who in WMCS is going to be point on these servers? I ask so we can assign them this racking task, so they can determine the vlan requirements of these hosts before they arrive on-site.
Thanks in advance!

We will need 2 NICs connected on these servers:

  • primary NIC, with a public IPv4 address, cloudswift100X.wikimedia.org, matching the hostname.
  • secondary NIC, with a private IPv4 address from the cloud-host vlan (10.64.20.0/24 https://netbox.wikimedia.org/ipam/prefixes/105/ip-addresses/ ). Preferably connected to cloudsw switches. This will only be a service IPv4 address and therefore it doesn't require any particular FQDN.

As I'm writing this I'm realizing this request may be hard to accommodate because there might be no rack which satisfies all requirements?

  • dedicated cloud 10G rack
  • with private IPv4 address from a production subnet (cloud-host)
  • with public IPv4 address from a production subnet (so, the switch must be trunked to such public vlan)

We will be happy to go back to the design table if this is not possible.

CC'ing @ayounsi for review.

Thanks for the detail @aborrero

Looking at the setup the logical thing is to allocate the public IPs for these hosts from the 185.15.56.0/24 allocation, that cloudsw1-c8 / cloudsw1-d5 currently announce to the production CRs in eqiad.

Those cloudsw's are currently acting as dual (VRRP) gateways on Vlan1120, for subnet 185.15.56.240/29 (the cloudsw's shared VIP is 185.15.56.241). That vlan/subnet currently sits in front of the cloudgw and acts as its gateway to the internet.

From my point of view it'd be simplest to add the new hosts to this subnet / vlan, but unfortunately all the IPs from it are already in use.

So my thinking would be we either spin up a new Vlan in a similar configuration to 1120, and allocate another subnet from 185.15.56.0/24 to use on it. Or we somehow widen the existing subnet in front of the cloudgw to accommodate these new hosts (and whatever others may need public IPv4 IPs in future).

I don't have much of a preference here; there are trade-offs either way, but both approaches are straightforward. One thing we need to consider is what your future need for servers like this, with public IPv4, will be. Configuring VRRP on the cloudsw's in each rack eats up 3 IPs from whatever subnet you allocate, so, say, using a /29 doesn't leave room for many servers on it. The need to add addresses on the cloudsw's, if we go with a new Vlan, might be an argument for widening the existing Vlan1120 to conserve addresses.
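
The address accounting above can be sketched with Python's ipaddress module. The specific /29 used here is purely illustrative, not a real allocation:

```python
import ipaddress

# Worked example of the accounting above, using an arbitrary /29 from the
# range (purely illustrative, not an actual allocation).
subnet = ipaddress.ip_network("185.15.56.248/29")

total = subnet.num_addresses               # 8 addresses in a /29
usable = total - 2                         # minus network and broadcast -> 6
vrrp_overhead = 3                          # one IP per cloudsw interface, plus the shared VIP
left_for_servers = usable - vrrp_overhead  # only 3 addresses remain for hosts

print(f"{subnet}: {left_for_servers} addresses left for servers")
```

With only 3 host addresses left after VRRP overhead, a /29 fills up fast, which is the argument for either a wider allocation or widening Vlan1120.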

Let's discuss in our meeting later. BTW the network as it stands is set up as shown here, if it's useful for reference:

https://phab.wmfusercontent.org/file/data/jwuzqg3hqopb6vigpyxf/PHID-FILE-eyxej7hkypfas4pwoas4/WMCS_network-L2_L3.png

Thanks for the time in the meeting today to discuss. From our chat and a few other things I've looked at we can say:

  • These hosts should be in the wikimediacloud.org domain rather than wikimedia.org
  • Their first interface should be the one configured in Vlan1118 (cloud-hosts1-eqiad - 10.64.20.0/24).
  • Their second interface will be the one we configure the public IP on.
  • We need to assign a new subnet from 185.15.56.0/24 to assign the IPs from.
    • We should be careful to assign a large enough subnet to cover as many similar servers as expected medium term.
    • If you can advise on how much space is needed we can assign in Netbox, it looks like there is at least a /27 in the range free:
  • The new subnet will be statically routed on the eqiad cloudsw's to the VRRP VIP of the cloudgw's (185.15.56.244)
    • The cloudgw will effectively act as a "router on a stick"
    • I.e. traffic for the new subnet will route into the cloudgw on ens2f1np1.1120
    • If cloudgw permits it'll send it back out on ens2f1np1.XXXX towards cloudswift1001/2
  • We need a new Vlan on the cloudsw's (XXXX) to carry this traffic from cloudgw to the cloudswift hosts
  • Cloudsw will not have any IP address on this new subnet, just the static route and the new, layer-2 only Vlan.
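
As a quick sanity check of the plan above: the Vlan1120 subnet and cloudgw VIP are taken from the comments in this thread, while the candidate subnet below is hypothetical, chosen only to illustrate the constraints (the real allocation is still TBD):

```python
import ipaddress

parent = ipaddress.ip_network("185.15.56.0/24")       # range announced by cloudsw to the CRs
vlan1120 = ipaddress.ip_network("185.15.56.240/29")   # existing subnet in front of cloudgw
cloudgw_vip = ipaddress.ip_address("185.15.56.244")   # cloudgw VRRP VIP (static-route next-hop)
candidate = ipaddress.ip_network("185.15.56.248/29")  # hypothetical new subnet

assert candidate.subnet_of(parent)       # must come from the cloudsw-announced /24
assert not candidate.overlaps(vlan1120)  # must not clash with Vlan1120
assert cloudgw_vip in vlan1120           # next-hop is reachable on Vlan1120
```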

On the CloudGW hosts I assume you'd:

  • Make ens2f1np1.XXXX part of vrf-cloudgw (as it arrives on that VRF)
  • Run VRRP between the ens2f1np1.XXXX interfaces on each, to provide a redundant VIP for cloudswift to send return traffic

That is obviously up to you, I'm sure there are other options.

The other details we spoke about, i.e. the particulars of standing up the cloudswift hosts, and what way to set them up in Netbox etc., I will discuss with other members of Infra Foundations. I assume a similar approach as used for the cloudgw hosts themselves can be followed, but I need to dig into it.

Pretty much agree with everything you commented @cmooney

Just a couple of clarifications:

  • the servers primary hostname would be cloudswift100X.eqiad.wmnet, with IPv4 addressing from vlan 1118 (cloud-hosts1-eqiad - 10.64.20.0/24)
    • we would use this hostname/IPv4 for DHCP/install/puppet/ssh/etc as with any other server in the production realm.
    • this is a 10G NIC connected to cloudsw switches, vlan 1118 untagged.

With the information in the point above, these servers can be racked and installed already.

  • the servers will have a SECONDARY interface with public IPv4 address from the cloud pool as you mentioned, new vlan etc
    • that public IPv4 address will have a FQDN like cloudswift100X.openstack.eqiad1.wikimediacloud.org or similar. It will be used only for serving the openstack APIs.
    • this is a 10G NIC connected to cloudsw switches, vlan XXXX (to be created) untagged.
    • this secondary address will be routed, as you mention, using cloudgw as l3 gateway. We can take care of that configuration via puppet (example here)

This latter point shouldn't be a blocker for racking and installing the servers. It can be done as part of putting them into service.

Ok great @aborrero thanks for clarifying. That all 100% fits what I had in mind, so we are on the same page.

I'll discuss with Arzhel next week to make sure he's ok with the plan, then we can assign the new Vlan/subnet and take it from there.

@cmooney These hosts have come in and been racked. Unless something has changed, and assuming these racks are correct, please assign to @Cmjohnson

cloudswift1001: rack C8, U35; cloudsw2-c8-eqiad ports 5/18; cable IDs 11059 / 11061
cloudswift1002: rack D5, U33; cloudsw1-d5-eqiad ports 4/5; cable IDs 11060 / 11062

@aborrero is it possible to have more information on this new service? Design doc or similar. I can't find anything on Wikitech.

I want to make sure we don't get into a XY problem as well as document why we configured its network that way for future references.

Ideally a high-level overview of what it does, who/what will interact with it, bandwidth needs, how it will scale, etc.

Find more information here: https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/cloudswift

I just created that and I'm still iterating over it, but you can see most relevant information already.

Thanks for the doc, some follow up questions to make sure I understand it properly.

However, like the openstack APIs that live on cloudcontrol servers, we would like to expose the swift API to Cloud VPS VMs (and eventually the internet at large)

As this is the point that drives the need for a public IP (and exposes a new service on the internet, with all the risks that come with it), it would be useful to detail why the internet at large needs access to this endpoint.

As they're 2 servers, how will HA be managed between the two? Active/passive? will they share a VIP?

Before deploying a new vlan, could existing and well tested tools be leveraged?

For example having the servers in the 172.16/12 space and their public IP NATed to 185.15.56.0/25? Possibly through Neutron.

Or could the LVS front the public VIP (in our regular VIPs pool), forwarding traffic to the cloudswift interface on the cloud-host vlan?

I take it the main concern here is allocating a public IPv4 address, which is a scarce resource, no?
It seems we have a reserved block 185.15.56.128/26 (https://netbox.wikimedia.org/ipam/prefixes/3/). Couldn't we just subnet that one and perhaps allocate 185.15.56.128/28 for this new vlan?
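
The subnetting suggestion above can be checked with Python's ipaddress module:

```python
import ipaddress

reserved = ipaddress.ip_network("185.15.56.128/26")  # reserved block in Netbox
proposed = ipaddress.ip_network("185.15.56.128/28")  # suggested carve-out

# The proposed /28 sits entirely inside the reserved /26.
assert proposed.subnet_of(reserved)

# The /26 splits cleanly into four /28s; the proposal takes the first one.
quarters = list(reserved.subnets(new_prefix=28))
assert len(quarters) == 4 and quarters[0] == proposed

# A /28 gives 16 addresses, 14 of them usable after network and broadcast.
assert proposed.num_addresses == 16
```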

Anyway, replies inline. Feel free to copy/paste from here to https://wikitech.wikimedia.org/wiki/Wikimedia_Cloud_Services_team/EnhancementProposals/cloudswift if you need the information at hand for future reference.

Thanks for the doc, some follow up questions to make sure I understand it properly.

However, like the openstack APIs that live on cloudcontrol servers, we would like to expose the swift API to Cloud VPS VMs (and eventually the internet at large)

As this is the point that drives the need for a public IP (and exposes a new service on the internet, with all the risks that come with it), it would be useful to detail why the internet at large needs access to this endpoint.

One of the main use cases why people use our cloud is to host internet-facing web services and other tools (like APIs and such).
Our plan is to use swift to store static object blobs. Some of the things that are usually stored that way are website static assets, like images, css files, fonts, javascript libraries, etc.

We have plenty of tools and utilities that could benefit from the swift API today, to name a few:

Having the API open to the internet is one of the main use cases of swift. Objects stored in swift can be directly requested using the API, something like this:

https://swift.openstack.eqiad1.wikimediacloud.org/v1/{account}/{container}/{object}

for example:

https://swift.openstack.eqiad1.wikimediacloud.org/v1/12345678912345/images/flowers/rose.jpg
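
For illustration, the URL layout above can be expressed as a small helper. Note that swift_object_url is a hypothetical function written for this sketch, not part of any existing tooling; the base URL is the one proposed in this thread:

```python
from urllib.parse import quote

# Proposed public endpoint from this thread (not yet live).
BASE_URL = "https://swift.openstack.eqiad1.wikimediacloud.org/v1"

def swift_object_url(account: str, container: str, obj: str) -> str:
    """Build a public swift object URL: {base}/{account}/{container}/{object}.

    Hypothetical helper for illustration only.
    """
    # quote() leaves "/" intact by default, so pseudo-folder object names
    # like "flowers/rose.jpg" keep their path structure.
    return f"{BASE_URL}/{quote(account)}/{quote(container)}/{quote(obj)}"

url = swift_object_url("12345678912345", "images", "flowers/rose.jpg")
```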

More information on how the swift API works can be seen:

Having this functionality enabled in our cloud is almost a mandatory step to evolve and move our services in the direction of modern technology / cloud offerings.

We are, in general, aware of the risks associated with running internet-facing services. I say this in the context of our plan, which is to work on iterations:

  • work first to introduce swift to cloud-internal clients only (i.e., firewall off access from the internet in the initial deployment)
  • gain understanding of the service, how to operate it, etc
  • open it to the public internet

As they're 2 servers, how will HA be managed between the two? Active/passive? will they share a VIP?

Our initial approach will be a simple active/passive approach.

We could use keepalived/VRRP and consume only 1 public IPv4 address (plus another in the gateway, which would be the cloudgw router).

Before deploying a new vlan, could existing and well tested tools be leveraged?

For example having the servers in the 172.16/12 space and their public IP NATed to 185.15.56.0/25? Possibly through Neutron.

This is something that can definitely be done. However, that short sentence hides many hidden traps that we want to avoid for now.

Or could the LVS front the public VIP (in our regular VIPs pool), forwarding traffic to the cloudswift interface on the cloud-host vlan?

Not an option we want to consider at this point.

We may need un-NATed traffic from cloud VMs to the API endpoints. Using LVS as you describe would be in violation of the cross-realm traffic guidelines.

Moreover, we have been told this is undesirable several times:

  • purpose separation: general LVS would ideally not be used to host cloud-dedicated resources.
  • reputation: running cloud-dedicated resources on IPv4 pools that are dedicated to host the wikis is undesirable.
  • DNS separation: associating cloud-dedicated services with the .wikimedia.org domain is undesirable. We should use ....wikimediacloud.org instead.

We don't have established/formal policies for the last 3 bits. Shall we work on having them? :-P

Thanks!

I take it the main concern here is allocating a public IPv4 address, which is a scarce resource, no?

That's one of them, but not the most important one to me.

We currently have well-maintained and well-tested tools and processes to expose services internally and externally. They offer some combination of security, monitoring, and HA, and we have staff experienced with them.

Introducing a "new way" of exposing a service means having to re-implement some of those mechanisms, as well as maintaining them for a long time.
Which means increasing our attack surface as well as SRE workload.

My questions are to make sure we studied all the existing ways, and deemed (+documented) them non-suitable, before we introduce a new one.
And then if a new one is needed, gather (and document) all the short/medium/long term goals/design to make sure it's as future proof as possible (thus my questions about HA for example).

I hope that clarifies my thought-process.

Our initial approach will be a simple active/passive approach.

What's the ideal end state?

However, that short sentence hides many hidden traps that we want to avoid for now.

Could we detail/document them?
"For now" means this might change?

We may need un-NATed traffic from cloud VMs to the API endpoints.

Why is un-NATed traffic needed?
I'd worry that allowing un-NATed traffic puts us back in a spot with a too-tight integration between VMs and hosts outside the cloud-instance vlan. Is that a risk?

Not an option we want to consider at this point.

I agree overall, but want us to be explicit on why we shouldn't go that way, as you did. Thanks!

Which means increasing our attack surface as well as SRE workload.

Sorry, I'm having problems connecting the dots here.

  • On the server management side (control plane), nothing will change (i.e., icinga, install servers, DNS, puppet, netbox, etc). This is just another server, and I don't see how this increases attack surface or SRE workload.
  • On the public service side (data plane) the request here is to allocate a new vlan/subnet from an IPv4 range/pool that has traditionally been used by WMCS (185.15.x.x). The new vlan/subnet will be "cloud realm", which has traditionally been managed by WMCS.

Other than the act of documenting the setup (already done), creating the netbox allocations and setting up the network bits (i.e, setting up cloudsw devices), I don't see how this generates new attack surfaces or workload for the SRE teams.
Well, creating one vlan on cloudsw and maintaining it is definitely a workload, but it shouldn't be a big deal, no?

To be clear: exposing this service (data plane) will be managed and maintained by WMCS. We won't be using any facility under SRE responsibility other than the edge routing (cloudsw, core routers...), which needs little change.

Could you please elaborate on the new attack surfaces and the SRE workloads that concerns you?

I'll try and sum up what my thought process on this was.

Firstly the security consideration is that we will have cloudswift servers connected to the cloud-hosts1-eqiad vlan on one interface, and directly to the public internet (via cloudgw) on another. That situation means that we are reliant on the admins of the cloudswift and cloudgw servers (both wmcs) to properly take care of security and network isolation.

To my mind this does not change the security assessment versus what's already in place. Cloudgw1001, for instance, has a leg in cloud-hosts1-eqiad, and another on a publicly routable network, cloud-instance-transport1-b-eqiad. So the same dependency to properly segment and isolate the networks already exists, and lies with the same team (wmcs).

In the existing case isolation is done on that host using the Linux VRF / l3mdev mechanism. On the new cloudswift devices I presume IP forwarding will be disabled which should largely take care of it. Both need proper firewalling and hardening to prevent malicious connections from the internet of course.

@ayounsi if I've missed something here please advise. But my overall thinking was that making these changes would not introduce any new security consideration. SRE already trust WMCS to manage hosts connected to both the cloud-hosts vlan and the public internet.

Apart from the security side of things, the guidance in SRE has been to treat WMCS as a "separate entity", something like a hosted customer. Perhaps a hosted customer that is our friend and we trust, but you get the idea. So whether the routing plan for cloudswift is the most optimal, or if it could be done with 1 NIC rather than 2 or whatever, is not really something I felt the need to comment on. That's a matter for WMCS. As long as we were not reducing the overall network security I figured it was up to them.

@cmooney, I agree with your take on the security aspect.

We're not in a typical service provider (ISP)/customer relationship, where the customer does whatever they want.
We need to work together (SRE/WMCS) to figure out the best approach in terms of networking for any new service on the WMF network. Even more so if it's publicly reachable, and even more so if it's outside our standard practices.

My questions are to understand and more broadly document how this new service will work, so we can identify security, scalability, maintainability, and overall design pitfalls ahead of time. For example not have to redesign the service's networking in a few months/years, as well as being able to recall in the future why it has been designed this way.

We've made good progress since the task creation, from no documentation to a draft doc. My questions in the previous comments are to address what I think are still blind spots.

From a certain point of view what we're doing here is validating case 4 in the cross-realm traffic guidelines. Part of the goal of the document was to clarify the architecture on a wide scope, to reduce this kind of friction per-project/per-server/per-idea etc.

We have an upcoming WMCS/SRE-IF network sync meeting on 2021-11-24. I propose we make this topic the main agenda point of that meeting.

aborrero changed the task status from Open to Stalled.Dec 14 2021, 5:57 PM

FYI network details for these servers are blocked on T296411: cloud: decide on general idea for having cloud-dedicated hardware provide service in the cloud realm & the internet, which is in turn stalled, so marking this one the same.

What is the status on this one? It has been sitting for a while.

@Jclark-ctr These are blocked on a variety of tech decisions; no action needed in the DC for now. Thanks for checking in!