Support 'unmanaged' projects in cloud-vps
Closed, Resolved, Public

Description

All VMs in cloud-vps currently include:

  • Puppet
  • ldap/auth integration
  • cumin client
  • light admin supervision/support

Some users (especially internal WMF users) would like a more generic-public-cloud option. For example:

  • bring-your-own base image
  • custom userdata injection
  • ssh key injection
  • no puppet, no ldap, no cumin

It should be fairly easy to support special unmanaged projects with those features. The flip side is slightly greater risk: these projects can and will run unsupported OS versions and unsupported software, and if they gobble up CPU cycles we won't have any recourse. To support these projects, I propose that projectadmins in such projects be limited to NDA users.

Details

Related Changes in Gerrit:
Repo                       Branch      Lines +/-
operations/puppet          production  +5 -1
operations/puppet          production  +1 -1
operations/puppet          production  +1 -1
operations/puppet          production  +2 -2
openstack/horizon/horizon  2023.1      +4 -0
openstack/horizon/horizon  2023.1      +2 -8
operations/puppet          production  +2 -2
operations/puppet          production  +1 -1
openstack/horizon/horizon  2023.1      +1 -1
operations/puppet          production  +9 -4
operations/puppet          production  +1 -1
operations/puppet          production  +3 -1
operations/puppet          production  +1 -1
operations/puppet          production  +1 -1
operations/puppet          production  +6 -3
operations/puppet          production  +63 -44
operations/puppet          production  +2 -2

Event Timeline

Restricted Application added a subscriber: Aklapper.

Notes:

'no puppet, no ldap, no cumin' is really just 'no puppet' since puppet sets up the other things.

This can be implemented with a new keystone role (or multiple roles). Let's say it's a single role called 'puppetfree'.

If a user has that role on a project, they'll encounter two UI changes:

  • They can manage glance images
  • The 'create VM' workflow will display two new panels, one for ssh key management and one for metadata management
    • The metadata panel will already include a default extra key, 'puppetfree=True'
      • TODO: can we reverse this logic so that puppetfree is the default?
      • We should do as little of this as possible in the UI so as not to break automation workflows

Implementing puppetfree VMs can be done by having the cloud-init script skip all the puppet bits based on a metadata flag.

Change 980021 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Horizon: allow image uploading via horizon for users with glance admin

https://gerrit.wikimedia.org/r/980021

> Implementing puppetfree VMs can be done by having the cloud-init script skip all the puppet bits based on a metadata flag.

Actually I think this should happen in the base image. We can set a special metadata flag when building new base images that sets up puppetized base images. The default at all other times will be for cloud-init to rely on the base image for cues about puppet or no puppet.
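
The decision described here could be sketched roughly as follows. This is only an illustration, not the actual vendordata or cloud-init code: the `install_puppet` metadata key is borrowed from the patch titles in this task, while the function name, the `image_has_puppet` argument, and the accepted string values are assumptions.

```python
def should_run_puppet(metadata, image_has_puppet):
    """Decide whether first boot should set up puppet (illustrative only).

    metadata: dict of instance metadata with string keys and values.
    image_has_puppet: True if the base image already ships puppet. How the
        image actually signals this is not specified in the task, so this
        argument is a hypothetical stand-in for that cue.
    """
    flag = str(metadata.get("install_puppet", "")).strip().lower()
    if flag in ("true", "1", "yes"):
        # Explicit flag, e.g. set while building a new puppetized base image.
        return True
    # At all other times, rely on the base image for the cue.
    return image_has_puppet
```

With this shape, an ordinary VM launch never needs to set the flag: a raw image yields an unpuppetized VM and a puppetized image yields a puppetized one.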

Change 980079 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloud-init: make puppet optional

https://gerrit.wikimedia.org/r/980079

Can we come up with a "support level" for these VMs and make sure it's clearly stated somewhere?

Given that they are completely unmanaged, there are a lot of limitations on what we can troubleshoot/debug (essentially, besides starting up a new VM to which we have access and checking that it runs and the network works, everything else happens inside the black-box VM).

My imagined support policy would be:

  • We'll help you with OpenStack issues (e.g. troubleshoot image upload issues)
  • WMCS staff provides zero official support for VMs once they're launched
  • VMs are subject to surprise reboots and up to X minutes downtime without advance notice
  • We will shut down any VMs that are producing disruptive traffic or are suspected of security compromise and may only notify project owners after the fact

In the near term, users will need to apply to have the 'glanceadmin' role added and/or to have an unpuppetized base image added to their project.

@dcaro does that sound about right?

That sounds good yes 👍

Some questions arise (mostly for us; they can be answered after the service starts):

  • How will we monitor that the images uploaded are open source only?
  • How will we monitor if the VMs are misbehaving? (besides randomly noticing something is weird while doing something else)

Currently the policy for both of those things is pretty much "If we notice, we do something." I'm not sure that this case necessarily needs to be different, especially if we restrict the unmanaged feature to specific trusted users.

That said, we could monitor the total list of installed images (and assume they have honest naming) or we could maybe use cloud-init to generate reports about what images are being used to run VMs.
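
As a rough sketch of the first idea (checking the list of installed images by name), something like the following could work. It assumes the image records have already been fetched from Glance as dicts with `name` and `owner` fields; the function name and the prefix allowlist are hypothetical, and the check is only as good as the honesty of the image names.

```python
# Hypothetical allowlist of name prefixes considered known open-source images.
KNOWN_PREFIXES = ("debian-", "ubuntu-")


def unrecognized_images(images, allowed_prefixes=KNOWN_PREFIXES):
    """Return (owner, name) pairs for images whose name matches no prefix.

    images: iterable of dicts with 'name' and 'owner' keys (assumed shape).
    """
    flagged = []
    for image in images:
        # Treat a missing name as unrecognized rather than crashing on None.
        name = (image.get("name") or "").lower()
        if not name.startswith(tuple(allowed_prefixes)):
            flagged.append((image.get("owner"), image.get("name")))
    return flagged
```

Anything this returns would still need a human to look at it; the sketch only narrows down which images deserve a second glance.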

Change 980021 merged by Andrew Bogott:

[operations/puppet@production] Horizon: allow image uploading via horizon for users with glance admin

https://gerrit.wikimedia.org/r/980021

Change 980079 merged by Andrew Bogott:

[operations/puppet@production] cloud-init: make puppet optional

https://gerrit.wikimedia.org/r/980079

Change 981620 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] nova vendor-data: 2nd attempt to read 'install_puppet' metadata

https://gerrit.wikimedia.org/r/981620

Change 981620 merged by Andrew Bogott:

[operations/puppet@production] nova vendor-data: 2nd attempt to read 'install_puppet' metadata

https://gerrit.wikimedia.org/r/981620

Change 981625 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] nova vendor-data: 3rd attempt to read 'install_puppet' metadata

https://gerrit.wikimedia.org/r/981625

Change 981625 merged by Andrew Bogott:

[operations/puppet@production] nova vendor-data: 3rd attempt to read 'install_puppet' metadata

https://gerrit.wikimedia.org/r/981625

Change 981628 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] vendordata: only wipe out puppet certs if we aren't building a base image

https://gerrit.wikimedia.org/r/981628

Change 981628 merged by Andrew Bogott:

[operations/puppet@production] vendordata: only wipe out puppet certs if we aren't building a base image

https://gerrit.wikimedia.org/r/981628

Change 981669 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] nova vendor-data: more puppet-switching attempts

https://gerrit.wikimedia.org/r/981669

Change 981669 merged by Andrew Bogott:

[operations/puppet@production] nova vendor-data: more puppet-switching attempts

https://gerrit.wikimedia.org/r/981669

Change 981676 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] vendordata: don't specify puppet server on first run

https://gerrit.wikimedia.org/r/981676

Change 981676 merged by Andrew Bogott:

[operations/puppet@production] vendordata: don't specify puppet server on first run

https://gerrit.wikimedia.org/r/981676

Change 981677 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] nova: move the puppet cert cleanup from vendordata to wmcs-image-create

https://gerrit.wikimedia.org/r/981677

Change 981677 merged by Andrew Bogott:

[operations/puppet@production] nova: move the puppet cert cleanup from vendordata to wmcs-image-create

https://gerrit.wikimedia.org/r/981677

Change 981705 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[openstack/horizon/horizon@2023.1] WMF Hacks create-instance workflow: add a missing comma

https://gerrit.wikimedia.org/r/981705

Change 981705 merged by Andrew Bogott:

[openstack/horizon/horizon@2023.1] WMF Hacks create-instance workflow: add a missing comma

https://gerrit.wikimedia.org/r/981705

Change 981706 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Horizon: update version in codfw1dev

https://gerrit.wikimedia.org/r/981706

Change 981706 merged by Andrew Bogott:

[operations/puppet@production] Horizon: update version in codfw1dev

https://gerrit.wikimedia.org/r/981706

Change 982111 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Horizon: backport 598bfa3aabe9cf2c1d09f58d4a0745462e80b1bc to 'zed'

https://gerrit.wikimedia.org/r/982111

Change 982111 merged by Andrew Bogott:

[operations/puppet@production] Horizon: backport 598bfa3aabe9cf2c1d09f58d4a0745462e80b1bc to 'zed'

https://gerrit.wikimedia.org/r/982111

Change 982453 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[openstack/horizon/horizon@2023.1] launch-instance workflow: allow keypair panel for all launches

https://gerrit.wikimedia.org/r/982453

Change 982454 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[openstack/horizon/horizon@2023.1] launch-instance: add caption to keypair panel about puppet

https://gerrit.wikimedia.org/r/982454

Change 982454 merged by Andrew Bogott:

[openstack/horizon/horizon@2023.1] launch-instance: add caption to keypair panel about puppet

https://gerrit.wikimedia.org/r/982454

Change 982453 merged by Andrew Bogott:

[openstack/horizon/horizon@2023.1] launch-instance workflow: allow keypair panel for all launches

https://gerrit.wikimedia.org/r/982453

Change 982470 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Revert "Horizon: allow image uploading via horizon for users with glance admin"

https://gerrit.wikimedia.org/r/982470

Change 982471 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Horizon: update build version in codfw1dev

https://gerrit.wikimedia.org/r/982471

Change 982472 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] Horizon: update build version in eqiad1

https://gerrit.wikimedia.org/r/982472

Change 982470 merged by Andrew Bogott:

[operations/puppet@production] Revert "Horizon: allow image uploading via horizon for users with glance admin"

https://gerrit.wikimedia.org/r/982470

Change 982471 merged by Andrew Bogott:

[operations/puppet@production] Horizon: update build version in codfw1dev

https://gerrit.wikimedia.org/r/982471

Change 982472 merged by Andrew Bogott:

[operations/puppet@production] Horizon: update build version in eqiad1

https://gerrit.wikimedia.org/r/982472

  • bring-your-own base image

This is semi-implemented.

  • Users can be granted the 'glanceadmin' role on a project and then upload their own images.
  • Upload via the CLI works, but the Horizon interface doesn't.
  • Because this is a user role, it's a property of a particular user in a particular project, /not/ a project-wide setting. And users can't currently transfer the role to other users.
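
To illustrate why the role is per-user rather than project-wide, role assignments can be thought of as (user, project, role) triples. The toy model below is not Keystone's API; the user and project names and the helper function are made up for illustration.

```python
# Each assignment is a (user, project, role) triple; all names are hypothetical.
ASSIGNMENTS = {
    ("alice", "project-a", "glanceadmin"),
    ("alice", "project-a", "member"),
    ("bob", "project-a", "member"),
}


def can_upload_images(user, project, assignments=ASSIGNMENTS):
    """True only if this user holds 'glanceadmin' on this specific project."""
    return (user, project, "glanceadmin") in assignments
```

Here `can_upload_images("alice", "project-a")` is true, but the same check fails for `bob` in the same project and for `alice` in any other project, which matches the limitation described above.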

  • custom userdata injection

This is available for all VMs, puppet or no.

  • ssh key injection

This is now available and works. The UI isn't perfect: the keypair panel is also shown for puppetized VMs, where it has no effect, but there's at least a warning on the dialog.

  • no puppet, no ldap, no cumin

Raw base images can be shared with any project, at which point members of that project can launch unpuppetized VMs and access them via not-in-ldap keypairs.

I'm opening subtasks for the UI issues but I declare this to be a Minimum Viable Product!

Change 992543 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] nova policy: add awareness of 'unmanaged' role

https://gerrit.wikimedia.org/r/992543

Change 992543 merged by Andrew Bogott:

[operations/puppet@production] nova policy: add awareness of 'unmanaged' role

https://gerrit.wikimedia.org/r/992543