
Streamline the creation of ceph storage cluster user accounts
Closed, Resolved (Public)

Description

Background

We use puppet to manage the creation of our Ceph users.

There are two classes which can be applied to hosts in order to create the users.

  • ceph::auth::load_all - this renders all keyrings to disk and optionally imports the user and/or updates the key capabilities (caps). This is generally used for user management on the ceph cluster hosts.
  • ceph::auth::deploy - this renders a given subset of the keyrings to disk, but does not import them to ceph. This is generally used for deploying certain keyrings to ceph client applications.

We have defined several custom types for this, such as Ceph::Auth::ClientAuth.

When we add a user, we add the details that are not secrets to the hiera structure, for example:

profile::ceph::auth::load_all::configuration:
  mon.:
    keyring_path: /etc/ceph/ceph.mon.keyring
    import_to_ceph: false
    caps:
      mon: allow *

We then add a corresponding keydata value to the private repository, and a dummy value to labs/private, e.g. https://gerrit.wikimedia.org/r/c/labs/private/+/1025728

profile::ceph::auth::load_all::configuration:
  mon.:
    keydata: abcdefABCDEF12345678abcdefABCDEF123456==

Problem statement

When keydata for a client exists in the private or labs/private repository, but that user has not been created in the main puppet repository, then the puppet compilation fails with an error like this:

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, Class[Profile::Ceph::Auth::Load_all]: parameter 'configuration' entry 'dse-k8s-csi' expects a value for key 'caps' (file: /srv/puppet_code/environments/production/modules/role/manifests/ceph/server.pp, line: 23, column: 5) on node cephosd1005.eqiad.wmnet

The effect of this compilation failure is twofold:

  1. We cannot run PCC successfully against any change that includes adding a user, if the dummy keydata has been added to labs/private.
  2. When adding a user in production, we need to add the keydata immediately after merging the main puppet patch, in order to avoid failed puppet runs.

We have previously worked on this under T293752: cloud ceph: refactor rbd client puppet profiles and specifically in this patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/745768
...however, I don't think that worked fully.

I think the reason it didn't work is that caps is not an Optional parameter in the Ceph::Auth::ClientAuth type, so an entry that contains only keydata fails type validation.
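The suspected cause can be illustrated with a minimal sketch. The two Struct definitions below are simplified, hypothetical stand-ins and not the real Ceph::Auth::ClientAuth definition. The field list and the caps value type are assumptions. The sketch relies on the Puppet rule that a Struct key may be absent when its value type accepts undef.

```puppet
# Hypothetical, simplified stand-ins for the real type.
# The field list and the Hash[String, String] caps type are assumptions.
$strict = Struct[{
  keydata => String[1],
  caps    => Hash[String, String],            # key is required
}]

$relaxed = Struct[{
  keydata => String[1],
  caps    => Optional[Hash[String, String]],  # key may be absent
}]

# An entry as it might look when only the private repository defines the
# user, so the merged hash contains keydata but no caps.
$entry = { 'keydata' => 'abcdefABCDEF12345678abcdefABCDEF123456==' }

# The strict type should reject the entry, the relaxed one should accept it.
notice("strict match:  ${$entry =~ $strict}")
notice("relaxed match: ${$entry =~ $relaxed}")
```

If this is what happens, any code that consumes the entry would still need to cope with caps being undefined, for example by skipping users that have no caps.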

Event Timeline

Change #1026867 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Make caps an optional parameter to the Ceph::Auth::ClientAuth type

https://gerrit.wikimedia.org/r/1026867

@dcaro @aborrero - I have submitted a patch for this, but it's not blocking me.
I have just noticed that every time I try to create a user, either PCC breaks, or puppet runs in production break, or both.
After adding the users everywhere it's fine.

Hopefully this one-word change to the Ceph::Auth::ClientAuth type will fix it, but I'd be grateful if you could check it out at your convenience, please.

Change #1026867 merged by Btullis:

[operations/puppet@production] Make caps an optional parameter to the Ceph::Auth::ClientAuth type

https://gerrit.wikimedia.org/r/1026867

This has been deployed. I will monitor to make sure that it's OK and then resolve.