
cumin lacks ssh key and does not generate authorized key on targets
Closed, Resolved, Public

Description

When provisioning a Cumin instance on WMCS, the SSH key pair is not provisioned from labs/private, and changes to the key pair or hostname are not reflected in /etc/ssh/userkeys/root.d on the targets.

Context

As part of T360784, I rebuilt the Cumin master with the hostname integration-cumin.integration.eqiad1.wikimedia.cloud. In Horizon, the Puppet configuration is:

Puppet Classes:
profile::openstack::eqiad1::cumin::master

Hiera Config
profile::openstack::eqiad1::cumin::aliases: {}
profile::openstack::eqiad1::cumin::project_ssh_priv_key_path: /root/.ssh/cumin
profile::openstack::eqiad1::region: eqiad1-r

When running the Puppet agent, I get:

Error: /Stage[main]/Profile::Openstack::Eqiad1::Cumin::Master/Keyholder::Agent[cumin_openstack_integration_master]/File[/etc/keyholder.d/cumin_openstack_integration_master]: Could not evaluate: Could not retrieve information from environment production source(s) file:///root/.ssh/cumin
Error: /Stage[main]/Profile::Openstack::Eqiad1::Cumin::Master/Keyholder::Agent[cumin_openstack_integration_master]/File[/etc/keyholder.d/cumin_openstack_integration_master.pub]: Could not evaluate: Could not retrieve information from environment production source(s) file:///root/.ssh/cumin.pub

Issue 1: key pair is not provisioned

The Keyholder agent provided by profile::openstack::eqiad1::cumin::master cannot find /root/.ssh/cumin, since nothing provisions it. I don't know how it worked previously (the instance was created in December 2019); my suspicion is that the files were added in place instead of being provided with secret().

The keys are on the Puppet server in labs/private:

/srv/git/labs/private/files/ssh/cumin.pub
/srv/git/labs/private/files/ssh/cumin

I copied the key pair from the Puppet server to the Cumin instance and fixed its ownership and permissions. I would expect Puppet to do that step for us: the profile could default to setting up the secrets using a convention such as files/ssh/cumin.<WMCS project>.
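The manual step above can be sketched as a small Python helper. This is an illustration, not how Puppet or the profile does it: the function name `install_cumin_keypair`, the optional per-project file naming (cumin.<project>, mirroring the convention suggested above) and the modes 0600/0644 are all assumptions. The destination name matches project_ssh_priv_key_path (/root/.ssh/cumin), and root ownership is left to the caller because chown needs privileges.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def install_cumin_keypair(private_files: Path, ssh_dir: Path,
                          project: Optional[str] = None) -> Path:
    """Copy a Cumin key pair into ssh_dir and fix its permissions (sketch).

    private_files: directory holding the keys, e.g. a labs/private files/ssh/.
    project: if given, read cumin.<project> and cumin.<project>.pub
             (hypothetical per-project convention); otherwise cumin / cumin.pub.
    Returns the path of the installed private key.
    """
    src_name = f"cumin.{project}" if project else "cumin"
    # Destination names match project_ssh_priv_key_path (/root/.ssh/cumin).
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    private_key = ssh_dir / "cumin"
    public_key = ssh_dir / "cumin.pub"
    shutil.copyfile(private_files / src_name, private_key)
    shutil.copyfile(private_files / f"{src_name}.pub", public_key)
    # Assumed modes: private key readable by its owner only, public key world-readable.
    os.chmod(private_key, 0o600)
    os.chmod(public_key, 0o644)
    # On the real host both files should also be owned by root; os.chown()
    # requires root privileges, so it is not done here.
    return private_key
```

Having Puppet ship the files as secrets would make this step unnecessary; the helper only documents what has to happen by hand when it does not.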

Issue 2: target authorized key is not refreshed

With the above fixed manually, I tried Cumin using sudo cumin --force 'name:docker' 'hostname', but the key is rejected by all targets with Permission denied (publickey). On the targets, the authorized keys are:

/etc/ssh/userkeys/root.d
# Cumin Masters.
from="172.16.4.160,172.16.2.249,172.16.1.220",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICcav+ECiF6hW2XRuP7R8nqDw4hPlD0OChsGvB6K27jK root@cloudinfra-internal-puppetmaster-02

from="172.16.1.230",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPPfdabE1Fej0X86QgjY72LXvA3Wawrg0ZcDL0PF56/A root@integration-cumin

The last entry only accepts connections from 172.16.1.230, the IP of the old instance; the new instance has the IP 172.16.6.123. Puppet did not notice that integration-cumin changed its IP address.
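Why the key is rejected can be reproduced with a simplified check of the from= option. This Python sketch is an assumption-laden approximation of sshd's behaviour: it only handles literal addresses and CIDR blocks, whereas sshd's from= pattern list also accepts hostnames, `*`/`?` wildcards and `!` negation. The function names are hypothetical.

```python
import ipaddress
import re

# Matches the from="..." option of an authorized_keys line.
FROM_OPTION = re.compile(r'from="([^"]*)"')


def from_patterns(line: str) -> list:
    """Return the comma-separated patterns of the from= option, or [] if absent."""
    match = FROM_OPTION.search(line)
    return match.group(1).split(",") if match else []


def source_allowed(line: str, client_ip: str) -> bool:
    """Simplified check: does client_ip satisfy the line's from= option?

    Only literal IPs and CIDR blocks are evaluated; hostnames, wildcards and
    negated patterns are skipped, unlike in sshd.
    """
    patterns = from_patterns(line)
    if not patterns:
        return True  # no from= restriction on this key
    addr = ipaddress.ip_address(client_ip)
    for pattern in patterns:
        try:
            # strict=False lets a bare address act as a /32 (or /128) network.
            network = ipaddress.ip_network(pattern.strip(), strict=False)
        except ValueError:
            continue  # hostname, wildcard or negated pattern: not handled here
        if addr in network:
            return True
    return False
```

Fed the integration-cumin entry above, `source_allowed(line, "172.16.6.123")` is false while `source_allowed(line, "172.16.1.230")` is true, which matches the Permission denied (publickey) seen from the rebuilt instance.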

Event Timeline

hashar renamed this task from cumin lack ssh key and does not generate authorized key on targets to cumin lacks ssh key and does not generate authorized key on targets. Jul 16 2024, 10:36 AM

> I don't know how it worked previously

@hashar see the related documentation https://wikitech.wikimedia.org/wiki/Help:Cumin_master (also linked from the main Cumin page on wikitech)

Thanks! The guide and the Puppet manifest expect the key to be generated on the Cumin master host, so when the host is reimaged, the generated keys are gone. I'd prefer to have them stored as secrets and provided by Puppet, but that requires a private Puppet server. That explains the first issue.

hashar claimed this task.

As for the second issue (/etc/ssh/userkeys/root.d/cumin not being updated), the file is generated from an ERB template containing:

from="<%= @project_masters_str %>",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc <%= @project_pub_key %>

and the project-wide Hiera config has:

profile::openstack::eqiad1::cumin::project_masters:
- 172.16.1.230
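To see why the stale IP persists, here is a Python sketch of the line the template above renders. It assumes `@project_masters_str` is the `project_masters` list joined with commas, which is consistent with the comma-separated from= list in the "Cumin Masters" entry; the function name is hypothetical and this is not the Puppet code itself.

```python
# Options copied verbatim from the ERB template line.
OPTIONS = "no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc"


def render_cumin_authorized_key(project_masters: list, project_pub_key: str) -> str:
    """Approximate the authorized_keys line produced by the template (sketch)."""
    if not project_masters:
        raise ValueError("project_masters must list at least one Cumin master IP")
    # Assumed equivalent of @project_masters_str: IPs joined with commas.
    masters_str = ",".join(project_masters)
    return f'from="{masters_str}",{OPTIONS} {project_pub_key}'
```

The from= value comes only from the Hiera list, not from the master's actual address, so rebuilding the master with a new IP leaves targets with the old restriction until `project_masters` is updated and Puppet runs on the targets.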

I updated the IP and ran Puppet on one of the targets. I then re-armed keyholder, and it now works:

hashar@integration-cumin:~$ sudo cumin --force 'name:docker-1040' 'hostname'
1 hosts will be targeted:
integration-agent-docker-1040.integration.eqiad1.wikimedia.cloud
FORCE mode enabled, continuing without confirmation
----- OUTPUT of 'hostname' -----
integration-agent-docker-1040
================
PASS |████████████████████████████████████████████████| 100% (1/1) [00:00<00:00,  2.01hosts/s]
FAIL |                                                        |   0% (0/1) [00:00<?, ?hosts/s]
100.0% (1/1) success ratio (>= 100.0% threshold) for command: 'hostname'.
100.0% (1/1) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.