Rebuild integration-cumin to get rid of Debian Buster
Closed, Resolved, Public

Description

The instance runs Cumin, which we use as a helper to maintain the fleet of Jenkins agents. As Buster will be phased out on June 30th, the instance should be rebuilt to match whatever Debian version production is using.

The instance:

integration-cumin.integration.eqiad1.wikimedia.cloud d8d69db2-700c-432e-a796-e8684c36ac0c 172.16.1.230 g2.cores1.ram2.disk20
https://openstack-browser.toolforge.org/server/integration-cumin.integration.eqiad1.wikimedia.cloud

Puppet classes:

profile::openstack::eqiad1::cumin::target
profile::openstack::eqiad1::cumin::master

Hiera settings:

profile::ci::docker::docker_version: 5:20.10.18~3-0~debian
profile::java::java_packages: [{'variant': 'jre-headless', 'version': '11'}]
profile::openstack::eqiad1::cumin::aliases: {}
profile::openstack::eqiad1::cumin::project_masters: ['172.16.1.230']
profile::openstack::eqiad1::cumin::project_pub_key: ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPPfdabE1Fej0X86QgjY72LXvA3Wawrg0ZcDL0PF56/A root@integration-cumin
profile::openstack::eqiad1::cumin::project_ssh_priv_key_path: /root/.ssh/cumin
profile::openstack::eqiad1::region: eqiad1-r
puppetmaster: integration-puppetmaster-02.integration.eqiad.wmflabs

Event Timeline

@joanna_borun I am most likely going to need assistance from people who know Cumin. Notably, I don't know which Debian distribution should be favored (Bullseye or Bookworm?). I am merely an end user of it, and some amendments may be needed in Puppet.

Our main Cumin hosts are on Bullseye, so it's best if this follows along. If you need anything else (merges/review etc), just ping me on IRC.

Hello, @hashar! The deadline for this rebuild is today :)

Likely you can just replace this VM with an identically-puppetized Bullseye host. I'm doing the same in deployment-prep, although in that case I'm blocked by not having the keyholder passphrase.

I will rebuild them tomorrow, as well as the integration-pkgbuilder instances (T360786). That should be straightforward given that production has already migrated and the systems are fully puppetized.

<aside>
https://wikitech.wikimedia.org/wiki/Keyholder#WMCS_projects_passphrases
Should be in: deployment-puppetserver-1.deployment-prep.eqiad.wmflabs:/srv/git/labs/private/files/ssh/tin/cumin_rsa.passphrase
</aside>

Thanks! Bryan pointed me to that and I finished the migration this morning.

Mentioned in SAL (#wikimedia-releng) [2024-07-16T08:04:28Z] <hashar> integration: reimaging integration-cumin from Buster to Bullseye # T360784

I can't rebuild it because the Puppet configuration is broken :-/ I will file a task for it.

After fixing the Puppet self setup (T370130) the next breakage is the ssh key pair for cumin:

Error: /Stage[main]/Profile::Openstack::Eqiad1::Cumin::Master/Keyholder::Agent[cumin_openstack_integration_master]/File[/etc/keyholder.d/cumin_openstack_integration_master]: Could not evaluate: Could not retrieve information from environment production source(s) file:///root/.ssh/cumin
Error: /Stage[main]/Profile::Openstack::Eqiad1::Cumin::Master/Keyholder::Agent[cumin_openstack_integration_master]/File[/etc/keyholder.d/cumin_openstack_integration_master.pub]: Could not evaluate: Could not retrieve information from environment production source(s) file:///root/.ssh/cumin.pub

Those files exist on disk: I manually verified that they are present on the Puppet server at:

/srv/git/labs/private/files/ssh/cumin.pub
/srv/git/labs/private/files/ssh/cumin

But for some reason they are not shipped to the integration-cumin instance :/

I think the issue is that profile::openstack::eqiad1::cumin::master does not have a secret() call to ship the secret files files/ssh/cumin and files/ssh/cumin.pub under /root/.ssh.

If I copy them manually, the key is rejected by the targets. I believe that is because Puppet does not update the authorized key:

/etc/ssh/userkeys/root.d
# Cumin Masters.
from="172.16.4.160,172.16.2.249,172.16.1.220",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAICcav+ECiF6hW2XRuP7R8nqDw4hPlD0OChsGvB6K27jK root@cloudinfra-internal-puppetmaster-02

from="172.16.1.230",no-agent-forwarding,no-port-forwarding,no-x11-forwarding,no-user-rc ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPPfdabE1Fej0X86QgjY72LXvA3Wawrg0ZcDL0PF56/A root@integration-cumin

That 172.16.1.230 entry is the IP of the old instance. It now has 172.16.6.123.
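
The mismatch can be illustrated with a minimal Python sketch that reads the from="..." option of an authorized_keys entry and checks a source address against it. The function names are hypothetical, and the sketch only handles literal addresses and CIDR ranges, which is a simplification of what sshd accepts (hostnames, wildcards and negated patterns are ignored here). An entry without a from= option is reported as not allowed, unlike sshd, which would accept any source.

```python
import ipaddress
import re

# Matches the from="..." option of an authorized_keys line.
FROM_OPTION = re.compile(r'from="([^"]*)"')

# The integration-cumin entry quoted above, still restricted to the old IP.
OLD_ENTRY = (
    'from="172.16.1.230",no-agent-forwarding,no-port-forwarding,'
    'no-x11-forwarding,no-user-rc ssh-ed25519 '
    'AAAAC3NzaC1lZDI1NTE5AAAAIPPfdabE1Fej0X86QgjY72LXvA3Wawrg0ZcDL0PF56/A '
    'root@integration-cumin'
)


def allowed_sources(line):
    """Return the patterns listed in the from="..." option, or [] if absent."""
    match = FROM_OPTION.search(line)
    if match is None:
        return []
    return [p.strip() for p in match.group(1).split(",") if p.strip()]


def ip_is_allowed(line, source_ip):
    """True if source_ip matches a literal address or CIDR range in from=."""
    ip = ipaddress.ip_address(source_ip)
    for pattern in allowed_sources(line):
        try:
            if ip in ipaddress.ip_network(pattern, strict=False):
                return True
        except ValueError:
            # Hostname or wildcard patterns are not handled in this sketch.
            continue
    return False


if __name__ == "__main__":
    print(ip_is_allowed(OLD_ENTRY, "172.16.1.230"))  # old instance IP: True
    print(ip_is_allowed(OLD_ENTRY, "172.16.6.123"))  # new instance IP: False
```

With the entry as deployed, the rebuilt instance at 172.16.6.123 is outside the allowed sources, which is consistent with the targets rejecting the key.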

I will file a task for SRE.

I had some trouble with the Puppet self setup (T370130) and with provisioning Cumin, since our Puppet manifest expects the ssh key to be generated on the instance rather than provided by labs/private (T370138). It is finally working now ;)