
cumin on labspuppetmaster doesn't work anymore for projects migrated to eqiad1-r
Closed, ResolvedPublic

Description

Hi!

Cumin on labspuppetmaster doesn't work anymore for projects migrated to eqiad1-r, since its config.yaml file is not correct (region_name needs to be changed). As an interim solution I am using cumin -c myconfig.yaml and everything works fine (thanks to @Volans for the solution), but it would be great to fix it (eventually, once all projects are migrated) and to let people know about the issue :)
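For reference, the fix amounts to changing region_name under the openstack backend's client_params in /etc/cumin/config.yaml. A minimal sketch (key layout taken from the puppet diff further down this task; the project name is just an example):

```yaml
# /etc/cumin/config.yaml (excerpt) -- openstack backend section
openstack:
    nova_api_version: 2.12
    timeout: 2
    client_params:
        region_name: eqiad1-r   # was: eqiad
    query_params:
        project: myproject      # example; set to your WMCS project
```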

Event Timeline

Yeah ideally the openstack backend would be made multi-region, though I don't know if we'll be running multi-region long enough to make it worth it :/

BTW if you're running in a project with puppetdb set up you can avoid the problem altogether :)

I brought this up a few weeks ago in the WMCS-admin IRC channel, explaining also that cumin's puppetization uses the hiera variable profile::openstack::main::region in modules/profile/manifests/openstack/main/cumin/master.pp, and that people should feel free to change/override it at will based on the migration.
The other short-term option is to generate two different config files, like config-eqiad.yaml and config-eqiad1-r.yaml, and maybe add a bash alias for ease of use.
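The two-config-files workaround could look like this (the file names and paths are assumptions following the naming suggested above, not an actual deployed setup):

```shell
# Hypothetical per-region cumin configs; only region_name differs between them.
alias cumin-eqiad='cumin -c /etc/cumin/config-eqiad.yaml'
alias cumin-eqiad1='cumin -c /etc/cumin/config-eqiad1-r.yaml'
```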

Let me know if you need anything from my side on this.

> Yeah ideally the openstack backend would be made multi-region, though I don't know if we'll be running multi-region long enough to make it worth it :/

Well, we have 4 regions right now (only 2 of them customer-facing: main and eqiad1) and we may have more in the [not near] future.

@Volans I would say adding region support for the openstack cumin backend would be a good idea if that can be done without investing a lot of effort.
If not, sure, we can flip the hiera key one day and point to eqiad1-r.
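A minimal sketch of what region support in the openstack backend could look like: loop over the configured regions and merge the results, with client construction injected so the region handling stays testable. The function and parameter names here are illustrative, not cumin's actual API:

```python
def list_hosts_all_regions(make_client, regions):
    """Collect server names across several regions instead of assuming one.

    make_client: callable taking a region name and returning a nova-like
    client (in a real backend this would pass region_name to novaclient).
    """
    hosts = []
    for region in regions:
        nova = make_client(region)
        hosts.extend(server.name for server in nova.servers.list())
    return hosts
```

The backend would then read a list of regions from config.yaml rather than a single region_name under client_params.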

> Yeah ideally the openstack backend would be made multi-region, though I don't know if we'll be running multi-region long enough to make it worth it :/

> Well, we have 4 regions right now (only 2 of them customer-facing: main and eqiad1) and we may have more in the [not near] future.

Right, but those are 2 regions each per keystone setup, right? One tenant in the main/eqiad1-r keystone is not the same tenant as in the labtest/labtestn regions, so they shouldn't be able to interact in privileged ways like this. I wouldn't expect to go into deployment-prep in main/eqiad1-r and (assuming there was a deployment-prep tenant in labtest) start mucking with labtest VMs. I would, however, expect to be able to interact with deployment-prep's main/eqiad1-r VMs.

> If not, sure, we can flip the hiera key one day and point to eqiad1-r.

Might as well do that now regardless, seeing as we appear to have passed the 50% mark in terms of instance count?

krenair@shinken-02:~$ python countMigratedInstances.py 
('Migrated', 393)
('Non-migrated', 338)

Which means by default cumin on that host will be targeting 46% of instances right now.
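The 46% figure follows directly from those counts (a quick check):

```python
migrated, non_migrated = 393, 338
total = migrated + non_migrated          # 731 instances in total
still_targeted = 100 * non_migrated / total
print(round(still_targeted))             # share cumin still targets with region eqiad -> 46
```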

Change 480769 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] contint: instances are fully on eqiad1-r

https://gerrit.wikimedia.org/r/480769

I have tried setting profile::openstack::main::region to eqiad1-r via hieradata/labs/integration/common.yaml. But it seems hiera queries labs-puppetmaster.wikimedia.org first.

The patch is applied on integration-puppetmaster01. Doing a hiera call:

integration-puppetmaster01:~$ RUBYLIB=/var/lib/git/operations/puppet/modules/wmflib/lib hiera --debug --config /etc/puppet/hiera.yaml profile::openstack::main::region
DEBUG: 2018-12-19 15:58:38 +0000: Looking up profile::openstack::main::region
DEBUG: 2018-12-19 15:58:38 +0000: Looking up profile::openstack::main::region
DEBUG: 2018-12-19 15:58:38 +0000: Looking up profile::openstack::main::region
DEBUG: 2018-12-19 15:58:38 +0000: Loading info from labs for profile::openstack::main::region
DEBUG: 2018-12-19 15:58:38 +0000: The source is: labs
DEBUG: 2018-12-19 15:58:38 +0000: Searching for profile::openstack::main::region in /etc/puppet/hieradata/labs.yaml
DEBUG: 2018-12-19 15:58:38 +0000: Loading file /etc/puppet/hieradata/labs.yaml
DEBUG: 2018-12-19 15:58:38 +0000: Loading info from private/labs for profile::openstack::main::region
DEBUG: 2018-12-19 15:58:38 +0000: The source is: labs
DEBUG: 2018-12-19 15:58:38 +0000: Searching for profile::openstack::main::region in /etc/puppet/private/hieradata/labs.yaml
DEBUG: 2018-12-19 15:58:38 +0000: Loading file /etc/puppet/private/hieradata/labs.yaml
DEBUG: 2018-12-19 15:58:38 +0000: Loading info from common for profile::openstack::main::region
DEBUG: 2018-12-19 15:58:38 +0000: The source is: common
DEBUG: 2018-12-19 15:58:38 +0000: Searching for profile::openstack::main::region in /etc/puppet/hieradata/common.yaml
DEBUG: 2018-12-19 15:58:38 +0000: Loading file /etc/puppet/hieradata/common.yaml
DEBUG: 2018-12-19 15:58:38 +0000: Loading info from secret/common for profile::openstack::main::region
DEBUG: 2018-12-19 15:58:38 +0000: The source is: common
DEBUG: 2018-12-19 15:58:38 +0000: Cannot find datafile /etc/puppet/secret/hieradata/common.yaml, skipping
DEBUG: 2018-12-19 15:58:38 +0000: Searching for profile::openstack::main::region in 
DEBUG: 2018-12-19 15:58:38 +0000: Loading info from private/common for profile::openstack::main::region
DEBUG: 2018-12-19 15:58:38 +0000: The source is: common
DEBUG: 2018-12-19 15:58:38 +0000: Searching for profile::openstack::main::region in /etc/puppet/private/hieradata/common.yaml
DEBUG: 2018-12-19 15:58:38 +0000: Loading file /etc/puppet/private/hieradata/common.yaml
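Reconstructed from the sources hiera reports above (this is a sketch of the implied lookup order, not the real hiera.yaml), the lookup never seems to consult a per-project hieradata/labs/integration path at all, which would explain why the override is ignored:

```yaml
# Lookup order implied by the debug output above (reconstruction)
:hierarchy:
  - labs             # /etc/puppet/hieradata/labs.yaml
  - private/labs     # /etc/puppet/private/hieradata/labs.yaml
  - common           # /etc/puppet/hieradata/common.yaml
  - secret/common    # datafile missing, skipped
  - private/common   # /etc/puppet/private/hieradata/common.yaml
```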

Change 480769 abandoned by Hashar:
contint: instances are fully on eqiad1-r

Reason:
I have no idea why hiera on the integration WMCS project does not honor the setting. There is a fault somewhere in the lookup hierarchy.

Eventually it will get fixed when the region is switched for all of WMCS.

https://gerrit.wikimedia.org/r/480769

On integration-cumin.integration.eqiad.wmflabs I have manually edited /etc/cumin/config.yaml to update the region, and it worked just fine. Then I ran puppet, which reverted the file to its managed state:

Debug: Using cached connection for https://integration-puppetmaster01.integration.eqiad.wmflabs:8140
Debug: Caching connection for https://integration-puppetmaster01.integration.eqiad.wmflabs:8140
Debug: Executing: 'diff -u /etc/cumin/config.yaml /tmp/puppet-file20190205-26295-25qq4z'
Notice: /Stage[main]/Profile::Openstack::Main::Cumin::Master/File[/etc/cumin/config.yaml]/content: 
--- /etc/cumin/config.yaml	2019-02-05 09:30:12.085892760 +0000
+++ /tmp/puppet-file20190205-26295-25qq4z	2019-02-05 09:30:46.162233208 +0000
@@ -17,7 +17,7 @@
     nova_api_version: 2.12
     timeout: 2
     client_params:
-        region_name: eqiad1-r
+        region_name: eqiad
     query_params:
         project: integration
Debug: /Stage[main]/Profile::Openstack::Main::Cumin::Master/File[/etc/cumin/config.yaml]: The container Class[Profile::Openstack::Main::Cumin::Master] will propagate my refresh event

It turns out the instance has some custom puppet config in Horizon (at https://horizon.wikimedia.org/project/instances/d1f017d7-e14b-4277-aa41-a950ab69a47f/ ):

profile::openstack::main::cumin::aliases: {}
profile::openstack::main::cumin::project_ssh_priv_key_path: /root/.ssh/cumin
profile::openstack::main::region: eqiad

I have dropped the profile::openstack::main::region: eqiad line, ran puppet again, and the cumin configuration is now correct:

$ sudo cumin '*' 'hostname' | cat
26 hosts will be targeted:
integration-castor03.integration.eqiad.wmflabs,integration-cumin.integration.eqiad.wmflabs,integration-puppetmaster01.integration.eqiad.wmflabs,integration-r-lang-01.integration.eqiad.wmflabs,integration-slave-docker-[1017,1021,1033-1034,1037-1038,1040-1041,1043-1047].integration.eqiad.wmflabs,integration-slave-jessie-[1001-1004].integration.eqiad.wmflabs,integration-slave-jessie-android.integration.eqiad.wmflabs,saucelabs-[01-03].integration.eqiad.wmflabs,webperformance.integration.eqiad.wmflabs
hashar claimed this task.