Page MenuHomePhabricator

CloudVPS: hiera refactor
Closed, ResolvedPublic

Description

For improved support of our multi openstack deployment situation, some hiera refactors in operationes/puppet.git are required/wanted.

  • per-deployment hiera trees
  • more robustness/consistency on the hierarchy that we use for hiera inside different deployments
  • eventually, rename the labs realm to something more meaningful, like cloud or wmcs or cloudvps.

The first patch that tried to introduce some of this was: https://gerrit.wikimedia.org/r/c/operations/puppet/+/569230
which was reverted because I discovered a chicken-egg problem in how that configures hiera.

Event Timeline

Trying to merge https://gerrit.wikimedia.org/r/c/operations/puppet/+/569230 I discovered the following issue:

  • we set $::wmcs_deployment and $::labsproject after we query $::realm (to be == labs., see realm.pp)
  • in the search tree introduced in the patch, the $::realm key is defined in a file that includes the deployment name on its path.
  • therefore, we can't set the search hierarchy based on these variables

Perhaps having $::wmcs_deployment and $::labsproject be provided by facter rather than computed at realm.pp would be a good way to break the chicken-egg problem.

Change 570079 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] wmflablibs: add new facts for wmflabs

https://gerrit.wikimedia.org/r/570079

As per https://phabricator.wikimedia.org/T171289, getting $project (and, possibly $deployment) from a fact may be problematic.

Proposal:

  • drop project/host hieradata from operations/puppet.git. Declare we have horizon for that. I doubt it is properly working anyway (for the same problem we can't use $::wmcs_deployment)
  • introduce per-deployment hierakeys in operations/puppet.git. But have the search hierarchy non-dependant on hiera lookups, but have it harcoded (we have different files for different deployments anyway)

This way we:

  • eliminate the potentially dangerous facts gathering
  • eliminate a duplicity in how hiera is stablished (horizon vs ops/puppet.git)
  • effectively support per-deployment hiera data, which is the ultimate goal we are after.

Will try to create a patch for this.

Change 570294 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] hiera: cloud: drop toollabs::external_hostname and toollabs::is_mail_relay

https://gerrit.wikimedia.org/r/570294

Change 570294 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hiera: cloud: drop toollabs::external_hostname and toollabs::is_mail_relay

https://gerrit.wikimedia.org/r/570294

Change 570313 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] hiera: cloud: tools: drop hiera keys migrated to horizon

https://gerrit.wikimedia.org/r/570313

Change 570313 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hiera: cloud: tools: drop hiera keys migrated to horizon

https://gerrit.wikimedia.org/r/570313

Change 570314 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] hiera: cloud: tools: drop data for tools-bastion-03

https://gerrit.wikimedia.org/r/570314

Change 570314 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] hiera: cloud: tools: drop data for tools-bastion-03

https://gerrit.wikimedia.org/r/570314

Proposal:

  • drop project/host hieradata from operations/puppet.git. Declare we have horizon for that. I doubt it is properly working anyway (for the same problem we can't use $::wmcs_deployment)
  • introduce per-deployment hierakeys in operations/puppet.git. But have the search hierarchy non-dependant on hiera lookups, but have it harcoded (we have different files for different deployments anyway)

This way we:

  • eliminate the potentially dangerous facts gathering
  • eliminate a duplicity in how hiera is stablished (horizon vs ops/puppet.git)
  • effectively support per-deployment hiera data, which is the ultimate goal we are after.

Will try to create a patch for this.

Reading this https://gerrit.wikimedia.org/r/c/operations/puppet/+/536663 T232509: cloud-puppetmasters: move some hiera settings from Horizon to git/gerrit, this seems to be a use case we still want to support.

My proposal then is to end with a search tree configuration like this:

:hierarchy:
  # data stored in ops/puppet.git, hardcoded deployment name, host overrides
  - "cloud/eqiad1/hosts/%{trusted.hostname}"
  # data stored in ops/puppet.git, hardcoded deployment name, deployment-global overrides
  - "cloud/eqiad1"
  # data stored in ops/puppet.git, cloud-global overrides
  - "cloud"

I declared that using hiera data to resolve the path of hiera backends was something to stop doing because the chicken-egg problem. But I really wonder why $labsproject worked and not $wmcs_deployment, given they are computed in exactly the same place (realm.pp)

Change 570330 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] wmflib::end_with: create String.end_with function

https://gerrit.wikimedia.org/r/570330

Change 570331 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] realm global: make the realm variable a global in labs

https://gerrit.wikimedia.org/r/570331

Change 570348 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] wmflib::require_domains: use require_domains instead of require_realm

https://gerrit.wikimedia.org/r/570348

Change 570350 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] realm: remove realm global variable

https://gerrit.wikimedia.org/r/570350

Change 570350 abandoned by Jbond:
realm: remove realm global variable

Reason:
git grep \$realm != git grep ::realm

https://gerrit.wikimedia.org/r/570350

Change 570369 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] realm: make the realm variable a global in labs

https://gerrit.wikimedia.org/r/570369

Change 570330 abandoned by Jbond:
wmflib::end_with: create String.end_with function

Reason:
as pointed our $foo =~ "${bar}$" is easy enough

https://gerrit.wikimedia.org/r/570330

Change 570331 merged by Jbond:
[operations/puppet@production] realm global: make the realm variable a global in labs

https://gerrit.wikimedia.org/r/570331

Change 570079 abandoned by Andrew Bogott:
wmflablibs: add new facts for wmflabs

Reason:
dropping in favor of checks on the certname

https://gerrit.wikimedia.org/r/570079

Change 570369 abandoned by Andrew Bogott:
realm: make the realm variable a global in labs

Reason:
this was done in a separate jbond patch

https://gerrit.wikimedia.org/r/570369

The patch to re-organize is:

https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/571562/

It looks right to me, but after merging I get failures like:

"Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, DNS lookup failed for Resolv::DNS::Resource::IN::A at /etc/puppet/manifests/realm.pp:64:9 on node util-abogott-buster.testlabs.eqiad.wmflabs"

That looks to be because this dnsconfig isn't getting set properly:

$dnsconfig = lookup('labsdnsconfig',Hash, 'hash', {})

Which in turn is probably because the hiera lookup isn't working properly. @jbond offered to help debug this but immediately ran into this upstream issue: https://tickets.puppetlabs.com/browse/PUP-9207

I'm assigning this to @jbond for now; John, if you get stumped or don't have time to work on this please just assign it back to me and I'll just stick debugging lines all over the place. Thanks!

Change 572230 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] puppetmaster: notify apache2 when the hiera.yaml file is updated

https://gerrit.wikimedia.org/r/572230

I had a look at this and applied the change manually and have been able to recreate. I get this error if i make the change but don't restart Apache on the puppet master forcing it to re-read the hierarchy but its fixed shortly after that.

My guess is the change got applied and synced to the puppet master which updated all the yaml directories structures. There is nothing to guarantee that puppet runs on the puppet master at this point so untill puppet runs on the puppet master you would get errors as the whole of the hiera tree will be, as far as the puppet master is concerned, missing as /etc/puppet/hiera.yaml would not have been updated.

In a perfect world once puppet ran on the puppet master everything would be fixed again. however there is no subscribe/notify on /etc/puppet/hiera.yaml to tell apache2 to reload (thus forcing passenger to reload) so currently you would also need to manually restart apache (i have a cr to fix this)

I would recomend the following approach for a re-try

  1. disable puppet on all wmcs hosts
  2. merge change
  3. run puppet on puppet master(s)
  4. restart apache2 on puppet masters(s)
  5. test
  6. enable puppet

Change 572230 merged by Andrew Bogott:
[operations/puppet@production] puppetmaster: notify apache2 when the hiera.yaml file is updated

https://gerrit.wikimedia.org/r/572230

Change 572626 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] realm: rename labs realm to cloud

https://gerrit.wikimedia.org/r/572626

Shall we move on and refactor the labs/private.git hiera tree too?

jbond triaged this task as Medium priority.

@aborrero Have assigned this back to you (not sure when i got it assigned). I have a feeling this may have been completed already but as always happy to help :)

Yeah, my last question still remains. There is some work to do on labs/private.git, but we can work on that later.

Thanks!

Change 572626 abandoned by Arturo Borrero Gonzalez:

[operations/puppet@production] realm: rename labs realm to cloud

Reason:

not working on this at the moment

https://gerrit.wikimedia.org/r/572626

Change 570348 abandoned by Jbond:

[operations/puppet@production] wmflib::require_domains: use require_domains instead of require_realm

Reason:

underlining function was abandoned

https://gerrit.wikimedia.org/r/570348