
Downgrade integration-puppetmaster back to Ubuntu Precise (re-create instance)
Closed, Resolved (Public)

Description

Most of the puppet failures blocking T94916 may be caused by the fact that integration-puppetmaster was inadvertently changed to Trusty; the puppetmaster version shipped with Trusty is not yet supported by ops.

From @BBlack

If we patched over these couple of Package issues but left that master on trusty, you're just gonna keep finding endless new issues to solve, and solving some of them will turn out to be risky for the prod environment, too. It's best to deal with a big leap forward on the master in sync, because puppet on trusty is a different version than precise, and that matters a lot.

puppet tends to not be very compatible across versions :/ There are issues with client compatibility levels too, but we've solved a lot of that for prod already because we have trusty/jessie clients in prod. We haven't moved the master forward because that's a whole other ball of problems to deal with.

See also:
T87484: Recreate integration-puppetmaster with new image (/var/ is too small)

Event Timeline

Krinkle claimed this task.
Krinkle removed Krinkle as the assignee of this task.
Krinkle raised the priority of this task from to Medium.
Krinkle updated the task description.
Krinkle set Security to None.
Krinkle added subscribers: Krinkle, Aklapper.
Krinkle added a subscriber: hashar.
hashar added a subscriber: BBlack.

I have recreated the instance to benefit from a new partitioning scheme which has an extended /var (see T87484: Recreate integration-puppetmaster with new image (/var/ is too small)).

We do indeed want to downgrade, per @BBlack's comment.

In short:

  • delete integration-puppetmaster
  • recreate it with the same name
  • once booted, run puppet a bunch of times to complete the setup
  • on wikitech apply:
    • Class: role::puppet::self
    • Variables:
      • puppetmaster: integration-puppetmaster.eqiad.wmflabs
      • puppetmaster_autoupdate: true

Run puppet a few more times.

Then delete the client certs on all instances and re-sign all certs as described in T87484#1081655.

Might be smarter to save a copy of the certs on the current integration-puppetmaster and restore that on the new instance, but I have no idea which files to save nor whether it is going to work.

The integration project has some hiera data that makes instances default to a puppet master named integration-puppetmaster, so there is a chicken-and-egg problem when recreating the puppetmaster.

I have:

  • commented out the hiera data on wikitech
  • created a new instance, ran puppet a few times and rebooted the instance
  • set the instance to be a puppet master, pointing it at integration-puppetmaster.eqiad.wmflabs
  • ran puppet a few more times

On all clients, I ran:

sudo -s
rm -fR /var/lib/puppet/ssl /var/lib/puppet/client/ssl
puppet agent -tv

Then, on the master, I ran puppet ca list to get a list of client certs, and manually signed each of them with puppet ca sign i-000XXXX.eqiad.wmflabs.

Then I ran puppet agent -tv again on all clients.

The whole process is tedious and can surely be simplified by someone smarter. But it did the job :)
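
The manual signing step on the master could probably be scripted. Below is a minimal sketch, not something that was run for this task. It assumes that each non-empty line printed by puppet ca list starts with the certname, optionally wrapped in double quotes; that output format is an assumption and should be checked on the actual master first. The helper names pending_certnames and sign_certnames and the DRY_RUN variable are hypothetical, introduced only for illustration. By default the sketch only prints the puppet ca sign commands it would run.

```shell
#!/bin/sh
# Sketch: sign every pending client cert on the puppetmaster in one go.
# ASSUMPTION: each non-empty line of `puppet ca list` output carries the
# certname as its first whitespace-separated field, possibly in double quotes.

# Read `puppet ca list`-style text on stdin, print one certname per line.
pending_certnames() {
    awk 'NF { gsub(/"/, "", $1); print $1 }'
}

# Read certnames on stdin. With DRY_RUN=1 (the default) only print the
# command that would be run; with DRY_RUN=0 actually call `puppet ca sign`.
sign_certnames() {
    while read -r name; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "puppet ca sign $name"
        else
            puppet ca sign "$name"
        fi
    done
}

# Usage on the master (review the dry-run output before signing for real):
#   puppet ca list | pending_certnames | sign_certnames
#   puppet ca list | pending_certnames | DRY_RUN=0 sign_certnames
```

Signing everything that is pending is only reasonable here because all clients were deliberately reset just before; on a master that accepts requests from unknown hosts, the dry-run list should be reviewed first.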

Puppet on integration-puppetmaster has been failing for the past 2 days, since April 3:

Failed when searching for node i-0000063a.eqiad.wmflabs: You must set the 'external_nodes' parameter to use the external node terminus

i-0000063a.eqiad.wmflabs is integration-dev.eqiad.wmflabs. Other instances may be affected as well.

It was surely passing on all available instances when I closed the bug. Puppet is passing on integration-dev.eqiad.wmflabs right now, so it must have been some unrelated or transient issue.