Downgrade intergration-puppetmaster back to Ubuntu Precise (re-create instance)
Closed, Resolved (Public)
Authored By

	Krinkle
	Apr 2 2015, 10:56 PM

Description

Most puppet failures blocking T94916 may be caused by the fact that integration-puppetmaster was inadvertently changed to Trusty; the puppetmaster version shipped with Trusty is not yet supported by ops.

From @BBlack

If we patched over these couple of Package issues but left that master on trusty, you're just gonna keep finding endless new issues to solve, and solving some of them will turn out to be risky for the prod environment, too. It's best to deal with a big leap forward on the master in sync, because puppet on trusty is a different version than precise, and that matters a lot.

puppet tends to not be very compatible across versions :/ there's issues with client compatibility levels too, but we've solved a lot of that for prod already because we have trusty/jessie clients in prod already but we haven't moved the master forward because that's a whole other ball of problems to deal with

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Krinkle	T94916 Re-create ci slaves (April 2015)
		Resolved		hashar	T94927 Downgrade intergration-puppetmaster back to Ubuntu Precise (re-create instance)

Event Timeline

Krinkle created this task. Apr 2 2015, 10:56 PM

Krinkle claimed this task.

Krinkle removed Krinkle as the assignee of this task.

Krinkle raised the priority of this task from to Medium.

Krinkle updated the task description.

Krinkle added a project: Continuous-Integration-Infrastructure.

Krinkle set Security to None.

Krinkle added subscribers: Krinkle, Aklapper.

Krinkle added a subscriber: hashar.

Krinkle moved this task from Untriaged to Next on the Continuous-Integration-Infrastructure board. Apr 3 2015, 12:22 AM

Krinkle assigned this task to hashar. Apr 3 2015, 1:18 AM

hashar updated the task description. Apr 3 2015, 10:08 AM

hashar added a subscriber: BBlack.

I have recreated the instance to benefit from a new partitioning scheme which has an extended /var; see T87484: Recreate integration-puppetmaster with new image (/var/ is too small).

We do indeed want to downgrade, per @BBlack's comment.

hashar mentioned this in T87484: Recreate integration-puppetmaster with new image (/var/ is too small). Apr 3 2015, 10:14 AM

In short:

delete integration-puppetmaster
recreate it with the same name
once booted, run puppet a bunch of times to complete the setup
on wikitech apply:
- Class: role::puppet::self
- Variables:
  - puppetmaster: integration-puppetmaster.eqiad.wmflabs
  - puppetmaster_autoupdate: true

Run puppet a few more times.

Then delete the client certs on all instances and re-sign all certs as described in T87484#1081655.

Might be smarter to save a copy of the certs on the current integration-puppetmaster and restore that on the new instance, but I have no idea which files to save nor whether it is going to work.

The integration project has some hiera data that makes instances default to a puppet master named integration-puppetmaster. So there is a chicken-and-egg problem when recreating the puppetmaster itself.

I have:

commented out the hiera data on wikitech
created a new instance, ran puppet a few times and rebooted the instance
set the instance to be a puppet master, with puppetmaster pointing to integration-puppetmaster.eqiad.wmflabs
ran puppet a few more times

On all clients, I ran:

sudo -s
rm -fR /var/lib/puppet/ssl /var/lib/puppet/client/ssl
puppet agent -tv
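
A minimal sketch of how the client-side reset above might be batched, assuming SSH access to each instance (the task does not say how the clients were reached). The helper name reset_commands is hypothetical, and it only prints the commands so they can be reviewed before anything is run:

```shell
#!/bin/sh
# Hypothetical dry-run helper: print one reset command per client instance.
# Assumption: each client is reachable over SSH and the user may use sudo there.
# Nothing is executed; the output is meant to be reviewed first.
reset_commands() {
    for host in "$@"; do
        # Same two steps as the manual procedure: drop the old SSL state,
        # then run the agent so it submits a fresh certificate request.
        printf "ssh %s 'sudo rm -fR /var/lib/puppet/ssl /var/lib/puppet/client/ssl; sudo puppet agent -tv'\n" "$host"
    done
}
```

For example, reset_commands i-0000063a.eqiad.wmflabs prints a single ssh line for that instance.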

Then on the master I ran puppet ca list to get the list of client certs, and manually signed each of them with puppet ca sign i-000XXXX.eqiad.wmflabs.

Then I ran puppet agent -tv again on all clients.

The whole process is tedious and can surely be simplified by someone smarter. But it did the job :)
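
One possible simplification of the manual signing step, as a sketch only. It assumes the pending certificate names have already been collected one per line (the exact output format of puppet ca list is not assumed here), and the helper name sign_commands is hypothetical. It prints the puppet ca sign commands instead of running them, and skips any name outside the expected domain so that unexpected requests are not signed blindly:

```shell
#!/bin/sh
# Hypothetical helper: turn a list of pending cert names into sign commands.
# Input: hostnames on stdin, one per line.
# Argument: the domain suffix a name must end with to be accepted.
# Output: one "puppet ca sign <host>" line per accepted name (dry run).
sign_commands() {
    suffix="$1"
    while IFS= read -r host; do
        case "$host" in
            *"$suffix") printf 'puppet ca sign %s\n' "$host" ;;
        esac
    done
}
```

For example, piping i-0000063a.eqiad.wmflabs into sign_commands .eqiad.wmflabs prints puppet ca sign i-0000063a.eqiad.wmflabs; the printed lines can then be reviewed and run on the master.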

hashar mentioned this in T94916: Re-create ci slaves (April 2015). Apr 3 2015, 12:07 PM

hashar moved this task from Next to Done on the Continuous-Integration-Infrastructure board.

Puppet on integration-puppetmaster has been failing for the past 2 days since April 3:

Failed when searching for node i-0000063a.eqiad.wmflabs: You must set the 'external_nodes' parameter to use the external node terminus

i-0000063a.eqiad.wmflabs is integration-dev.eqiad.wmflabs. Other instances may be affected as well.

It was surely passing on all available instances when I closed the bug. Puppet is passing on integration-dev.eqiad.wmflabs right now, so it must have been some unrelated or transient issue.

scfc subscribed. Jun 17 2015, 11:02 PM
