
Deployment-prep hosts with puppet errors (tracking)
Closed, Resolved, Public

Description

This is a Tracking-Neverending task for any puppet errors we might encounter on the beta cluster infrastructure. Puppet errors typically occur when Puppet changes are made that do not take beta into account (typically, a class being renamed but not updated in wikitech, hiera parameters missing ...).

Related Objects

Status Subtype Assigned Task
Resolved EddieGP
Resolved ArielGlenn
Resolved Krenair
Resolved None
Resolved None
Resolved mobrovac
Resolved mmodell
Resolved mmodell
Resolved hashar
Declined None
Invalid mmodell
Resolved Eevans
Duplicate None
Resolved hashar
Resolved Krenair
Duplicate None
Declined None
Resolved bd808
Resolved thcipriani
Resolved None
Resolved None
Resolved None
Resolved None
Resolved Ottomata
Resolved Ladsgroup
Resolved Eevans
Resolved AlexMonk-WMF
Resolved akosiaris
Resolved hashar
Resolved hashar
Resolved Ottomata
Resolved Krenair
Resolved Krenair
Declined hashar
Resolved mmodell
Resolved None
Resolved Ottomata
Resolved Krenair
Resolved Ottomata
Resolved fgiunchedi
Resolved Krenair
Resolved mobrovac
Resolved ayounsi
Duplicate None
Resolved Krenair
Resolved MoritzMuehlenhoff
Resolved fgiunchedi
Resolved mobrovac
Resolved Ladsgroup
Resolved EddieGP
Resolved MarcoAurelio
Resolved Krenair
Duplicate None
Resolved None
Resolved elukey
Resolved None

Event Timeline

There are a very large number of changes, so older changes are hidden.
hashar subscribed.

Puppet failures related to Ganglia would be due to T134808, which has fixes cherry-picked on the beta puppet master, but they do not cover every case.

hashar triaged this task as Medium priority. Jun 30 2016, 10:33 AM

Is this really best as a tracking task or should we add it to the deployment-prep workboard column? The task by its nature is always gonna be open (or reopened).

It's fine with me if you want to move them all to a particular workboard column instead of a tracking task

-snapshot01 is T184270 (package it wants is missing from stretch, moritz to fix when higher priority things are done)

As of today there are 16 shinken alerts (most puppet but at least one disk warning) on this project, and three VMs that are shut down but not deleted. All of this is viewable here: http://shinken.wmflabs.org/problems?search=deployment

deployment-videoscaler01 seems to no longer exist?

$ ssh -a deployment-videoscaler01
channel 0: open failed: connect failed: No route to host / stdio forwarding failed
ssh_exchange_identification: Connection closed by remote host

@Andrew et al. Some docs on Wikitech on the usual puppet errors and how to fix them would IMHO help. I feel some of us who have access to deployment-prep could help if we had some guidance. Also, IRC assistance would be great, if at all possible. Thanks.

EddieGP claimed this task.
EddieGP subscribed.
In T132259#3879429, demon wrote:

Is this really best as a tracking task or should we add it to the deployment-prep workboard column? The task by its nature is always gonna be open (or reopened).

In T132259#3879432, Krenair wrote:

It's fine with me if you want to move them all to a particular workboard column instead of a tracking task

I've now created a "Puppet errors" column on the Beta-Cluster-Infrastructure workboard and moved all open subtasks to that column.