Page MenuHomePhabricator

puppet dependency loop on deployment-sca hosts
Closed, ResolvedPublic

Description

root@deployment-salt02:~# salt -v 'deployment-sca01*' cmd.run 'tail -n5 /var/log/puppet.log'
deployment-sca01.deployment-prep.eqiad.wmflabs:

?[0;32mInfo: Caching catalog for deployment-sca01.deployment-prep.eqiad.wmflabs?[0m
?[0;32mInfo: Applying configuration version '1500561441'?[0m
?[1;31mError: Failed to apply catalog: Found 1 dependency cycle:
(Exec[recommendation_api config deploy] => Service::Node::Config::Scap3[recommendation_api] => Scap::Target[recommendation-api/deploy] => User[deploy-service] => Exec[recommendation_api config deploy])
Try the '--graph' option and opening the resulting '.dot' file in OmniGraffle or GraphViz?[0m

Event Timeline

deployment-trending01.deployment-prep.eqiad.wmflabs has a similar issue:

(Exec[trendingedits config deploy] => Service::Node::Config::Scap3[trendingedits] => Scap::Target[trending-edits/deploy] => User[deploy-service] => Exec[trendingedits config deploy])

Yeah, there seems to be something weird in the Scap3 config deploy part of service::node. The difference between Beta and production is that Beta doesn't have discovery, so a different path in the manifest is taken, but it is not clear to me at all how this problem materialised out of thin air (discovery was added quite some time ago).

Mentioned in SAL (#wikimedia-releng) [2017-07-20T15:08:10Z] <hashar> removed profile::recommendation_api from deployment-sca01 to try to fix the ssh access for mobrovac T171173 T171174

On deployment-sca01 I have removed profile::recommendation_api puppet then fails with:

Error: Failed to apply catalog:
  Could not find dependent Service[eventlogging/init]
  for File[/usr/local/lib/eventlogging/filters.py]
  at /etc/puppet/modules/eventlogging/manifests/plugin.pp:49?[0m

I have added profile::recommendation_api back on deployment-sca01.

I can't make sense of this part:

User[deploy-service] => Exec[recommendation_api config deploy]

I just don't see that dependency in the code. How can a User resource depend on an Exec ? Wth?

Setting to high prio, as this is now precluding us from logging into the boxes and test T170548: nodejs 6.11.

@mobrovac you could remove the puppet class from the instance. Restart the instance after that wait 5-10 mins and restart again. Then you should be able to ssh in.

I did remove profile::recommendation_api on deployment-sca01 earlier but was hitting another puppet issue. That got fixed meanwhile.

I tried again right now after having removed:

profile::recommendation_api
role::citoid
role::cxserver
role::graphoid
role::apertium port: "2737"
role::eventstreams

Puppet thus managed to run. I then ssh directly as root, restarted nscd and nslcd and I managed to ssh to the instance with my regular user just fine.

I have reapplied the above class, hence there is a dependency loop again.

changeprop-cycle.png (569×874 px, 122 KB)

So what's happening is exec is subscribed to /etc/[service]/config-vars.yaml.

/etc/[service]/config-vars.yaml is a file resource, which "autorequires" /etc/[service] (in service::node) and requires the deploy-service user and group (see "autorequires" https://docs.puppet.com/puppet/3.8/types/file.html).

These are defined in scap::target; however service::node::config::scap3 (where /etc/[service]/config-vars.yaml lives) requires that service::node::config::scap3 run BEFORE scap::target: https://github.com/wikimedia/puppet/blob/production/modules/service/manifests/node.pp#L283 but it can't run BEFORE scap::target because it depends on the user and group inside scap::target.

Hence the dependency cycle.

I've created an ugly patch that lets folks deploy services on beta for the time being on deployment-tin, but needs additional thought for an actual fix.

awful.patch
From e473aa04d3eb92e7579bba574d93cc567aa2e6c5 Mon Sep 17 00:00:00 2001                                                                                                    
From: Root <root@deployment-puppetmaster02.deployment-prep.eqiad.wmflabs>                                                                                                 
Date: Mon, 24 Jul 2017 21:46:42 +0000                                                                                                                                     
Subject: [PATCH] thcipriani: Fix for service dependency loops                        

This creates the user and the deployer as part of Service::Node                      
so that we no longer have a dependency cycle.                                        

Bug: T171173                              
---                                       
 modules/service/manifests/node.pp | 19 +++++++++++++++++++                          
 1 file changed, 19 insertions(+)         

diff --git a/modules/service/manifests/node.pp b/modules/service/manifests/node.pp   
index c7341723a3..7a65eb506c 100644       
--- a/modules/service/manifests/node.pp   
+++ b/modules/service/manifests/node.pp   
@@ -186,6 +186,25 @@ define service::node(
                     before       => Base::Service_unit[$title],                     
                     manage_user  => true,
                 }                        
+                                         
+                if !defined(Group[$deployment_user]) {                              
+                    group { $deployment_user:                                       
+                        ensure => present,                                          
+                        system => true,  
+                        before => Scap::Target[$repo],                              
+                    }                    
+                }                        
+                                         
+                if !defined(User[$deployment_user]) {                               
+                    user { $deployment_user:                                        
+                        ensure     => present,                                      
+                        shell      => '/bin/bash',                                  
+                        home       => "/var/lib/${deployment_user}",                
+                        system     => true,                                         
+                        managehome => true,                                         
+                        before => Scap::Target[$repo],                              
+                    }                    
+                }                        
             }                            
             include ::scap::conftool                                                                                                         
         }                                                                                                                                                               
--                                                                                                                                                                       
2.11.0

ok, what I don't understand atm is why this works in production and not in deployment-prep.

more in general, that whole class hierarchy is an intertwined mess and I'm gonna work on making it better.

But since @thcipriani 's patch works as an hotfix and allows testing things on deployment-prep, I don't think it is immediately urgent.

Thank you for solving this!

ok, what I don't understand atm is why this works in production and not in deployment-prep.

My understanding is that in production discovery is used while in Beta it isn't, the latter creating the circular dependency.

Thank you @thcipriani for the analysis and the patch!

How easy would it be to move beta to using discovery like prod does?

@thcipriani Could you submit the patch to Gerrit as well so that it has a Change-Id and can be tagged beta-picked for review/feedback from SRE?

Change 447487 had a related patch set uploaded (by Thcipriani; owner: Thcipriani):
[operations/puppet@production] Beta: Fix for service dependency loops

https://gerrit.wikimedia.org/r/447487

@thcipriani Could you submit the patch to Gerrit as well so that it has a Change-Id and can be tagged beta-picked for review/feedback from SRE?

Done!

is this still required ? The issue described in the task also presented itself in production and was kind of resolved with Ia9a9efb0031dfed42ac779dc3551071adf333cfa

Change 447487 abandoned by Thcipriani:
Beta: Fix for service dependency loops

Reason:
Fixed in Ia9a9efb0031dfed42ac779dc3551071adf333cfa

https://gerrit.wikimedia.org/r/447487

thcipriani assigned this task to akosiaris.

Removed the cherry-pick from beta, ran puppet on deployment-sca01 and there were no errors.