puppet dependency loop on deployment-sca hosts
Open, HighPublic

Description

root@deployment-salt02:~# salt -v 'deployment-sca01*' cmd.run 'tail -n5 /var/log/puppet.log'
deployment-sca01.deployment-prep.eqiad.wmflabs:

?[0;32mInfo: Caching catalog for deployment-sca01.deployment-prep.eqiad.wmflabs?[0m
?[0;32mInfo: Applying configuration version '1500561441'?[0m
?[1;31mError: Failed to apply catalog: Found 1 dependency cycle:
(Exec[recommendation_api config deploy] => Service::Node::Config::Scap3[recommendation_api] => Scap::Target[recommendation-api/deploy] => User[deploy-service] => Exec[recommendation_api config deploy])
Try the '--graph' option and opening the resulting '.dot' file in OmniGraffle or GraphViz?[0m
hashar created this task.Jul 20 2017, 2:42 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 20 2017, 2:42 PM

deployment-trending01.deployment-prep.eqiad.wmflabs has a similar issue:

(Exec[trendingedits config deploy] => Service::Node::Config::Scap3[trendingedits] => Scap::Target[trending-edits/deploy] => User[deploy-service] => Exec[trendingedits config deploy])

Yeah, there seems to be something weird in the Scap3 config deploy part of service::node. The difference between Beta and production is that Beta doesn't have discovery, so a different path in the manifest is taken, but it is not clear to me at all how this problem materialised out of thin air (discovery was added quite some time ago).

Mentioned in SAL (#wikimedia-releng) [2017-07-20T15:08:10Z] <hashar> removed profile::recommendation_api from deployment-sca01 to try to fix the ssh access for mobrovac T171173 T171174

On deployment-sca01 I have removed profile::recommendation_api puppet then fails with:

Error: Failed to apply catalog:
  Could not find dependent Service[eventlogging/init]
  for File[/usr/local/lib/eventlogging/filters.py]
  at /etc/puppet/modules/eventlogging/manifests/plugin.pp:49?[0m

I have added profile::recommendation_api back on deployment-sca01.

I can't make sense of this part:

User[deploy-service] => Exec[recommendation_api config deploy]

I just don't see that dependency in the code. How can a User resource depend on an Exec ? Wth?

mobrovac triaged this task as High priority.Jul 20 2017, 6:37 PM

Setting to high prio, as this is now precluding us from logging into the boxes and test T170548: nodejs 6.11.

@mobrovac you could remove the puppet class from the instance. Restart the instance after that wait 5-10 mins and restart again. Then you should be able to ssh in.

I did remove profile::recommendation_api on deployment-sca01 earlier but was hitting another puppet issue. That got fixed meanwhile.

I tried again right now after having removed:

profile::recommendation_api
role::citoid
role::cxserver
role::graphoid
role::apertium port: "2737"
role::eventstreams

Puppet thus managed to run. I then ssh directly as root, restarted nscd and nslcd and I managed to ssh to the instance with my regular user just fine.

I have reapplied the above class, hence there is a dependency loop again.

bearND added a subscriber: bearND.Jul 24 2017, 9:02 PM
thcipriani added a subscriber: thcipriani.EditedJul 24 2017, 9:52 PM

So what's happening is exec is subscribed to /etc/[service]/config-vars.yaml.

/etc/[service]/config-vars.yaml is a file resource, which "autorequires" /etc/[service] (in service::node) and requires the deploy-service user and group (see "autorequires" https://docs.puppet.com/puppet/3.8/types/file.html).

These are defined in scap::target; however service::node::config::scap3 (where /etc/[service]/config-vars.yaml lives) requires that service::node::config::scap3 run BEFORE scap::target: https://github.com/wikimedia/puppet/blob/production/modules/service/manifests/node.pp#L283 but it can't run BEFORE scap::target because it depends on the user and group inside scap::target.

Hence the dependency cycle.

I've created an ugly patch that lets folks deploy services on beta for the time being on deployment-tin, but needs additional thought for an actual fix.

awful.patch
From e473aa04d3eb92e7579bba574d93cc567aa2e6c5 Mon Sep 17 00:00:00 2001                                                                                                    
From: Root <root@deployment-puppetmaster02.deployment-prep.eqiad.wmflabs>                                                                                                 
Date: Mon, 24 Jul 2017 21:46:42 +0000                                                                                                                                     
Subject: [PATCH] thcipriani: Fix for service dependency loops                        

This creates the user and the deployer as part of Service::Node                      
so that we no longer have a dependency cycle.                                        

Bug: T171173                              
---                                       
 modules/service/manifests/node.pp | 19 +++++++++++++++++++                          
 1 file changed, 19 insertions(+)         

diff --git a/modules/service/manifests/node.pp b/modules/service/manifests/node.pp   
index c7341723a3..7a65eb506c 100644       
--- a/modules/service/manifests/node.pp   
+++ b/modules/service/manifests/node.pp   
@@ -186,6 +186,25 @@ define service::node(
                     before       => Base::Service_unit[$title],                     
                     manage_user  => true,
                 }                        
+                                         
+                if !defined(Group[$deployment_user]) {                              
+                    group { $deployment_user:                                       
+                        ensure => present,                                          
+                        system => true,  
+                        before => Scap::Target[$repo],                              
+                    }                    
+                }                        
+                                         
+                if !defined(User[$deployment_user]) {                               
+                    user { $deployment_user:                                        
+                        ensure     => present,                                      
+                        shell      => '/bin/bash',                                  
+                        home       => "/var/lib/${deployment_user}",                
+                        system     => true,                                         
+                        managehome => true,                                         
+                        before => Scap::Target[$repo],                              
+                    }                    
+                }                        
             }                            
             include ::scap::conftool                                                                                                         
         }                                                                                                                                                               
--                                                                                                                                                                       
2.11.0
Joe added a comment.Jul 25 2017, 6:35 AM

ok, what I don't understand atm is why this works in production and not in deployment-prep.

more in general, that whole class hierarchy is an intertwined mess and I'm gonna work on making it better.

But since @thcipriani 's patch works as an hotfix and allows testing things on deployment-prep, I don't think it is immediately urgent.

Thank you for solving this!

ok, what I don't understand atm is why this works in production and not in deployment-prep.

My understanding is that in production discovery is used while in Beta it isn't, the latter creating the circular dependency.

Thank you @thcipriani for the analysis and the patch!

Joe moved this task from Backlog to Blocking others on the User-Joe board.Jul 27 2017, 2:53 PM