
Scap not restarting Proton
Closed, Invalid · Public

Description

During the last several deploys and restarts of Proton we noticed that Scap does not restart the service. Here's the log of scap deploy --service-restart:

08:43:30 [deploy1001] <Command u'/usr/bin/git show -s --format=%ct a65705909da0a828d7554b86357ff7a9361cc187'>: starting process
08:43:30 [deploy1001] <Command u'/usr/bin/git show -s --format=%ct a65705909da0a828d7554b86357ff7a9361cc187', pid 24784>: process started
08:43:30 [deploy1001] <Command u'/usr/bin/git show -s --format=%ct a65705909da0a828d7554b86357ff7a9361cc187', pid 24784>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git show -s --format=%ct a65705909da0a828d7554b86357ff7a9361cc187', pid 24784>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git show -s --format=%ct a65705909da0a828d7554b86357ff7a9361cc187', pid 24784>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git ls-remote --get-url'>: starting process
08:43:30 [deploy1001] <Command u'/usr/bin/git ls-remote --get-url', pid 24788>: process started
08:43:30 [deploy1001] <Command u'/usr/bin/git ls-remote --get-url', pid 24788>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git ls-remote --get-url', pid 24788>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git ls-remote --get-url', pid 24788>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git tag --list scap/sync/*'>: starting process
08:43:30 [deploy1001] <Command u'/usr/bin/git tag --list scap/sync/*', pid 24792>: process started
08:43:30 [deploy1001] <Command u'/usr/bin/git tag --list scap/sync/*', pid 24792>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git tag --list scap/sync/*', pid 24792>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git tag --list scap/sync/*', pid 24792>: process completed
08:43:30 [deploy1001] Started restart [proton/deploy@a657059]
08:43:30 [deploy1001] <Command u'/usr/bin/git tag --list scap/sync/2018-10-17/*'>: starting process
08:43:30 [deploy1001] <Command u'/usr/bin/git tag --list scap/sync/2018-10-17/*', pid 24796>: process started
08:43:30 [deploy1001] <Command u'/usr/bin/git tag --list scap/sync/2018-10-17/*', pid 24796>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git tag --list scap/sync/2018-10-17/*', pid 24796>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git rev-parse --verify scap/sync/2018-10-16/0004^{}'>: starting process
08:43:30 [deploy1001] <Command u'/usr/bin/git rev-parse --verify scap/sync/2018-10-16/0004^{}', pid 24800>: process started
08:43:30 [deploy1001] <Command u'/usr/bin/git rev-parse --verify scap/sync/2018-10-16/0004^{}', pid 24800>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git rev-parse --verify scap/sync/2018-10-16/0004^{}', pid 24800>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git rev-parse --verify scap/sync/2018-10-16/0004^{}', pid 24800>: process completed
08:43:30 [deploy1001] Deploying Rev: scap/sync/2018-10-16/0004^{} = d63ef16865995b42b6e13e4e175b09772adf25fc
08:43:30 [deploy1001] <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%(refname) refs/tags'>: starting process
08:43:30 [deploy1001] <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%(refname) refs/tags', pid 24806>: process started
08:43:30 [deploy1001] <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%(refname) refs/tags', pid 24806>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%(refname) refs/tags', pid 24806>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git for-each-ref --sort=taggerdate --format=%(refname) refs/tags', pid 24806>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git tag -d scap/sync/2018-05-16/0001'>: starting process
08:43:30 [deploy1001] <Command u'/usr/bin/git tag -d scap/sync/2018-05-16/0001', pid 24810>: process started
08:43:30 [deploy1001] <Command u'/usr/bin/git tag -d scap/sync/2018-05-16/0001', pid 24810>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git update-server-info'>: starting process
08:43:30 [deploy1001] <Command u'/usr/bin/git update-server-info', pid 24814>: process started
08:43:30 [deploy1001] <Command u'/usr/bin/git update-server-info', pid 24814>: process completed
08:43:30 [deploy1001] <Command u'/usr/bin/git submodule foreach --recursive git update-server-info'>: starting process
08:43:30 [deploy1001] <Command u'/usr/bin/git submodule foreach --recursive git update-server-info', pid 24818>: process started
08:43:30 [deploy1001] <Command u'/usr/bin/git submodule foreach --recursive git update-server-info', pid 24818>: process completed
08:43:30 [deploy1001] Started restart [proton/deploy@a657059]: (no justification provided)
08:43:30 [deploy1001] 
== DEFAULT1 ==
:* proton2002.codfw.wmnet
08:43:41 [deploy1001] 
== DEFAULT2 ==
:* proton1001.eqiad.wmnet
08:43:41 [deploy1001] 
== DEFAULT3 ==
:* proton1002.eqiad.wmnet
08:43:42 [deploy1001] 
== DEFAULT4 ==
:* proton2001.codfw.wmnet
08:43:43 [deploy1001] 
== DEFAULT1 ==
:* proton2002.codfw.wmnet
08:43:44 [deploy1001] 
== DEFAULT2 ==
:* proton1001.eqiad.wmnet
08:43:45 [deploy1001] 
== DEFAULT3 ==
:* proton1002.eqiad.wmnet
08:43:46 [deploy1001] 
== DEFAULT4 ==
:* proton2001.codfw.wmnet
08:43:47 [deploy1001] Finished restart [proton/deploy@a657059] (duration: 00m 16s)

Event Timeline

mobrovac triaged this task as High priority. Oct 17 2018, 8:49 AM
mobrovac created this task.
Restricted Application added a subscriber: Aklapper. Oct 17 2018, 8:49 AM

I don't see a service_name in /srv/deployment/proton/deploy/.git/DEPLOY_HEAD. Currently service_name is commented out in /srv/deployment/proton/deploy/scap/scap.cfg.

It looks like the repo at /srv/deployment/proton/deploy is 8 commits ahead of https://gerrit.wikimedia.org/r/p/mediawiki/services/chromium-render/deploy.git. Most of the commits are merge commits. The one causing this issue is:

commit 0f9e3577ccd118faa1391dc21457096d1a28f249
Author: Alexandros Kosiaris <akosiaris@wikimedia.org>
Date:   Mon Jun 11 14:24:52 2018 +0000

    Check something

diff --git a/scap/checks.yaml b/scap/checks.yaml
deleted file mode 100644
index d545e5d..0000000
--- a/scap/checks.yaml
+++ /dev/null
@@ -1,13 +0,0 @@
-checks:
-  depool:
-    type: command
-    stage: promote
-    command: depool-proton
-  endpoints:
-    type: nrpe
-    stage: restart_service
-    command: check_endpoints_proton
-  repool:
-    type: command
-    stage: restart_service
-    command: pool-proton
diff --git a/scap/scap.cfg b/scap/scap.cfg
index 78f4b6f..5169954 100644
--- a/scap/scap.cfg
+++ b/scap/scap.cfg
@@ -6,7 +6,7 @@ ssh_user: deploy-service
 dsh_targets: targets
 group_size: 1
 git_submodules: True
-service_name: proton
+#service_name: proton
 service_port: 24766
 lock_file: /tmp/scap.proton.lock
 config_deploy: True
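
For anyone who hits the same symptom, here is a rough sketch of checking whether service_name is actually active in the scap.cfg on the deploy host, rather than in a local clone. It is only an illustration: the function name configured_service_name is made up, and it assumes the file is INI-style with a [global] section header, which the diff above does not show. It uses Python's standard configparser, not Scap's own config loader.

```python
import configparser


def configured_service_name(cfg_text, section="global"):
    """Return the active service_name from scap.cfg-style text, or None.

    Assumption: the file is INI-style and the key lives in a [global]
    section. A line starting with '#' is treated as a comment, so a
    commented-out key counts as missing.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section(section):
        return None
    return parser.get(section, "service_name", fallback=None)


# Hypothetical excerpts mirroring the two sides of the diff above.
BEFORE = """\
[global]
git_submodules: True
service_name: proton
service_port: 24766
"""

AFTER = """\
[global]
git_submodules: True
#service_name: proton
service_port: 24766
"""

print(configured_service_name(BEFORE))  # the key is active
print(configured_service_name(AFTER))   # the key is commented out
```

Running this against the contents of /srv/deployment/proton/deploy/scap/scap.cfg, instead of a local checkout, would have shown the missing key right away.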
mobrovac closed this task as Invalid. Oct 17 2018, 8:20 PM

Ah, how did I miss this? Oh, because I was looking at my local copy of the deploy repo. Thank you @thcipriani for pointing out the obvious and sorry for wasting your time.

Oh dammit, I never killed that commit. Sorry about that.