Graphoid is not syncing on beta cluster again
Closed, Resolved (Public)

Description

git deploy sync stays at 0/2.

Event Timeline

Yurik raised the priority of this task to Needs Triage.
Yurik updated the task description.
Yurik added subscribers: Yurik, akosiaris, yuvipanda, mobrovac.

There were 2 salt-minion processes running on sca01; I killed both and then started the service back up. I also restarted the salt-minion process on sca02 for good measure.

Updating with direct salt calls on sca0[12] seems to work:

bd808@deployment-sca01:~$ sudo salt-call deploy.fetch 'graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git fetch' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git fetch --tags' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git submodule status --quiet' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git checkout .gitmodules' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git config remote.origin.url' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git submodule sync' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git submodule foreach --recursive git fetch' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git submodule foreach --recursive git fetch --tags' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git show-ref refs/tags/graphoid/deploy-sync-20151127-170244' in directory '/srv/deployment/graphoid/deploy'
local:
    ----------
    dependencies:
    repo:
        graphoid/deploy
    status:
        0
    tag:
        graphoid/deploy-sync-20151127-170244
bd808@deployment-sca01:~$ sudo salt-call deploy.checkout 'graphoid/deplo[40/239]
[INFO    ] Executing command '/usr/bin/git describe --always --tag' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command u'/usr/bin/git checkout --force --quiet tags/graphoid/deploy-sync-20151127-170244' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git submodule status --quiet' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git checkout .gitmodules' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git config remote.origin.url' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git submodule sync' in directory '/srv/deployment/graphoid/deploy'
[INFO    ] Executing command '/usr/bin/git submodule update --recursive --init' in directory '/srv/deployment/graphoid/deploy'
local:
    ----------
    dependencies:
    repo:
        graphoid/deploy
    status:
        0
    tag:
        graphoid/deploy-sync-20151127-170244
bd808@deployment-sca01:~$ cd /srv/deployment/graphoid/deploy/
bd808@deployment-sca01:/srv/deployment/graphoid/deploy$ git log
commit 54f25d526597d6093299c35cb00861e9adecc015
Author: Yuri Astrakhan <yurik@wikimedia.org>
Date:   Fri Nov 27 19:33:59 2015 +0300

    Update graphoid to 45d31c1

    List of changes:
    20db530 Added deploy.remote option docs and bumped service-runner version
    a1a347c Added docs for node version setting
    06aceb0 Fixed language in docs
    06959fd Better wording of docs for deploy.remote option
    15ee1e8 Docs: Deploy: Ensure latest source dependencies
    d30a2b6 CSP Headers: Allow the configuration of sending the security headers
    45d31c1 Use graph api and new Vega api
    xxxxxxx Update node module dependencies

    Change-Id: I80efec8a6c2d2a472ccad38bdadd67dd3345939e

I tried a no-op deploy from deployment-bastion. According to git deploy on deployment-bastion it still did not work, but I can see the new tag on both of the sca0[12] hosts. This makes me think the problem is that the returners on sca0[12] are failing to write back to the redis instance on deployment-bastion.
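
One rough way to test the returner hypothesis from a minion is a plain TCP check against the redis port on the deployment host. The sketch below is illustrative only: the function name is made up for this example, and it proves nothing beyond basic reachability (it does not speak the redis protocol or exercise the Salt returner itself).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run on sca01 or sca02, a call such as can_connect("<deployment-bastion address>", 6379) returning False would point at the network or firewall; True would suggest looking at the returner or at redis itself.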

I tried testing with the test/testrepo Trebuchet target and got similar results. Trebuchet reported that all fetches and syncs to the minions deployment-db1 and deployment-mathoid failed, but manually checking /srv/deployment/test/testrepo on those servers shows that the new deploy tag (test/testrepo-sync-20151127-192854) is present and synced.

deployment-bastion seems to have ferm enabled, but I can also see that it has ACCEPT rules for the redis port:

$ sudo iptables -L -n | grep 6379
ACCEPT     tcp  --  10.0.0.0/8           0.0.0.0/0            tcp dpt:6379
ACCEPT     tcp  --  208.80.154.136       0.0.0.0/0            tcp dpt:6379
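
To double-check that a given minion address is covered by these rules, the listing can be parsed mechanically. This is a minimal sketch, assuming the column layout shown above (target, protocol, options, source, destination, match options). The function name and the minion address used in the usage note are hypothetical, and the check ignores chain order and any DROP or REJECT rules, so it is a sanity check rather than a firewall evaluation.

```python
import ipaddress


def source_is_accepted(listing, src_ip, port):
    """Return True if an ACCEPT rule in `iptables -L -n` text covers src_ip for dpt:port."""
    addr = ipaddress.ip_address(src_ip)
    for line in listing.splitlines():
        fields = line.split()
        # Expected columns: target prot opt source destination [match options...]
        if len(fields) < 6 or fields[0] != "ACCEPT":
            continue
        if f"dpt:{port}" not in fields[5:]:
            continue
        try:
            network = ipaddress.ip_network(fields[3], strict=False)
        except ValueError:
            continue  # source column is not a plain address or CIDR block
        if addr in network:
            return True
    return False


# The two rules quoted above, copied verbatim.
RULES = """\
ACCEPT     tcp  --  10.0.0.0/8           0.0.0.0/0            tcp dpt:6379
ACCEPT     tcp  --  208.80.154.136       0.0.0.0/0            tcp dpt:6379
"""
```

With these rules, source_is_accepted(RULES, "10.68.16.1", 6379) is True for a made-up address inside 10.0.0.0/8, while an address outside both sources, such as 192.0.2.10, is not matched.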

I know that there have been some Puppet changes related to redis in the last few days, so there may have been some small change that is making Trebuchet sad.

greg renamed this task from "Graphoid is not syncing on beta cluster again (" to "Graphoid is not syncing on beta cluster again". Nov 27 2015, 9:13 PM
greg set Security to None.

It works now; should I close it? @bd808, is there something you are still tracking with this?

mobrovac claimed this task.

This was a bug introduced by a new timeout feature in Trebuchet. Syncing didn't work in prod either. It was fixed yesterday, so I am closing this task.