git-deploy broken on beta cluster
Closed, Resolved, Public

Description

Following the deploy instructions at https://wikitech.wikimedia.org/wiki/OCG#Deploying_the_latest_version_of_OCG, the sync never completes the fetch on either minion:

$ git deploy sync
Repo: ocg/ocg
Tag: ocg/ocg-sync-20160125-214906

0/2 minions completed fetch
Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry): d
Repo: ocg/ocg
Tag: ocg/ocg-sync-20160125-214906

0/2 minions completed fetch

Details:

deployment-pdf01.deployment-prep.eqiad.wmflabs: 
	fetch status: 0 [started: 72 mins ago, last-return: 72 mins ago]
deployment-pdf02.deployment-prep.eqiad.wmflabs: 
	fetch status: 0 [started: 72 mins ago, last-return: 72 mins ago]
Continue? ([d]etailed/[C]oncise report,[y]es,[n]o,[r]etry):

Logging in to deployment-pdf01 reveals that puppet hasn't been run all weekend:

$ ssh deployment-pdf01.deployment-prep.eqiad.wmflabs
Linux deployment-pdf01 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64
Ubuntu 14.04.3 LTS
deployment-pdf01 is a offline content generator for MediaWiki Collection extension (ocg)
deployment-pdf01 is a Puppet client of deployment-puppetmaster.eqiad.wmflabs (puppetclient)
The last Puppet run was at Fri Jan 22 21:55:09 UTC 2016 (4331 minutes ago).
Last login: Mon Jan 25 21:47:53 2016 from bastion-01.bastion.eqiad.wmflabs
cscott@deployment-pdf01:~$ exit

Tyler's been investigating; he can add more detail. There's apparently something wrong with redis on deployment-bastion as well.

Event Timeline

cscott raised the priority of this task to Needs Triage.
cscott updated the task description.
cscott added subscribers: cscott, mobrovac, thcipriani.
thcipriani renamed this task from git-deploy broken on OCG labs machines to git-deploy broken on beta cluster. (Jan 25 2016, 10:30 PM)
thcipriani set Security to None.
thcipriani added subscribers: greg, hashar.

It seems the redis instance on deployment-bastion has been turned into a read-only slave. This is the redis instance that trebuchet uses as a returner.

Running deploy.fetch via salt locally on the instance fails with an unhandled exception:

thcipriani@deployment-pdf01:~$ sudo salt-call deploy.fetch 'ocg/ocg'
[ERROR   ] An un-handled exception was caught by salt's global exception handler:
ResponseError: READONLY You can't write against a read only slave.
Traceback (most recent call last):
  File "/usr/bin/salt-call", line 11, in <module>
    salt_call()
  File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 123, in salt_call
    client.run()
  File "/usr/lib/python2.7/dist-packages/salt/cli/__init__.py", line 422, in run
    caller.run()
  File "/usr/lib/python2.7/dist-packages/salt/cli/caller.py", line 227, in run
    ret = self.call()
  File "/usr/lib/python2.7/dist-packages/salt/cli/caller.py", line 129, in call
    ret['return'] = func(*args, **kwargs)
  File "/var/cache/salt/minion/extmods/modules/deploy.py", line 498, in fetch
    _check_in('deploy.fetch', repo)
  File "/var/cache/salt/minion/extmods/modules/deploy.py", line 48, in _check_in
    serv.sadd('deploy:repos', repo)
  File "/usr/lib/python2.7/dist-packages/redis/client.py", line 937, in sadd
    return self.execute_command('SADD', name, *values)
  File "/usr/lib/python2.7/dist-packages/redis/client.py", line 361, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/usr/lib/python2.7/dist-packages/redis/client.py", line 371, in parse_response
    response = connection.read_response()
  File "/usr/lib/python2.7/dist-packages/redis/connection.py", line 311, in read_response
    raise response
ResponseError: READONLY You can't write against a read only slave.
Traceback (most recent call last):
  File "/usr/bin/salt-call", line 11, in <module>
    salt_call()
  File "/usr/lib/python2.7/dist-packages/salt/scripts.py", line 123, in salt_call
    client.run()
  File "/usr/lib/python2.7/dist-packages/salt/cli/__init__.py", line 422, in run
    caller.run()
  File "/usr/lib/python2.7/dist-packages/salt/cli/caller.py", line 227, in run
    ret = self.call()
  File "/usr/lib/python2.7/dist-packages/salt/cli/caller.py", line 129, in call
    ret['return'] = func(*args, **kwargs)
  File "/var/cache/salt/minion/extmods/modules/deploy.py", line 498, in fetch
    _check_in('deploy.fetch', repo)
  File "/var/cache/salt/minion/extmods/modules/deploy.py", line 48, in _check_in
    serv.sadd('deploy:repos', repo)
  File "/usr/lib/python2.7/dist-packages/redis/client.py", line 937, in sadd
    return self.execute_command('SADD', name, *values)
  File "/usr/lib/python2.7/dist-packages/redis/client.py", line 361, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/usr/lib/python2.7/dist-packages/redis/client.py", line 371, in parse_response
    response = connection.read_response()
  File "/usr/lib/python2.7/dist-packages/redis/connection.py", line 311, in read_response
    raise response
redis.exceptions.ResponseError: READONLY You can't write against a read only slave.

The error seems to make sense given this information:

thcipriani@deployment-bastion:~$ redis-cli config get slave-read-only
1) "slave-read-only"
2) "yes"

Not sure why it's a read-only redis slave just yet.
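
One caveat when reading the output above: slave-read-only defaults to "yes" in redis even on a master, so the setting only matters once the instance is actually replicating from another server. The role field reported by `redis-cli info replication` is the more direct indicator. A minimal sketch of that check, assuming the INFO text has already been captured as a string; the function names and the sample text are illustrative and were not taken from deployment-bastion:

```python
def parse_info(text):
    """Parse the key:value lines of `redis-cli info replication` output."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def rejects_writes(info, slave_read_only):
    """True if the instance is a replica configured to refuse writes."""
    return info.get("role") == "slave" and slave_read_only == "yes"


# Illustrative sample only; the master host and port are made up.
SAMPLE = """\
# Replication
role:slave
master_host:master.example.invalid
master_port:6379
"""

info = parse_info(SAMPLE)
print(info["role"], rejects_writes(info, "yes"))
```

If role comes back as master, a slave-read-only value of "yes" on its own would not explain the READONLY error.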

thcipriani claimed this task.

It looks like a new hiera variable, deployment_server, was introduced into the deployment::redis class but was never set in beta. When $::fqdn does not match the value of deployment_server, the class makes the redis server a read-only slave.

Setting the deployment_server hiera value for deployment-prep and re-running puppet on deployment-bastion fixed the issue: https://wikitech.wikimedia.org/w/index.php?title=Hiera:Deployment-prep&diff=271322&oldid=262813
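
The behaviour described above can be modelled in a few lines. This is only a sketch of the logic as stated in this comment, not the actual puppet code: the function name and hostnames are hypothetical, and the unset hiera value is modelled as None, which is an assumption about whatever default applied in beta.

```python
def redis_role(fqdn, deployment_server):
    """Model of the described check: master only when the FQDN matches."""
    if fqdn == deployment_server:
        return "master"
    return "read-only slave"


# Hypothetical hostname, used for illustration only.
BASTION = "bastion.example.invalid"

# Before the fix: deployment_server not set for the project (modelled as None).
print(redis_role(BASTION, None))

# After the fix: the hiera value names the bastion itself.
print(redis_role(BASTION, BASTION))
```

Under this model, any host whose FQDN differs from deployment_server ends up read-only, which matches the READONLY error returned to the minions.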