
Scap sync-file failing for deploy1001.eqiad.wmnet
Closed, Resolved · Public

Description

When using scap sync-file, I see one mw host fail to sync because of a permissions issue:

$ scap sync-file -v wmf-config/InitialiseSettings.php 'Switch a bulk of low-traffic jobs to EventBus for testwikis, file 1/2 (retry #2)'
12:39:26 Started sync-apaches
12:39:26 Using key: /etc/keyholder.d/mwdeploy.pub
12:39:31 ['/usr/bin/scap', 'pull', '--no-update-l10n', '--include', 'wmf-config', '--include', 'wmf-config/InitialiseSettings.php', '--verbose', 'mw1284.eqiad.wmnet', 'mw1319.eqiad.wmnet', 'mw1280.eqiad.wmnet', 'mw2290.codfw.wmnet', 'mw2215.codfw.wmnet', 'mw2254.codfw.wmnet', 'mw2187.codfw.wmnet', 'mw1250.eqiad.wmnet', 'mw1313.eqiad.wmnet'] on deploy1001.eqiad.wmnet returned [255]: Permission denied (publickey,keyboard-interactive).

sync-apaches: 100% (ok: 269; fail: 1; left: 0)                                  
12:39:31 1 apaches had sync errors
12:39:31 Finished sync-apaches (duration: 00m 05s)

Event Timeline

mobrovac triaged this task as Unbreak Now! priority. Apr 11 2018, 12:43 PM
jcrespo renamed this task from Scap sync-file failing for 9 hosts to Scap sync-file failing for deploy1001.eqiad.wmnet. Apr 11 2018, 12:47 PM

Apparently deploy1001 has been recently reimaged:

20:30 mutante: deploy1001 - reinstalled with stretch - re-adding to puppet (T175288)
20:30 mutante: deploy1001 - reinstalled with jessie - re-adding to puppet (T175288)

However, it doesn't seem to have a role associated with it at this time (it's not a deployment server; otherwise I would have been able to log in). As a consequence, the failing mw hosts reject connections coming from it, because its key has changed.
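
For anyone triaging similar output, the host that reported the failure can be pulled out of the error line in the description. This is only a minimal sketch: the line format is assumed from the single example in this task, and the name `parse_scap_error` is made up for illustration, not part of scap. Exit status 255 is what ssh itself returns when the connection fails, which fits the "Permission denied (publickey,keyboard-interactive)" message.

```python
import re

# Assumed format, taken from the one error line quoted in the description:
#   "<HH:MM:SS> [<command list>] on <host> returned [<code>]: <message>"
ERROR_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) \[(?P<cmd>.*)\] on (?P<host>\S+) "
    r"returned \[(?P<code>\d+)\]: (?P<message>.*)$"
)


def parse_scap_error(line):
    """Return (host, exit_code, message) for an error line, else None.

    Hypothetical helper: scap does not ship this function.
    """
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    return match["host"], int(match["code"]), match["message"]
```

Fed the error line above, this returns the host named after "on" (deploy1001.eqiad.wmnet) rather than any of the mw hosts listed in the command, which matches the renamed task title.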

mobrovac lowered the priority of this task from Unbreak Now! to High. Apr 11 2018, 1:06 PM
mobrovac removed a project: Services (blocked).

It's failing only on deploy1001, so lowering the priority.

Change 425561 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] remove deploy1001 from scap hosts

https://gerrit.wikimedia.org/r/425561

Change 425561 merged by Dzahn:
[operations/puppet@production] remove deploy1001 from scap hosts

https://gerrit.wikimedia.org/r/425561

deploy1001 has been removed from scap hosts and puppet ran on tin. This should have fixed the immediate scap issue.