Can't access thanos-fe1001.mgmt
Closed, Resolved (Public)

Description

I'm reimaging this host as part of T280257 (Thanos compaction stopped due to local filesystem space shortage). However, wmf-auto-reimage-host fails to contact the host over IPMI, and when I try to ssh in myself I get a permission denied error. My hunch is that the management password is wrong or out of date, but I'm not sure. Could you check that everything is set up for remote mgmt access?

# wmf-auto-reimage-host -p T280257 --conftool thanos-fe1001.eqiad.wmnet
07:47:22 | thanos-fe1001.eqiad.wmnet | REIMAGE START | To monitor the full log and cumin output:
sudo tail -F /var/log/wmf-auto-reimage/202104200747_filippo_12149_thanos-fe1001_eqiad_wmnet.log
sudo tail -F /var/log/wmf-auto-reimage/202104200747_filippo_12149_thanos-fe1001_eqiad_wmnet_cumin.out
IPMI Password:
Error: Unable to establish IPMI v2 / RMCP+ session
07:47:28 | thanos-fe1001.eqiad.wmnet | Unable to run wmf-auto-reimage-host: Remote IPMI failed for mgmt 'thanos-fe1001.mgmt.eqiad.wmnet': Command '['ipmitool', '-I', 'lanplus', '-H', 'thanos-fe1001.mgmt.eqiad.wmnet', '-U', 'root', '-E', 'chassis', 'power', 'status']' returned non-zero exit status 1.
07:47:28 | thanos-fe1001.eqiad.wmnet | REIMAGE END | retcode=2
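
For anyone wanting to reproduce the failing check outside of the reimage script, here is a minimal sketch that runs the same ipmitool command shown in the error above. The function names (ipmi_command, check_ipmi) and the injectable run parameter are my own, for illustration only; this is not how wmf-auto-reimage-host is implemented. It assumes ipmitool is installed and relies on its -E flag reading the password from the IPMI_PASSWORD environment variable.

```python
import os
import subprocess


def ipmi_command(mgmt_host, user="root"):
    """Build the ipmitool invocation that appears in the reimage log."""
    return ["ipmitool", "-I", "lanplus", "-H", mgmt_host, "-U", user, "-E",
            "chassis", "power", "status"]


def check_ipmi(mgmt_host, password, run=subprocess.run):
    """Return (ok, output) for a remote IPMI power-status query.

    `run` is injectable (hypothetical convenience) so the function can be
    exercised without ipmitool or a reachable management interface.
    """
    # ipmitool's -E flag takes the password from IPMI_PASSWORD.
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = run(ipmi_command(mgmt_host), env=env,
                 capture_output=True, text=True)
    return result.returncode == 0, (result.stdout + result.stderr).strip()
```

Calling check_ipmi("thanos-fe1001.mgmt.eqiad.wmnet", password) with the expected management password should return a false first element for as long as the host rejects the session, along with whatever ipmitool printed.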