
Puppet failure on integration-agent-docker-1039.integration.eqiad1.wikimedia.cloud
Closed, Resolved (Public)

Description

From Cloud VPS alert email:

Puppet is having issues on the "integration-agent-docker-1039.integration.eqiad1.wikimedia.cloud (172.16.2.172)" instance in project
integration in Wikimedia Cloud VPS.

ERR: Could not find user jenkins-deploy
ERR: change from 2947 to 'jenkins-deploy' failed: Could not find user jenkins-deploy
ERR: Could not find group wikidev
ERR: change from 500 to 'wikidev' failed: Could not find group wikidev
version:
  config: '(b5d442d1f8) JMeybohm - service_proxy: Set SNI and Host header for ingress

Failed resources

  • Exec[jenkins user docker membership]
  • File[/srv/jenkins]
  • File[/srv/jenkins-workspace]
  • File[/srv/home/jenkins-deploy]
log
NOTICE: The LDAP client stack for this host is: sssd/sudo
NOTICE: defined 'message' as 'The LDAP client stack for this host is: sssd/sudo'
NOTICE: usermod: user 'jenkins-deploy' does not exist
ERR: '/usr/sbin/usermod -aG docker 'jenkins-deploy'' returned 6 instead of one of [0]
ERR: change from 'notrun' to ['0'] failed: '/usr/sbin/usermod -aG docker 'jenkins-deploy'' returned 6 instead of one of [0] (corrective)
ERR: Could not find user jenkins-deploy
ERR: change from 2947 to 'jenkins-deploy' failed: Could not find user jenkins-deploy
ERR: Could not find group wikidev
ERR: change from 500 to 'wikidev' failed: Could not find group wikidev
ERR: Could not find user jenkins-deploy
ERR: change from 2947 to 'jenkins-deploy' failed: Could not find user jenkins-deploy
ERR: Could not find group wikidev
ERR: change from 500 to 'wikidev' failed: Could not find group wikidev
NOTICE: Dependency File[/srv/jenkins] has failures: true
WARNING: Skipping because of failed dependencies
WARNING: Skipping because of failed dependencies
ERR: Could not find user jenkins-deploy
ERR: change from 2947 to 'jenkins-deploy' failed: Could not find user jenkins-deploy
ERR: Could not find group wikidev
ERR: change from 500 to 'wikidev' failed: Could not find group wikidev
NOTICE: Dependency File[/srv/home/jenkins-deploy] has failures: true
WARNING: Skipping because of failed dependencies
NOTICE: Applied catalog in 7.04 seconds

Event Timeline

The CI agent is up and running at https://integration.wikimedia.org/ci/computer/integration%2Dagent%2Ddocker%2D1039/ but I can't ssh into it: Permission denied (publickey)

Mentioned in SAL (#wikimedia-releng) [2022-07-07T12:22:18Z] <hashar> integration: rebooting integration-agent-docker-1039 T312534

hashar claimed this task.

The command failing is:

/usr/sbin/usermod -aG docker 'jenkins-deploy'
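Per usermod(8), exit status 6 means the specified user or group doesn't exist, which matches the "user 'jenkins-deploy' does not exist" notice in the log. A minimal sketch for checking whether the names from the errors resolve on the instance, assuming getent(1) is available there; the helper names `resolves` and `report` are made up for illustration, and this only shows whether lookups work, not why they stopped working:

```shell
#!/bin/sh
# Sketch: check whether the account names from the Puppet errors resolve.
# Assumption: getent(1) is installed (true on glibc-based distributions).

# resolves DATABASE NAME
# Exit 0 if NAME is found in the given NSS database (passwd, group, ...),
# non-zero otherwise.
resolves() {
    getent "$1" "$2" >/dev/null 2>&1
}

# report DATABASE NAME
# Print a one-line status for the lookup.
report() {
    if resolves "$1" "$2"; then
        echo "ok: $1 entry '$2' resolves"
    else
        echo "MISSING: $1 entry '$2' does not resolve"
    fi
}

# Names taken from the Puppet errors and the failing usermod command.
report passwd jenkins-deploy
report group wikidev
report group docker
```

A "MISSING" line for jenkins-deploy or wikidev would be consistent with the "Could not find user" / "Could not find group" errors above.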

The first occurrence in the Puppet log is on July 6th at 17:35.

I can't find anything in the logs, so that will remain a mystery.