puppet errors on contint servers related to helmfiles for push-notifications
Closed, Resolved (Public)

Description

Icinga has been showing warnings about puppet errors on contint1001 and contint2001 for about 2 days and 8 hours as of this writing.

They are related to deployment-charts, specifically helmfiles for push-notifications:

WARNING: Puppet has 3 failures. Last run 6 seconds ago with 3 failures. Failed resources (up to 3 shown): File[/srv/deployment-charts/helmfile.d/services/staging/push-notifications],File[/srv/deployment-charts/helmfile.d/services/eqiad/push-notifications],File[/srv/deployment-charts/helmfile.d/services/codfw/push-notifications]

The reason is the error "Could not find user mwdeploy":

Error: /Stage[main]/Profile::Kubernetes::Deployment_server::Helmfile/File[/srv/deployment-charts/helmfile.d/services/eqiad/push-notifications]/owner: change from 'root' to 'mwdeploy' failed: Could not find user mwdeploy

See the puppet run on the servers for more details.

Should the user mwdeploy be added on the contint servers, or is a different fix needed?

Event Timeline

The error message refers to push-notifications (which I have ZERO idea what it could be), which leads me to suspect 6fb390b3ce18ff96a7b657c0ade6eccbfa9d0c17 by @jijiki, merged on July 22nd:

--- a/hieradata/role/common/ci/master.yaml
+++ b/hieradata/role/common/ci/master.yaml
@@ -130,6 +130,12 @@ profile::kubernetes::deployment_server::services:
     username: recommendation-api
     group: wikidev
     mode: '0640'
+  push-notifications:
+    username: push-notifications
+    group: wikidev
+    namespace: push-notifications
+    mode: '0640'
+    owner: mwdeploy

There is no reason to have a mwdeploy user on the hosts having the role ci::master (contint1001 / contint2001).
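
For anyone wanting to catch this class of problem before puppet does, here is a minimal sketch that flags hiera service entries whose owner is not a known user on the local host. The `services` mapping is a hypothetical excerpt that mirrors the keys from the diff above, and the helper names (`user_exists`, `missing_owners`) are made up for illustration; this is not part of the puppet repository.

```python
import pwd


def user_exists(name):
    """Return True if `name` is a known user on this host."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def missing_owners(services, exists=user_exists):
    """Return {service: owner} for entries whose owner is unknown locally.

    Entries without an `owner` key are skipped.
    """
    missing = {}
    for service, settings in services.items():
        owner = settings.get("owner")
        if owner is not None and not exists(owner):
            missing[service] = owner
    return missing


# Hypothetical excerpt mirroring the hiera keys shown in the diff above.
services = {
    "recommendation-api": {
        "username": "recommendation-api",
        "group": "wikidev",
        "mode": "0640",
    },
    "push-notifications": {
        "username": "push-notifications",
        "group": "wikidev",
        "namespace": "push-notifications",
        "mode": "0640",
        "owner": "mwdeploy",
    },
}

if __name__ == "__main__":
    for service, owner in missing_owners(services).items():
        print(f"{service}: owner '{owner}' does not exist on this host")
```

Run on a host without a mwdeploy user, this would report the push-notifications entry, which matches the failure puppet hit on the contint hosts.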

Yes, T250491 and T256973 seem related. I have already added one as a parent task and left a comment on https://gerrit.wikimedia.org/r/c/operations/puppet/+/613104.

Change 617387 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] push-notifications: Remove from CI 2 wrongly added vars

https://gerrit.wikimedia.org/r/617387

Change 617387 merged by Alexandros Kosiaris:
[operations/puppet@production] push-notifications: Remove from CI 2 wrongly added vars

https://gerrit.wikimedia.org/r/617387

akosiaris claimed this task.
akosiaris added a subscriber: akosiaris.

I've merged https://gerrit.wikimedia.org/r/617387, and Icinga no longer complains.