
shinken: issue with shinkengen
Closed, Resolved, Public

Description

I discovered this today on the shinken server:

root@shinken-02:~# puppet agent -tv
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for shinken-02.shinken.eqiad.wmflabs
Notice: /Stage[main]/Base::Environment/Tidy[/var/tmp/core]: Tidying 0 files
Info: Applying configuration version '1566117957'
Notice: The LDAP client stack for this host is: classic/sudoldap
Notice: /Stage[main]/Profile::Ldap::Client::Labs/Notify[LDAP client stack]/message: defined 'message' as 'The LDAP client stack for this host is: classic/sudoldap'
Notice: /Stage[main]/Shinken::Shinkengen/Exec[/usr/local/bin/shinkengen]/returns: Traceback (most recent call last):
Notice: /Stage[main]/Shinken::Shinkengen/Exec[/usr/local/bin/shinkengen]/returns:   File "/usr/local/bin/shinkengen", line 134, in <module>
Notice: /Stage[main]/Shinken::Shinkengen/Exec[/usr/local/bin/shinkengen]/returns:     with open(hosts_config_path, 'w') as hostsfile:
Notice: /Stage[main]/Shinken::Shinkengen/Exec[/usr/local/bin/shinkengen]/returns: IOError: [Errno 13] Permission denied: '/etc/shinken/generated/deployment-prep.cfg'
Error: /usr/local/bin/shinkengen returned 1 instead of one of [0]
Error: /Stage[main]/Shinken::Shinkengen/Exec[/usr/local/bin/shinkengen]/returns: change from notrun to 0 failed: /usr/local/bin/shinkengen returned 1 instead of one of [0]
Notice: /Stage[main]/Shinken/Service[shinken]: Dependency Exec[/usr/local/bin/shinkengen] has failures: true
Warning: /Stage[main]/Shinken/Service[shinken]: Skipping because of failed dependencies
Info: Stage[main]: Unscheduling all events on Stage[main]
Notice: Applied catalog in 50.70 seconds

I didn't do any further investigation.
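For context, the failing statement is a plain `open(hosts_config_path, 'w')`, so the traceback only reports EACCES and says nothing about who owns the file or its directory. The following is a minimal Python 3 sketch of a wrapper that would make this kind of failure self-explanatory. `write_hosts_config` and `describe_path` are hypothetical helpers, not shinkengen's actual code. The traceback above comes from Python 2, where the same failure surfaces as `IOError`.

```python
import os
import stat


def describe_path(path):
    """Return an 'owner uid/gid and mode' summary for path, or a note if it is missing."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return f"{path}: does not exist"
    return f"{path}: uid={st.st_uid} gid={st.st_gid} mode={stat.filemode(st.st_mode)}"


def write_hosts_config(hosts_config_path, content):
    """Write content to hosts_config_path (hypothetical helper).

    On a permission error, re-raise with the effective uid plus the owner and
    mode of both the file and its parent directory, so the log explains itself.
    """
    try:
        with open(hosts_config_path, "w") as hostsfile:
            hostsfile.write(content)
    except PermissionError as exc:
        parent = os.path.dirname(hosts_config_path) or "."
        details = "; ".join([describe_path(hosts_config_path), describe_path(parent)])
        raise PermissionError(
            exc.errno,
            f"cannot write {hosts_config_path} as uid {os.geteuid()} ({details})",
        ) from exc
```

With this in place, the failure above would also print the ownership of `/etc/shinken/generated/deployment-prep.cfg` and `/etc/shinken/generated`, which is the first thing you would want to check.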

Event Timeline

Not quite sure how it got deleted, but I re-created /etc/shinken/generated/deployment-prep.cfg, ran chmod 644 on it, and chowned it to shinken. (I then temporarily made the directory owned by shinken as well, so it could create the other files.) Now shinken itself is upset instead.
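The manual repair described above (re-create the file, `chmod 644`, `chown shinken`) can be sketched in Python as follows. `restore_generated_file` is a hypothetical helper. It assumes a local `shinken` account exists and that the script runs as root. Like `chown shinken` without a `:group` part, it changes only the owner and leaves the group as it was.

```python
import os
import pwd


def restore_generated_file(path, owner="shinken", mode=0o644):
    """Re-create path if it is missing, then apply mode and owner.

    Mirrors the manual fix: touch the file, chmod 644, chown <owner>.
    """
    uid = pwd.getpwnam(owner).pw_uid
    # Append mode creates the file if needed but never truncates existing content.
    with open(path, "a"):
        pass
    os.chmod(path, mode)
    # gid=-1 leaves the group unchanged, matching `chown shinken` (no ':group').
    os.chown(path, uid, -1)
```

Usage on the host would be `restore_generated_file("/etc/shinken/generated/deployment-prep.cfg")`, after which the next shinkengen run can write the real contents.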

The shinken service itself also won't start up properly:

Error: Could not start Service[shinken]: Execution of '/usr/sbin/service shinken start' returned 1: Job for shinken.service failed because the control process exited with error code.
See "systemctl status shinken.service" and "journalctl -xe" for details.
Error: /Stage[main]/Shinken/Service[shinken]/ensure: change from stopped to running failed: Could not start Service[shinken]: Execution of '/usr/sbin/service shinken start' returned 1: Job for shinken.service failed because the control process exited with error code.
See "systemctl status shinken.service" and "journalctl -xe" for details.
-- Unit shinken.service has begun starting up.
Aug 18 09:45:11 shinken-02 shinken[7391]: Starting scheduler:
Aug 18 09:45:11 shinken-02 shinken[7391]: Already running ... (warning).
Aug 18 09:45:11 shinken-02 shinken[7391]: Starting poller:
Aug 18 09:45:11 shinken-02 shinken[7391]: Already running ... (warning).
Aug 18 09:45:11 shinken-02 shinken[7391]: Starting reactionner:
Aug 18 09:45:11 shinken-02 shinken[7391]: Already running ... (warning).
Aug 18 09:45:11 shinken-02 shinken[7391]: Starting broker:
Aug 18 09:45:11 shinken-02 shinken[7391]: Already running ... (warning).
Aug 18 09:45:11 shinken-02 shinken[7391]: Starting receiver:
Aug 18 09:45:11 shinken-02 shinken[7391]: Already running ... (warning).
Aug 18 09:45:11 shinken-02 shinken[7391]: Starting arbiter:
Aug 18 09:45:12 shinken-02 shinken[7391]: FAILED: Configuration is incorrect, sorry, I bail out (full output is in /tmp/bad_start_for_arbiter)
Aug 18 09:45:12 shinken-02 shinken[7391]:  failed!

I've been digging in the config files but haven't found much yet.

From /tmp/bad_start_for_arbiter:

[1566124726] Error :   [host::cloudinfra-internal-puppetmaster01] The contact group 'cloudinfra' defined on the host 'cloudinfra-internal-puppetmaster01' do not exist
[1566124726] Error :   [items] In cloudinfra-internal-puppetmaster01 is incorrect ; from /etc/shinken/generated/cloudinfra.cfg:1
[1566124726] Error :   [host::cloudinfra-db02] The contact group 'cloudinfra' defined on the host 'cloudinfra-db02' do not exist
[1566124726] Error :   [items] In cloudinfra-db02 is incorrect ; from /etc/shinken/generated/cloudinfra.cfg:9
[1566124726] Error :   [host::cloudinfra-db01] The contact group 'cloudinfra' defined on the host 'cloudinfra-db01' do not exist
[1566124726] Error :   [items] In cloudinfra-db01 is incorrect ; from /etc/shinken/generated/cloudinfra.cfg:17
[1566124726] Error :   [host::cloud-puppetmaster-02] The contact group 'cloudinfra' defined on the host 'cloud-puppetmaster-02' do not exist
[1566124726] Error :   [items] In cloud-puppetmaster-02 is incorrect ; from /etc/shinken/generated/cloudinfra.cfg:25
[1566124726] Error :   [host::cloud-puppetmaster-01] The contact group 'cloudinfra' defined on the host 'cloud-puppetmaster-01' do not exist
[1566124726] Error :   [items] In cloud-puppetmaster-01 is incorrect ; from /etc/shinken/generated/cloudinfra.cfg:33
[1566124726] Error :   [host::ntp-02] The contact group 'cloudinfra' defined on the host 'ntp-02' do not exist
[1566124726] Error :   [items] In ntp-02 is incorrect ; from /etc/shinken/generated/cloudinfra.cfg:41
[1566124726] Error :   [host::ntp-01] The contact group 'cloudinfra' defined on the host 'ntp-01' do not exist
[1566124726] Error :   [items] In ntp-01 is incorrect ; from /etc/shinken/generated/cloudinfra.cfg:49
[1566124726] Error :   [host::mx-out01] The contact group 'cloudinfra' defined on the host 'mx-out01' do not exist
[1566124726] Error :   [items] In mx-out01 is incorrect ; from /etc/shinken/generated/cloudinfra.cfg:57
[1566124726] Error :   [host::mx-out02] The contact group 'cloudinfra' defined on the host 'mx-out02' do not exist
[1566124726] Error :   [items] In mx-out02 is incorrect ; from /etc/shinken/generated/cloudinfra.cfg:65
[1566124726] Error :   	hosts conf incorrect!!
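To pull the missing groups and the hosts that reference them out of a long arbiter log, a small Python sketch like the following works on lines in the format shown above. The regex is based only on this excerpt (an assumption, since other Shinken versions may word the message differently), and `missing_contact_groups` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Based on lines like:
# [...] Error :   [host::ntp-01] The contact group 'cloudinfra' defined on the host 'ntp-01' do not exist
MISSING_GROUP = re.compile(
    r"\[host::(?P<host>[^\]]+)\] The contact group '(?P<group>[^']+)' defined on the host"
)


def missing_contact_groups(lines):
    """Map each missing contact group name to the sorted list of hosts referencing it."""
    found = defaultdict(set)
    for line in lines:
        match = MISSING_GROUP.search(line)
        if match:
            found[match.group("group")].add(match.group("host"))
    return {group: sorted(hosts) for group, hosts in found.items()}
```

Run it as `missing_contact_groups(open("/tmp/bad_start_for_arbiter"))`. For the excerpt above it returns a single key, `cloudinfra`, mapped to the nine affected hosts.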

I think we may have added the cloudinfra project without creating a contact group for it.
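A check along these lines could catch a project whose contact group was never defined before the arbiter refuses to start. The following is a rough Python sketch and is not part of shinkengen or the Puppet module. It only compares `contact_groups` references against `contactgroup_name` definitions and ignores templates and inheritance. `undefined_contact_groups` is a hypothetical name.

```python
import re

# The two directives we care about, at the start of a (possibly indented) line.
DIRECTIVE = re.compile(r"^\s*(contactgroup_name|contact_groups)\s+(.+?)\s*$")


def undefined_contact_groups(config_texts):
    """Return contact groups referenced via contact_groups but never defined
    via contactgroup_name across the given Shinken/Nagios-style config texts.

    Naive: ignores templates, 'use' inheritance and object types; it only
    looks at the two directive names above.
    """
    defined, referenced = set(), set()
    for text in config_texts:
        for line in text.splitlines():
            match = DIRECTIVE.match(line)
            if not match:
                continue
            key, value = match.groups()
            value = value.split(";", 1)[0].strip()  # drop inline ';' comments
            if not value:
                continue
            if key == "contactgroup_name":
                defined.add(value)
            else:
                # contact_groups is comma-separated; a leading '+' means "add to inherited".
                for group in value.split(","):
                    group = group.strip().lstrip("+")
                    if group:
                        referenced.add(group)
    return sorted(referenced - defined)
```

One way to run it, assuming all Shinken object definitions live under `/etc/shinken`, is `undefined_contact_groups(pathlib.Path(p).read_text() for p in glob.glob("/etc/shinken/**/*.cfg", recursive=True))`. A non-empty result means the arbiter would reject the configuration in the same way.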

Change 530765 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] Add missing cloudinfra contact group

https://gerrit.wikimedia.org/r/530765

Puppet on shinken-02 runs successfully with the patch above applied.

Change 530765 merged by Andrew Bogott:
[operations/puppet@production] Add missing cloudinfra contact group

https://gerrit.wikimedia.org/r/530765