
Fix up icinga puppetization
Closed, ResolvedPublic

Description

The last time we had to reinstall neon (the icinga server), back in late July 2014, it became clear that it wasn't fully puppetized. The post-puppet fixups were only recorded in an email thread, which I'm pasting here for posterity:

Pre-existing Puppetization Issues:
============================
This documents some things I had to figure out and apply manually on the host to get puppet happy and the various icinga bits functional again. Most of it probably needs puppetization:

a2enmod authnz_ldap
a2enmod rewrite
rm -f /var/lib/nagios/rw/nagios.cmd
#   root@neon:/var# find . -user nagios
#   ./cache/icinga/objects.cache
#   ./log/icinga
#   ./log/icinga/icinga.log
#   ./log/icinga/archives
#   ./lib/nagios
cd /var; find . -user nagios|xargs chown -R icinga
# From modules/ishmael/manifests/init.pp comments:
cd /srv ; git clone https://github.com/asher/ishmael.git
cd ishmael/; git clone https://github.com/asher/ishmael.git sample
# For some nrpe checks to run:
apt-get install libssl0.9.8
# To get irc bot to read logs:
chmod o+r /var/log/icinga
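
The ownership fix above pipes `find` output through `xargs`, which splits paths on whitespace, and `chown -R` also rewrites files below a matched directory that never belonged to nagios. A more defensive variant might look like the sketch below. The function name `reown` is made up for illustration and was not part of the original fixups; unlike the original command, it changes only the entries that `find` actually matches.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original fixups): hand every entry
# under DIR that is owned by OLD_USER over to NEW_USER.
# Usage: reown DIR OLD_USER NEW_USER
reown() {
    dir=$1 old_user=$2 new_user=$3
    # "-exec ... {} +" passes paths as arguments, so whitespace in names is safe.
    # "chown -h" changes a symlink itself rather than the file it points to.
    find "$dir" -user "$old_user" -exec chown -h "$new_user" {} +
}

# Example (needs root): reown /var nagios icinga
```

Running `find /var -user nagios` again afterwards should print nothing if the change took effect.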

This one I fixed in puppet already since it was relatively simple (tcpircbot group):
https://gerrit.wikimedia.org/r/#/c/150598/

Also, some new unpuppetization was added in the process:

1) I moved some of the fast, tiny write traffic from icinga onto a small 128MB tmpfs filesystem.  This offloads a bunch of disk i/o.

The gerrit changes to icinga.cfg are here:
https://gerrit.wikimedia.org/r/150695
https://gerrit.wikimedia.org/r/150702

The mountpoint /var/icinga-tmpfs and the fstab entry for it aren't puppetized yet.  The fstab entry is currently:

tmpfs	/var/icinga-tmpfs	tmpfs	size=128m,uid=icinga,gid=icinga,mode=755	0	0

(In retrospect, that may be a pretty silly mountpoint.  Maybe put it elsewhere while puppetizing?)

2) I manually raised the default 1GB swap partition from the new install's setup to 8GB (via lvm) because we were running out of swap and oomkilling.
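
Until the tmpfs mount from item 1 is puppetized, the manual step could at least be made repeatable. The sketch below appends the fstab entry quoted above only if no line for that mountpoint exists yet. The function name `ensure_tmpfs_entry` is hypothetical, and taking the fstab path as an argument is an assumption made so the helper can be tried against a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: append the icinga tmpfs entry to an fstab-style file
# unless a line for the same mountpoint is already present.
# Usage: ensure_tmpfs_entry FSTAB_FILE
ensure_tmpfs_entry() {
    fstab=$1
    mountpoint=/var/icinga-tmpfs
    opts='size=128m,uid=icinga,gid=icinga,mode=755'
    # Field 2 of an fstab line is the mountpoint; comment lines are skipped.
    if awk -v mp="$mountpoint" \
        '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
        return 0
    fi
    printf 'tmpfs\t%s\ttmpfs\t%s\t0\t0\n' "$mountpoint" "$opts" >> "$fstab"
}

# Example (needs root):
#   mkdir -p /var/icinga-tmpfs
#   ensure_tmpfs_entry /etc/fstab && mount /var/icinga-tmpfs
```

Calling the function twice leaves a single entry, so it is safe to rerun after a reinstall.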

Event Timeline

BBlack raised the priority of this task to Needs Triage.
BBlack updated the task description.
BBlack subscribed.

Change 235008 had a related patch set uploaded (by John F. Lewis):
icinga: puppetise apache mods

https://gerrit.wikimedia.org/r/235008

Change 235008 merged by Dzahn:
icinga: puppetise apache mods

https://gerrit.wikimedia.org/r/235008

Change 235017 had a related patch set uploaded (by Dzahn):
icinga: libssl0.9.8 for NRPE checks to run

https://gerrit.wikimedia.org/r/235017

Re: the 'ishmael' steps, also see T109777; we might just decom it. It also isn't something that belongs in the icinga module; it was only related because it runs on node neon.

Change 235017 merged by BBlack:
icinga: libssl0.9.8 for NRPE checks to run

https://gerrit.wikimedia.org/r/235017

@Dzahn @BBlack Only the tmpfs is left here, right? I am happy to help!

@jcrespo It seems so, plus the manual swap increase from the default 1GB set in the lvm partman recipe to 8GB.

jcrespo triaged this task as Medium priority. Sep 7 2015, 5:45 PM
jcrespo removed a project: Patch-For-Review.
jcrespo set Security to None.

The ishmael part of this is gone now; see T109777.

Change 256467 had a related patch set uploaded (by Dzahn):
icinga cleanup: move gsb monitoring to ./monitor/

https://gerrit.wikimedia.org/r/256467

Change 256508 had a related patch set uploaded (by Dzahn):
icinga: remove user from dialout group

https://gerrit.wikimedia.org/r/256508

Change 256509 had a related patch set uploaded (by Dzahn):
icinga/labsnfs: move monitoring groups to labsnfs

https://gerrit.wikimedia.org/r/256509

Change 256508 merged by Dzahn:
icinga: remove user from dialout group

https://gerrit.wikimedia.org/r/256508

Change 256467 merged by Dzahn:
icinga cleanup: move gsb monitoring to ./monitor/

https://gerrit.wikimedia.org/r/256467

Change 256509 merged by Dzahn:
icinga/labsnfs: move monitoring groups to labsnfs

https://gerrit.wikimedia.org/r/256509

Change 262417 had a related patch set uploaded (by Dzahn):
icinga: add USER4 resource for /usr/local/lib/

https://gerrit.wikimedia.org/r/262417

Change 262417 merged by Dzahn:
icinga: add USER4 macro for /usr/local/lib/

https://gerrit.wikimedia.org/r/262417

Change 316815 had a related patch set uploaded (by Alexandros Kosiaris):
icinga: Fix permissions of a few directories

https://gerrit.wikimedia.org/r/316815

Change 316815 merged by Alexandros Kosiaris:
icinga: Fix permissions of a few directories

https://gerrit.wikimedia.org/r/316815

Change 318436 had a related patch set uploaded (by Dzahn):
icinga: move files/icinga/ into module

https://gerrit.wikimedia.org/r/318436

Change 318436 merged by Alexandros Kosiaris:
icinga: move files/icinga/ into module

https://gerrit.wikimedia.org/r/318436

akosiaris claimed this task.
akosiaris subscribed.

I am going to resolve this. Most of these issues have been fixed (along with others), either in https://gerrit.wikimedia.org/r/#/q/project:operations/puppet+branch:production+topic:icinga_new_hosts or in some other commit.