There is zero monitoring for Nodepool. Off the top of my head, a bare minimum would be:
[ ] process present on labnodepool1001.eqiad.wmnet (though Puppet or systemd should restart it) - https://gerrit.wikimedia.org/r/#/c/244171/
[ ] CPU / load / mem usage
[ ] viability of MySQL sessions (one per booted instance; they do not recover properly after a network flap)
[ ] graphs of the pool (needs the metrics Nodepool sends to be restricted first; it is too spammy)
[ ] alert when pool is exhausted
[ ] reachability of OpenStack API as the nodepool user
[ ] detect weird behavior such as snapshot/instance creation failures
[ ] send Nodepool logs to Logstash
[ ] review monitoring contact list
[ ] paging?
[ ] first level diagnostics procedures
[ ] number of Nodepool-managed slaves ready to accept jobs, and number of offline nodes
[ ] [[http://phabricator.wikimedia.org/T2001 | (bug 1)]]
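
As a rough illustration of the "alert when pool is exhausted" item, here is a minimal sketch of a Nagios-style check. It only covers the decision logic: `check_pool`, the `state_counts` mapping, the state names, and both thresholds are hypothetical, and how the counts would actually be obtained from Nodepool is left open.

```python
# Nagios-style plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_pool(state_counts, warn_ready=2, crit_ready=0):
    """Return (exit_code, message) for a mapping of node state -> count.

    Everything here is an assumption for illustration: the state names
    ("ready", "building", "used") and the thresholds would need to match
    what Nodepool really reports and how large the pool is.
    """
    if not state_counts:
        # No data at all is treated as UNKNOWN rather than as an empty pool.
        return UNKNOWN, "UNKNOWN: no pool data"

    ready = state_counts.get("ready", 0)
    total = sum(state_counts.values())
    summary = "%d/%d nodes ready" % (ready, total)

    if ready <= crit_ready:
        return CRITICAL, "CRITICAL: pool exhausted, " + summary
    if ready <= warn_ready:
        return WARNING, "WARNING: pool nearly exhausted, " + summary
    return OK, "OK: " + summary
```

A wrapper script could print the message and exit with the returned code, which is what Icinga/Nagios expects from a plugin. Keeping the decision separate from the data collection should make the thresholds easy to tune once real pool graphs exist.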