Every now and then when doing maintenance tasks (e.g. yesterday's facter upgrades) I find myself stuck in a heap of sticky, broken labs VMs. I think that we can monitor for a few simple issues (specifically disk space and puppet failures) and intervene before these problems become too serious.
I don't want these things to alert, or even nag in an IRC channel. But I do want a big status board that shows ALL the VMs and how they're doing. That way, when I have some free time (or better yet, when the clinic duty person has time), we can go through and nag, delete files, and otherwise clean up.
I'm doing this cleanup anyway; better to do it when it's not an emergency.
I have no real opinion about what the right tool is for this.
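To make the idea concrete, here is a rough sketch of the two checks (disk space and puppet failures) feeding a plain-text board. It is only an illustration, not a proposal for a specific tool. The thresholds, the `VMStatus` record, and the `render_board` helper are all made up for this example, and how the numbers get collected from each VM is deliberately left out.

```python
import shutil
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
DISK_WARN_FRACTION = 0.90         # flag a filesystem that is 90% full or more
PUPPET_STALE_SECONDS = 24 * 3600  # flag a VM with no successful puppet run in a day


@dataclass
class VMStatus:
    """One row on the board (hypothetical record, not an existing API)."""
    name: str
    disk_used_fraction: float    # 0.0 .. 1.0
    seconds_since_puppet: float  # time since the last successful puppet run

    def problems(self):
        """Return a list of human-readable issues; empty means healthy."""
        issues = []
        if self.disk_used_fraction >= DISK_WARN_FRACTION:
            issues.append("disk %d%% full" % round(self.disk_used_fraction * 100))
        if self.seconds_since_puppet >= PUPPET_STALE_SECONDS:
            issues.append("puppet stale for %dh" % (self.seconds_since_puppet // 3600))
        return issues


def local_disk_fraction(path="/"):
    """Fraction of the filesystem at `path` in use, as seen on the VM itself."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def render_board(statuses):
    """Render every VM, one per line, with the broken ones sorted to the top."""
    lines = []
    for s in sorted(statuses, key=lambda s: (not s.problems(), s.name)):
        issues = s.problems()
        lines.append("%-20s %s" % (s.name, "; ".join(issues) if issues else "ok"))
    return "\n".join(lines)
```

Nothing here alerts or nags. It only lists every VM, with the ones that need attention at the top, so whoever has time can work down the list.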