A syntax error in one of the Prometheus queries used for monitoring was tripping the "All exclamation marks in the query parameter must be escaped e.g. \!" check and making Puppet fail on the alert hosts.
The issue went unnoticed for ~19 hours and was only noticed because an alert for a decommissioned host was triggered.
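For illustration, here is a minimal sketch of what a pre-merge lint for this class of error might look like. It assumes the rule means "every `!` must be directly preceded by a backslash"; the actual check in Puppet may differ. The function names and the sample query are hypothetical and not taken from our codebase.

```python
import re

# Matches a "!" that is not immediately preceded by a backslash.
# Assumption: this approximates the "must be escaped e.g. \!" rule.
# Simplification: any "!" that follows a backslash counts as escaped,
# even when that backslash is itself escaped.
UNESCAPED_BANG = re.compile(r"(?<!\\)!")


def unescaped_bangs(query: str) -> list[int]:
    """Return the offsets of every '!' in query that lacks a preceding backslash."""
    return [m.start() for m in UNESCAPED_BANG.finditer(query)]


def escape_bangs(query: str) -> str:
    """Escape every unescaped '!' as '\\!', leaving already escaped ones untouched."""
    return UNESCAPED_BANG.sub(r"\\!", query)


if __name__ == "__main__":
    sample = 'up{job!="example"}'  # hypothetical query, for demonstration only
    print(unescaped_bangs(sample))
    print(escape_bangs(sample))
```

Running something like this in CI against the query parameters would surface the problem before the change reaches the alert hosts, rather than at Puppet run time.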
Given the special role of the alert hosts, we could consider making them an exception to the aggregated Puppet check and the last Puppet run alerts, so that failures there alert on their own and are noticed sooner.
I think that either a Puppet failure or Puppet being disabled on the alert hosts for more than a couple of hours should be considered a problem. Thoughts?