The Icinga NRPE checks for the Netbox reports often time out (10s timeout). When a report's state is critical, every timeout makes the check flap between states (critical -> unknown -> critical), which spams the IRC channel. See  for the Icinga log of one of them (the others look the same).
The puppetdb one is, by itself, by far the noisiest alert on IRC in the last month according to , and even more so if we sum all the Netbox report ones.
check_netbox_report.py has a comment saying that it has to fetch all report objects every time, even when checking only one of them, but it seems to me that a simple get() of a single report already includes the result.failed property that the Icinga check reads. Is there anything else missing?
IMHO there are some major improvements that could be made here:
- The code could be vastly simplified by removing support for checking multiple reports at once, which AFAIK we are not using: we always call it with a single report as parameter.
- Instead of getting all reports from the API, getting only the one we need to check should speed up the API call quite a bit.
- We are now in a weird situation in which a script called check_netbox_report.py is run both by Icinga via NRPE (as it should) and by systemd timers with the --run option to actually run the report. I think at this point those two independent actions should be split into two different scripts, with the NRPE check only checking the status and the systemd timers only running the report.
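
To make the proposal concrete, here is a minimal sketch of what a check-only script for a single report could look like. It is not the current code: the `check_report` function and the `fetch` callable are hypothetical names, and the only thing assumed about the API response is what is mentioned above, namely that a single-report get() returns an object with a `result.failed` property. The exit codes are the standard Nagios/Icinga plugin ones.

```python
from typing import Any, Callable, Mapping, Tuple

# Standard Nagios/Icinga plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_report(
    name: str,
    fetch: Callable[[str], Mapping[str, Any]],
) -> Tuple[int, str]:
    """Return (exit_code, message) for a single Netbox report.

    `fetch` is a hypothetical callable that performs one API request for
    the named report only and returns its decoded JSON. The assumed shape
    is {"result": {"failed": bool}}.
    """
    try:
        report = fetch(name)
    except Exception as exc:  # API errors and timeouts map to UNKNOWN
        return UNKNOWN, f"UNKNOWN: unable to fetch report {name}: {exc}"

    result = report.get("result")
    if not result or "failed" not in result:
        # The report has never been run, or the response shape differs
        # from what this sketch assumes.
        return UNKNOWN, f"UNKNOWN: report {name} has no usable result"

    if result["failed"]:
        return CRITICAL, f"CRITICAL: Netbox report {name} failed"
    return OK, f"OK: Netbox report {name} passed"
```

The NRPE script would pass in a function that does the single-report get(), print the message and exit with the returned code. Running the report (today's --run path) would live in a separate script called only by the systemd timers.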