As discussed on IRC, while investigating T403615 I spotted the following error on es* hosts (some of which are running in production). The error is not triggering any alert:
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: --- Logging error ---
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: Traceback (most recent call last):
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: File "/usr/lib/python3.11/logging/__init__.py", line 1110, in emit
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: msg = self.format(record)
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: ^^^^^^^^^^^^^^^^^^^
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: File "/usr/lib/python3.11/logging/__init__.py", line 953, in format
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: return fmt.format(record)
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: ^^^^^^^^^^^^^^^^^^
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: File "/usr/local/bin/nrpe2nodexp", line 169, in format
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: kind, outcome, etype = self.statuscode_to_kind_outcome_type(record.returncode)
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: File "/usr/local/bin/nrpe2nodexp", line 140, in statuscode_to_kind_outcome_type
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: etype = "change" if self.detect_status_change(returncode) else "info"
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: File "/usr/local/bin/nrpe2nodexp", line 96, in detect_status_change
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: if not (p.exists() and p.is_file() and os.access(p, os.R_OK)):
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: ^^^^^^^^^^
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: File "/usr/lib/python3.11/pathlib.py", line 1236, in exists
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: self.stat()
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: File "/usr/lib/python3.11/pathlib.py", line 1014, in stat
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: return os.stat(self, follow_symlinks=follow_symlinks)
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 01 10:30:50 es2026 nrpe2nodexp-ferm_active[2424160]: PermissionError: [Errno 13] Permission denied: '/var/lib/prometheus/node.d/check_ferm_active.prom'
See, for example: ssh es2026.codfw.wmnet sudo journalctl -u nrpe2nodexp-ferm_active
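
The traceback shows `p.exists()` itself raising `PermissionError` inside the logging formatter: on Python 3.11, `Path.exists()` swallows only a few errnos (such as "no such file") and re-raises the rest. Below is a minimal sketch of a more tolerant check, assuming the intent at line 96 of nrpe2nodexp is "proceed only if the state file is a readable regular file". The helper name `is_readable_file` is hypothetical and not part of nrpe2nodexp. This would only stop the formatter from crashing; it does not explain why the `.prom` file is unreadable by the service in the first place, which is probably the real issue to chase.

```python
import os
from pathlib import Path


def is_readable_file(p: Path) -> bool:
    """Return True only if p is an existing regular file we can read.

    Hypothetical helper: any OSError raised while stat-ing the path
    (including PermissionError) is treated as "not readable" instead of
    propagating up into the caller.
    """
    try:
        # is_file() already implies existence, so a separate exists() call
        # is not needed; os.access() returns False rather than raising.
        return p.is_file() and os.access(p, os.R_OK)
    except OSError:
        return False
```

With such a helper, the condition in `detect_status_change` could become `if not is_readable_file(p):`, keeping the current behaviour for missing files while no longer raising when the stat call is denied.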