Follow-up from T294915: CodeSearch's "deployed" profile hasn't yet worked out that WikiLambda is now branched for production. We need monitoring and alerting for when the codesearch-write-config systemd timer fails.
10:34:20 <legoktm> majavah: also, is it possible to have alerting if a systemd unit fails, like we do in prod?
10:35:01 <legoktm> specifically the codesearch-write-config unit (https://phabricator.wikimedia.org/T294915) but even if it was all units that would be fine too
10:35:37 <legoktm> if not I'll have the codesearch web app export the systemd status as a bool in our current metrics endpoint
10:36:27 <legoktm> T294958
10:36:28 <+stashbot> T294958: Add monitoring+alerting for codesearch-write-config - https://phabricator.wikimedia.org/T294958
10:50:11 <majavah> legoktm: prometheus-node-exporter collects systemd stats, so that should be set now
10:50:36 <legoktm> majavah: for all units or just that specific one?
10:50:50 <majavah> just that specific one
10:51:19 <legoktm> would it be excessive/problematic/difficult to do it for all?
10:51:46 <majavah> T287309
10:51:46 <+stashbot> T287309: Some systemd services appear to be broken on all VMs - https://phabricator.wikimedia.org/T287309
10:52:10 <legoktm> ack, fair enough
10:52:13 <legoktm> thanks :)
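
For reference, the fallback legoktm mentions (having the web app export the systemd status as a bool in the metrics endpoint) could look roughly like the sketch below. This is not what was deployed, since prometheus-node-exporter ended up covering the unit. The metric name `codesearch_systemd_unit_failed`, the helper names, and the `.service` suffix on the unit name are all assumptions made for illustration.

```python
import subprocess


def unit_failed(unit: str) -> bool:
    """Return True if systemd reports the given unit as failed."""
    # `systemctl is-failed --quiet UNIT` exits with status 0 when the unit
    # is in the "failed" state and non-zero otherwise.
    result = subprocess.run(
        ["systemctl", "is-failed", "--quiet", unit],
        check=False,
    )
    return result.returncode == 0


def render_metric(unit: str, failed: bool) -> str:
    """Format the status as one line of Prometheus text exposition format."""
    # Hypothetical metric name; 1 means failed, 0 means not failed.
    return f'codesearch_systemd_unit_failed{{unit="{unit}"}} {int(failed)}\n'
```

A metrics handler would then append something like `render_metric(unit, unit_failed(unit))` to its response body, with `unit` set to the assumed name `codesearch-write-config.service`, and an alert rule could fire whenever the value is 1.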