Change Details

== Intro ==

The `instance` label is added automatically by Prometheus and is typically of the form `hostname:port`, i.e. the target the metrics were scraped from. For example, node-exporter listens on port 9100, so all of its metrics carry `instance=HOST:9100`. Icinga compatibility alerts (prefixed with `Icinga/`) don't include the port (though that could easily be changed).

== Problem statement ==

From the alerts dashboard we'd like to allow filtering/grouping by host (e.g. to show all active alerts for a single host). In the dashboard UI, clicking a label adds that label to the current filters; therefore clicking a `host:port` label shows all alerts for that specific port, not for the whole host. Showing all alerts for a host would mean changing the filter from `instance=HOST:PORT` to something like `instance=~^HOST:.*`.

== Solutions ==

Below is a list of possible solutions and the trade-offs involved.

==== 1. Strip port from `instance` at ingestion time ====

In this case `instance` would have no port at ingestion time (i.e. Prometheus stores metrics without the port). This solution is quite invasive: dashboards would likely need to be adapted, every time series would become a new one (since the `instance` label value changes), and having the port in `instance` does have its use cases (e.g. when co-hosting multiple instances of the same software).

==== 2. Strip port from `instance` for outgoing alerts ====

We would strip the port from `instance` only when sending alerts to Alertmanager. This solution is not invasive and allows for the easy grouping mentioned above. Downsides include that the alert's labels no longer reflect the labels of the underlying expression, which can be confusing. Another source of confusion is when metrics with different ports on the same host are alerting (e.g. search has multiple Elasticsearch instances on the same hardware).

==== 3. Add a new `host` label, based on `instance`, to alerts ====

We would add a new `host` label to alerts (adding it to the metrics is also possible, but we'd incur the metrics churn described above). This solution has the advantage of being a brand-new label (i.e. no confusion); however, the hostname would be shown twice, once in `instance` and once in `host`.

==== 4. Keep port in `instance` ====

In this case we strive for consistency between alerts and their underlying metrics, and would add a (bogus) port to Icinga / LibreNMS alerts. While grouping by host would have to use a different filter (i.e. not the one the dashboard UI applies by default on click), this is the least invasive and most "consistent" solution.

For "quality of life" we could ask the dashboard UI (Karma) upstream whether they'd be willing to implement different filters on click; this way we could still have one-click filtering/grouping to select all alerts for a given host.
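To make the filter semantics from the problem statement and option 4 concrete, here is a minimal Python sketch comparing the equality filter a click adds (`instance=HOST:PORT`) with the per-host regex filter (`instance=~^HOST:.*`). The alerts and hostnames are made up, and the regex matcher is assumed to be fully anchored, as Alertmanager matchers are; `host_pattern_any` is a hypothetical variant, not something the dashboard offers.

```python
import re

# Hypothetical active alerts (hostnames and alert names are made up).
ALERTS = [
    {"alertname": "NodeDiskFull", "instance": "db1001:9100"},
    {"alertname": "ElasticsearchDown", "instance": "search1001:9200"},
    {"alertname": "ElasticsearchDown", "instance": "search1001:9201"},
    {"alertname": "Icinga/PING", "instance": "search1001"},
]


def match_equal(labels, name, value):
    """Equality matcher, e.g. instance="search1001:9200" (what a click adds)."""
    return labels.get(name, "") == value


def match_regex(labels, name, pattern):
    """Regex matcher; assumed fully anchored, like Alertmanager's."""
    return re.fullmatch(pattern, labels.get(name, "")) is not None


def host_pattern(host):
    """The per-host filter ^HOST:.* with the hostname regex-escaped."""
    return "^" + re.escape(host) + ":.*"


def host_pattern_any(host):
    """Hypothetical variant that also matches port-less instances."""
    return re.escape(host) + "(:[0-9]+)?"


def select(pattern):
    """Return the instance of every alert matching the regex filter."""
    return [a["instance"] for a in ALERTS if match_regex(a, "instance", pattern)]


print([a["instance"] for a in ALERTS if match_equal(a, "instance", "search1001:9200")])
# ['search1001:9200']
print(select(host_pattern("search1001")))
# ['search1001:9200', 'search1001:9201']
print(select(host_pattern_any("search1001")))
# ['search1001:9200', 'search1001:9201', 'search1001']
```

Note that the port-less `Icinga/` alert is missed by `^HOST:.*`; this is why option 4 implies adding a (bogus) port to Icinga / LibreNMS alerts, unless the filter is made to accept an optional port.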

Previous description:

When Prometheus sends out alerts, the `instance` label will have the form `<hostname>:<port>`. For easier filtering/grouping/silencing I think we should strip the port part (or possibly add another label `host` without port). cc @Volans following the IRC chat.
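The port stripping proposed here (options 2 and 3) can be sketched in Python as below. This is only an illustration of the label transformation under the assumption that `instance` is `hostname:port` with a numeric port; in a real deployment it would more likely be expressed as Prometheus relabeling (e.g. `alert_relabel_configs` for outgoing alerts). The function names and sample alerts are hypothetical.

```python
import re
from collections import defaultdict

# Assumed format: hostname:port with a numeric port; anything else is left as-is.
_HOST_PORT = re.compile(r"(.+):([0-9]+)")


def strip_port(instance):
    """Return the hostname part of `instance`, or `instance` unchanged."""
    m = _HOST_PORT.fullmatch(instance)
    return m.group(1) if m else instance


def rewrite_instance(labels):
    """Option 2: drop the port from `instance` on outgoing alerts."""
    out = dict(labels)
    if "instance" in out:
        out["instance"] = strip_port(out["instance"])
    return out


def add_host_label(labels):
    """Option 3: keep `instance` as-is and add a separate `host` label."""
    out = dict(labels)
    if "instance" in out:
        out["host"] = strip_port(out["instance"])
    return out


def group_by_host(alerts):
    """Group alert label sets by their `host` label."""
    groups = defaultdict(list)
    for labels in alerts:
        groups[labels.get("host", "")].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "ElasticsearchDown", "instance": "search1001:9200"},
    {"alertname": "ElasticsearchDown", "instance": "search1001:9201"},
    {"alertname": "Icinga/PING", "instance": "search1001"},
    {"alertname": "NodeDiskFull", "instance": "db1001:9100"},
]

print(rewrite_instance(alerts[0]))
# {'alertname': 'ElasticsearchDown', 'instance': 'search1001'}
print(rewrite_instance(alerts[0]) == rewrite_instance(alerts[1]))
# True
grouped = group_by_host([add_host_label(a) for a in alerts])
print({host: len(items) for host, items in grouped.items()})
# {'search1001': 3, 'db1001': 1}
```

With option 2 the two Elasticsearch alerts end up with identical label sets, so they can no longer be told apart by their labels; this is the same-host, different-port confusion mentioned above. Option 3 keeps them distinct while still grouping all three `search1001` alerts together.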