Investigate possibility of using logging cluster events as grafana annotations (https://grafana.com/docs/reference/annotations/)
Some ideas for possibly useful events:
- Puppet merges and runs
- Icinga alerts
- SAL entries
- Deploys
- Downtimes
Status | Assigned | Task
---|---|---
Resolved | colewhite | T222826 Leverage Grafana annotations to show events in graphs
Resolved | colewhite | T174172 unused grafana-dashboard indices on elasticsearch / logstash
Resolved | akosiaris | T257226 Please create operations/debs/grafana-loki gerrit repository
Resolved | colewhite | T257861 Pipe SAL entries into Logstash
Open | None | T350825 Loki: add a channel(s) for git commits
Loki looks like a feasible option to try given the resource constraints on the Grafana VM. It appears there is headroom on the host as long as we keep events reasonably low-traffic.
Change 597317 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/docker-images/production-images@master] add loki 1.4.1
For what it's worth, for Kubernetes deploys specifically we have an annotation in Grafana that works most of the time but can easily fail us. It's a simple query:
resets((sum(service_runner_request_duration_seconds_count{service="$service"}))[1m:]) > bool 0
It has at least 2 drawbacks I've identified in the short time I've been using it:
One approach I have been thinking about: since helmfile supports hooks, a hook could emit a statsd line to a local prometheus-statsd-exporter, which Prometheus would then scrape and which we would use as an annotation. That it is statsd is just an implementation detail, of course; it is simply the easy, already-tried thing to do. Since we can run arbitrary commands in that hook [1] (albeit without always having all the info we would like in the environment), we can use other methods of sending that information.
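As a rough illustration of the hook idea above, here is a minimal Python sketch of a command a hook could run to send one statsd counter line over UDP. The metric name `deploy.<service>`, the function names, and the target address are hypothetical; port 9125 is assumed here as the exporter's UDP listener and should be checked against the actual configuration.

```python
import socket


def format_statsd_counter(name: str, value: int = 1) -> bytes:
    """Build a statsd counter line of the form b"<name>:<value>|c"."""
    return f"{name}:{value}|c".encode("ascii")


def emit_deploy_event(service: str, host: str = "127.0.0.1", port: int = 9125) -> bytes:
    """Send one counter increment for a deploy of `service` (hypothetical metric name).

    The host and port are assumptions; point them at wherever the local
    statsd exporter actually listens.
    """
    payload = format_statsd_counter(f"deploy.{service}")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP is fire-and-forget: no delivery guarantee, no reply expected.
        sock.sendto(payload, (host, port))
    return payload
```

A hook would call something like `emit_deploy_event("example-service")`. Because UDP gives no acknowledgement, a lost packet means a missed annotation, which is one reason to treat this only as a starting point.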
The downside of using helmfile hooks would be that we catch the trigger, not the actual event, so deploys triggered by rollbacks, for example, would not be recognized. Maybe we should ask the Kubernetes API instead; it should know best. :-)
There is "kube-state-metrics", which exposes a lot of metrics about the state of various API objects. See https://github.com/kubernetes/kube-state-metrics/blob/master/docs/deployment-metrics.md for example.
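To make the "ask the API" idea a bit more concrete, here is a small Python sketch that turns sampled values of a per-deployment generation metric into event timestamps. It assumes, without having verified it against our clusters, that a metric such as `kube_deployment_status_observed_generation` changes on every rollout, including rollbacks; the function name and the sample format are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# One sample: (unix timestamp, observed generation of a deployment).
Sample = Tuple[float, int]


def generation_changes(samples: Iterable[Sample]) -> List[float]:
    """Return the timestamps at which the generation differs from the previous sample."""
    events: List[float] = []
    previous: Optional[int] = None
    for timestamp, generation in samples:
        # The first sample only sets the baseline; it is not an event.
        if previous is not None and generation != previous:
            events.append(timestamp)
        previous = generation
    return events
```

For example, `generation_changes([(0.0, 1), (60.0, 1), (120.0, 2), (180.0, 2), (240.0, 3)])` returns `[120.0, 240.0]`. Unlike the hook approach, this observes the state change itself rather than the command that triggered it, at the cost of a resolution limited by the scrape interval.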
Change 597317 merged by Cwhite:
[operations/docker-images/production-images@master] add loki 1.5.0
Change 602490 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile: add loki output support to the logstash pipeline
Change 602729 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile: add loki_event filter script
Change 602730 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[integration/config@master] add filter_scripts volume mount to logstash-filter-verifier job
Change 602729 merged by Cwhite:
[operations/puppet@production] profile: add loki_event filter script
Change 605343 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] service::docker: enhance volume support
Change 602730 merged by jenkins-bot:
[integration/config@master] Add filter_scripts volume mount to logstash-filter-verifier job
Change 610119 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] jjb: volume mount for logstash must be absolute path
Change 610119 merged by jenkins-bot:
[integration/config@master] jjb: volume mount for logstash must be absolute path
Change 616811 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] hiera: specify tlsproxy configuration for grafana
Change 616851 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] provision loki on grafana-next
Change 617250 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/debs/prometheus-es-exporter@debian/sid] debianization
Change 617250 merged by Cwhite:
[operations/debs/prometheus-es-exporter@debian/sid] debianization
Change 719056 had a related patch set uploaded (by Jbond; author: John Bond):
[operations/puppet@production] puppet_agent_stats: add catalog version to prom metricts
In relation to Puppet, I think we could look again at creating a Puppet logstash report. This was never pushed to production due to concerns about sending the full Puppet catalogue diff to logstash. However, I think we should be able to ensure we only send metadata and not the actual diffs.
Change 719056 merged by Jbond:
[operations/puppet@production] puppet_agent_stats: add catalog version to prom metrics
Change 719372 had a related patch set uploaded (by Jbond; author: John Bond):
[operations/puppet@production] P:puppetmaster::common: Add back logstash support
Change 719368 had a related patch set uploaded (by Jbond; author: John Bond):
[operations/puppet@production] puppetmaster: drop log messages from logstash reporter
Change 722580 had a related patch set uploaded (by Jbond; author: John Bond):
[operations/software/ecs@master] git - schema: Add new schema for adding git information
Change 722873 had a related patch set uploaded (by Jbond; author: John Bond):
[operations/software/ecs@master] schemas - metrics: Add puppet keys to the metrics name space
Change 722580 merged by jenkins-bot:
[operations/software/ecs@master] git - schema: Add new schema for adding git information
Change 722873 merged by jenkins-bot:
[operations/software/ecs@master] schemas - metrics: Add puppet keys to the metrics name space
Change 719368 merged by Jbond:
[operations/puppet@production] puppetmaster: drop log messages from logstash reporter
Change 719372 merged by Jbond:
[operations/puppet@production] P:puppetmaster::common: Add back logstash support
Change 734961 had a related patch set uploaded (by Jbond; author: John Bond):
[operations/puppet@production] puppetmaster: enable logstash reports
Change 734961 merged by Jbond:
[operations/puppet@production] puppetmaster: enable logstash reports
Change 736233 had a related patch set uploaded (by Jbond; author: jbond):
[operations/puppet@production] P:rsyslog: ship puppetmaster logs to kafka
Change 736233 merged by Jbond:
[operations/puppet@production] P:rsyslog: ship puppetmaster logs to kafka
Change 804484 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] logstash: ship scap.announce channel to loki
Change 806349 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] logstash: duplicate alert logs for loki target
Change 806430 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] logstash: alertmanager use logsource as source for host.name field
Change 806430 merged by Cwhite:
[operations/puppet@production] logstash: alertmanager use logsource as source for host.name field
Change 809302 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] loki: add loki as an optional grafana component
Change 804484 merged by Cwhite:
[operations/puppet@production] logstash: duplicate scap.announce logs for loki target
Change 809302 merged by Cwhite:
[operations/puppet@production] loki: add loki as an optional grafana component
Change 809706 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] beta-logs: add minimal grafana config
Change 809706 merged by Cwhite:
[operations/puppet@production] beta-logs: add minimal grafana config
Change 809709 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] loki: add ferm rule to control api access
Change 809722 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] logstash: add loki output support
Change 809709 merged by Cwhite:
[operations/puppet@production] loki: add ferm service to control api access
Change 809722 merged by Cwhite:
[operations/puppet@production] logstash: add loki output support
Change 810064 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] beta-logs: set loki retention to 3d
Change 810064 merged by Cwhite:
[operations/puppet@production] beta-logs: set loki retention to 3d
Change 810110 had a related patch set uploaded (by Cwhite; author: Cwhite):
[labs/tools/stashbot@master] Add support for posting events to eventgate
Change 810115 had a related patch set uploaded (by Cwhite; author: Cwhite):
[schemas/event/secondary@master] Add logging/sal/1.0.0 schema
Change 813715 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] profile: make loki data directory configurable
Change 813724 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] hiera: deploy and enable loki on grafana hosts
Change 813985 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] loki-beta: increase grpc message size
Change 813985 merged by Cwhite:
[operations/puppet@production] loki-beta: increase grpc message size
Change 813715 merged by Cwhite:
[operations/puppet@production] profile: make loki data directory configurable
Change 813724 merged by Cwhite:
[operations/puppet@production] hiera: deploy and enable loki on grafana hosts
Change 814915 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] logstash: enable loki public output on production
Change 814915 merged by Cwhite:
[operations/puppet@production] logstash: enable loki public output on production
We've enabled the Public Logs datasource in Grafana and forwarded scap.announce logs to it.
Change 806349 merged by Cwhite:
[operations/puppet@production] logstash: duplicate alert logs for loki target
Change 810110 abandoned by Cwhite:
[labs/tools/stashbot@master] Add support for posting events to eventgate
Reason:
Change 810115 abandoned by Cwhite:
[schemas/event/secondary@master] Add logging/sal/1.0.0 schema
Reason:
Change 602490 abandoned by Cwhite:
[operations/puppet@production] profile: add loki output support to the logstash pipeline
Reason:
in favor of using the loki output plugin
Change 605343 abandoned by Cwhite:
[operations/puppet@production] service::docker: enhance volume support
Reason:
we packaged loki in a deb package instead
Change 616811 abandoned by Cwhite:
[operations/puppet@production] hiera: specify tlsproxy configuration for grafana
Reason:
Change 616851 abandoned by Cwhite:
[operations/puppet@production] provision loki on grafana-next
Reason: