(Note: This came out of the September 2, 2020 WDQS outage.)
Deferring to the ticket assignee on the best way to achieve the objective, but here are some ideas:
- Create (or improve the existing) Kibana dashboard that shows Blazegraph error messages and extracts top user agents, etc.
- Document a "backup" process for when Kibana is not performant enough, with example commands such as: grep req.xForwardedFor /var/log/wdqs/wdqs-blazegraph.log | grep 500 | cut -d= -f3 | sort | uniq -c | sort -nr
- Grafana dashboard that shows the total count of Blazegraph errors? (This might not be useful.)
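
As a starting point for the documented fallback, the grep pipeline above could be wrapped in a small helper along these lines. This is only a sketch: the function name top_error_clients and the default limit of 20 are made up for illustration, and it assumes, as the original command does, that splitting a log line on "=" leaves the forwarded-for address in the third field. That assumption should be checked against real log lines before this goes into any documentation.

```shell
#!/bin/sh
# Sketch: rank clients by the number of matching error lines in a Blazegraph log.
# top_error_clients is a hypothetical helper name, not an existing tool.
top_error_clients() {
    log_file="${1:-/var/log/wdqs/wdqs-blazegraph.log}"
    limit="${2:-20}"   # arbitrary default, chosen only for illustration

    # Same steps as the command in the ticket, plus a cap on the output.
    # Note: "grep 500" matches the string 500 anywhere on the line, so it
    # can over-count; field 3 (split on '=') is ASSUMED to hold the client
    # address and depends on the actual log line layout.
    grep -F 'req.xForwardedFor' "$log_file" \
        | grep 500 \
        | cut -d= -f3 \
        | sort \
        | uniq -c \
        | sort -nr \
        | head -n "$limit"
}
```

After sourcing the file, running top_error_clients with no arguments reads the default log path, and top_error_clients /path/to/rotated.log 5 would show only the five busiest clients from another file.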