Improve visibility into blazegraph queries causing errors
Open, Medium, Public

Description

(Note: This came out of the September 2, 2020 WDQS outage.)

Deferring to the ticket assignee on the best way to achieve the objective, but here are some ideas:

  • Create (or improve an existing) Kibana dashboard that shows Blazegraph error messages and extracts top user agents, etc.
  • Document a "backup" process for when Kibana is not performant enough: example commands such as grep req.xForwardedFor /var/log/wdqs/wdqs-blazegraph.log | grep 500 | cut -d= -f3 | sort | uniq -c | sort -nr
  • Grafana dashboard that shows the total count of Blazegraph errors? (This might not be useful.)
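If the "backup" process ends up documented as a script rather than a one-liner, the grep pipeline above could be sketched in Python roughly as follows. This is only an illustration: the function name top_error_sources and its parameters are hypothetical, and it assumes (like the shell command) that the value of interest is the third "="-separated field of each matching line. Note that, as with grep 500, the status check is a plain substring match, so it also matches lines that contain "500" anywhere else (for example in a timestamp or byte count).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_error_sources(
    lines: Iterable[str],
    marker: str = "req.xForwardedFor",  # same literal the grep pipeline filters on
    status: str = "500",                # plain substring match, like `grep 500`
    limit: int = 20,                    # hypothetical cap on the number of rows returned
) -> List[Tuple[str, int]]:
    """Count the third '='-separated field of matching log lines.

    Roughly mirrors: grep marker | grep status | cut -d= -f3 | sort | uniq -c | sort -nr
    Simplification: lines with fewer than three '='-separated fields are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        if marker not in line or status not in line:
            continue
        fields = line.rstrip("\n").split("=")
        if len(fields) < 3:
            continue
        counts[fields[2]] += 1
    # most_common() sorts by descending count, like `sort -nr` on `uniq -c` output
    return counts.most_common(limit)
```

To run it against the log from the command above, something like `with open("/var/log/wdqs/wdqs-blazegraph.log", errors="replace") as f: print(top_error_sources(f))` should work, assuming the file is readable by the current user. Streaming the file line by line keeps memory use proportional to the number of distinct values rather than to the log size.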