
Improve visibility into blazegraph queries causing errors
Open, Medium, Public

Description

(Note: This came out of the Sep 2 2020 wdqs outage)

Deferring to the ticket assignee on the best way to achieve the objective, but here are some ideas:

  • Create (or improve the existing) Kibana dashboard that shows blazegraph error messages and extracts top user agents, etc.
  • Document a "backup" process for when Kibana is not performant enough, basically example commands such as grep req.xForwardedFor /var/log/wdqs/wdqs-blazegraph.log | grep 500 | cut -d= -f3 | sort | uniq -c | sort -nr
  • Grafana dashboard that shows total count of blazegraph errors? (this might not be useful)
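For the documented "backup" process, the grep pipeline above could also be sketched as a small Python helper, which may be easier to adapt than a one-liner. This is only an illustration: the function name top_field_counts and its parameters are hypothetical, and it assumes nothing about the log format beyond what the pipeline itself implies (lines mentioning req.xForwardedFor and 500, with the value of interest in the third "="-separated field).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_field_counts(
    lines: Iterable[str],
    marker: str = "req.xForwardedFor",  # first grep pattern from the pipeline
    status: str = "500",                # second grep pattern from the pipeline
    field: int = 3,                     # same 1-based field number as `cut -f3`
) -> List[Tuple[str, int]]:
    """Count values of one '='-separated field on matching lines.

    Hypothetical helper approximating:
        grep MARKER | grep STATUS | cut -d= -f3 | sort | uniq -c | sort -nr
    Unlike grep, the patterns are matched as plain substrings, not regexes.
    """
    counts: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        if marker not in line or status not in line:
            continue
        parts = line.split("=")
        if len(parts) == 1:
            # cut prints lines that contain no delimiter unchanged
            value = line
        elif len(parts) >= field:
            value = parts[field - 1]
        else:
            # cut prints an empty field when the line has too few fields
            value = ""
        counts[value] += 1
    # Highest count first; ties are broken alphabetically for stable output
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

It could be fed an open file handle for /var/log/wdqs/wdqs-blazegraph.log and the first few entries of the result printed. Note that, like the original pipeline, matching "500" anywhere in the line can pick up lines where 500 is not an HTTP status, so the counts should be treated as a rough signal.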

Event Timeline

Gehel triaged this task as High priority. Sep 8 2020, 7:10 PM
Gehel lowered the priority of this task from High to Medium. Aug 19 2021, 2:52 PM

@RKemper I presume that there will be no more investment into Blazegraph monitoring so it should be OK to close this task.