As a WDQS user, I want Blazegraph to stay stable rather than become unresponsive because of breaking queries that force a restart.
Some queries can break Blazegraph and render it unresponsive, requiring restarts. Because we currently log only the queries that Blazegraph finishes successfully, we cannot immediately tell which queries caused it to lock up.
It would be useful to have a list of queries that can break Blazegraph, to see whether there is anything we can learn or improve. For example: if all breaking queries come from one user agent, we could ban that user agent; or if a small, finite set of recurring queries breaks Blazegraph, we might block those specific queries.
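Once a candidate list exists, the two checks mentioned above (one dominant user agent, or a small set of recurring queries) could be done with a simple tally. The following is only a rough sketch: the record layout and the field names `user_agent` and `query` are hypothetical, the sample records are made up for illustration, and the whitespace-only normalization is an assumption about what counts as "the same query".

```python
from collections import Counter


def normalize(query: str) -> str:
    # Collapse runs of whitespace so trivially different copies of a query match.
    return " ".join(query.split())


def summarize(records):
    """Count suspect queries per user agent and per normalized query text."""
    by_agent = Counter(r["user_agent"] for r in records)
    by_query = Counter(normalize(r["query"]) for r in records)
    return by_agent, by_query


# Hypothetical records; real ones would be extracted from the recent logs.
suspects = [
    {"user_agent": "bot-a/1.0", "query": "SELECT * WHERE { ?s ?p ?o }"},
    {"user_agent": "bot-a/1.0", "query": "SELECT *  WHERE {  ?s ?p ?o }"},
    {"user_agent": "script-b", "query": "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"},
]

by_agent, by_query = summarize(suspects)
print(by_agent.most_common(5))   # user agents with the most suspect queries
print(by_query.most_common(5))   # most frequently recurring suspect queries
```

If one user agent or a handful of query texts dominates the counts, that would support the ban or block options; a flat distribution would suggest neither approach helps much.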
- Have a (partial) list of queries that cause Blazegraph to lock up
Out of scope
- This ticket will not involve building out any instrumentation. We will look only at recent logs (including those from the Sept 3 codfw outage).