Add Query Response Time to WDQS reporting
Closed, Resolved (Public)

Description

Query Response Time is a key success indicator for our backend migration away from Blazegraph. It offers insight into average performance for successful queries and will be used for ongoing traffic monitoring and reporting.

T414743 (Monitoring for top WDQS user agents) depends on this metric.

AC:

  • Metric is added to the existing Grafana dashboard
  • Metric is available at aggregated level, wdqs node level, and user agent level (not Grafana-specific)

Event Timeline

For Grafana: the wdqs Prometheus metrics exporter does not currently ship latency metrics. We can work with metrics from the traffic servers, but we don't have visibility into individual wdqs hosts. Metrics are also reported in duration buckets, so we'd report on aggregates. This is IMHO still useful info because it gives us visibility into outliers. I put together a sample panel that reports p50 (median), p95, and p99 for the wdqs fleet in eqiad and codfw: https://grafana.wikimedia.org/goto/mr3zGDvvR?orgId=1
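To illustrate why bucketed metrics only yield aggregate estimates, here is a minimal sketch of estimating a percentile from cumulative duration buckets by linear interpolation within the bucket that contains the target rank (the same general idea as Prometheus's `histogram_quantile`). The function name `bucket_quantile` and the bucket boundaries are hypothetical and are not taken from the actual traffic-server metrics.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count) pairs, sorted by
    upper bound, ending with an open-ended (math.inf) bucket.
    Bucket bounds here are illustrative assumptions only.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the open-ended bucket: the best we can
                # report is the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

The result is only as precise as the bucket widths: every request inside a bucket is treated as evenly spread, so per-request or per-host detail cannot be recovered from these metrics.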

Finer-grained info, with about two hours of delay from real time, is available from query logs. https://superset.wikimedia.org/sqllab/?savedQueryId=1260 is a breakdown by year, month, graph_name, backend_host, ua, and query_latency for the external endpoint, which we can incorporate into Superset reporting.
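As a rough sketch of how such a breakdown could be rolled up to the node level or user agent level, the snippet below groups rows by a chosen set of columns and summarizes `query_latency`. The column names mirror the fields listed above; the row layout as Python dicts and the helper name `latency_breakdown` are assumptions for illustration, not the saved query's actual output format.

```python
from collections import defaultdict
from statistics import mean, median


def latency_breakdown(rows, keys=("backend_host", "ua")):
    """Group query-log rows by `keys` and summarize query_latency.

    rows: iterable of dicts with the grouping columns and a numeric
    "query_latency" value (units are whatever the log records).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["query_latency"])
    return {
        group: {"count": len(vals), "mean": mean(vals), "median": median(vals)}
        for group, vals in groups.items()
    }
```

Passing `keys=("backend_host",)` gives a per-node view, `keys=("ua",)` a per-user-agent view, and `keys=()` a single fleet-wide aggregate.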

As mentioned on T414743, I'm unable to load the Superset dashboard and am unsure whether this is a performance issue with some or all of the queries. We can resolve this task once that issue has been root-caused.

Follow-up from internal discussion: we root-caused the issue and have a solution proposed in T418723: Materialize analytics queries to improve superset dashboard latency.

Closing, as we are tracking the follow-on implementation work in T418723.