Create Grafana dashboards for the Wikidata Bridge
Closed, Resolved, Public

Description

In T231204 and T249120 we introduced tracking for the Wikidata Bridge to help us make decisions about future development. Dashboards on grafana-labs (linked in the previous tickets) already track the non-production data. Now that the Wikidata Bridge is deployed, we also want dashboards for the production data.

Some things could be taken from the beta dashboard: https://grafana-labs.wikimedia.org/d/000000020/wikidata-and-base-on-labs

Acceptance criteria:

Event Timeline

We should probably look at the "beta" dashboard and carry everything over from there that is expected.
https://grafana-labs.wikimedia.org/d/000000020/wikidata-and-base-on-labs

Also, we should double-check the existing panel as there should be some data displayed by now, even if it comes just from us playing around with it on cawiki.

I think I’ve fixed the Data Types dashboard on the existing panel now, other panels TBD later.

Thank you! Though looking at it, I feel we may want to tweak a bit how it is displayed. Maybe increase the intervals/bins to per day?

Yeah I think per day makes more sense.
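
To illustrate what per-day bins mean for a counter, here is a minimal sketch that groups raw event timestamps by UTC calendar day. The function name `countPerDay` and the input shape are assumptions made for illustration; on the real dashboard the binning would be configured in the panel query rather than in code like this.

```typescript
// Hypothetical helper: counts events per UTC calendar day.
// Input: event timestamps in milliseconds since the Unix epoch.
function countPerDay(timestampsMs: number[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const ts of timestampsMs) {
    // toISOString() is always UTC; the first 10 characters are YYYY-MM-DD.
    const day = new Date(ts).toISOString().slice(0, 10);
    counts[day] = (counts[day] || 0) + 1;
  }
  return counts;
}

// Example input: two events on 2020-08-19 and one on 2020-08-20 (UTC).
const perDay = countPerDay([
  Date.UTC(2020, 7, 19, 1),
  Date.UTC(2020, 7, 19, 23),
  Date.UTC(2020, 7, 20, 0),
]);
```

With so few events, per-day bins give a handful of bars per week, which is why a table of totals for the selected time range may be easier to read.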

Well, then we get this…

Screenshot_2020-08-19 Wikidata Bridge - Grafana.png (334×910 px, 22 KB)

Doesn’t look very useful to me. I tried to add a table with total counts for the selected time range, but when I went to save it Grafana told me someone else had already edited the board.

image.png (359×797 px, 28 KB)

I already adjusted it and I thought/intended to set the default dashboard time frame to "the last 7 days" as that would seem more useful anyway. Not sure why it shows you the last hours again?

Well, because I had/have &from=now-6h&to=now in the URL ^^
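
For reference, `from` and `to` in a Grafana URL override the default time range saved with the dashboard, which explains the difference above. The sketch below shows one way to build a link that pins the range explicitly. The helper name `withTimeRange` is hypothetical, and it deliberately drops every other query parameter to keep the example small.

```typescript
// Hypothetical helper: returns the dashboard URL with an explicit time range.
// Any existing query string is discarded, so a stale from/to cannot linger
// (note that this also drops unrelated parameters).
function withTimeRange(dashboardUrl: string, from: string, to: string): string {
  const base = dashboardUrl.split("?")[0];
  return `${base}?from=${encodeURIComponent(from)}&to=${encodeURIComponent(to)}`;
}

// A link that still carries the 6-hour range, rewritten to the last 7 days.
const staleLink =
  "https://grafana-labs.wikimedia.org/d/000000020/wikidata-and-base-on-labs?from=now-6h&to=now";
const lastSevenDays = withTimeRange(staleLink, "now-7d", "now");
```

Opening the dashboard without any `from`/`to` parameters should fall back to the saved default instead.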

Added all tasks to https://grafana.wikimedia.org/d/pVG7xcAZz/wikidata-bridge.

However, the two performance metrics to the left look very strange:

  • The 99th percentile of the link-listener attach time looks way too high?
  • The percentiles for the click delay shouldn't all always be the same? Probably there is just not enough data: these percentiles are aggregated per minute, so a minute with very few clicks yields the same value for every percentile.
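
The "not enough data" explanation can be checked with a small sketch. It uses the nearest-rank definition of a percentile, which is an assumption: the metrics backend may interpolate differently. The sample values are made up for illustration.

```typescript
// Nearest-rank percentile over a list of samples (assumed definition).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  // Rank is 1-based; clamp to 1 so very small p still selects a sample.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

const levels = [50, 75, 95, 99];

// A minute with a single click: every percentile is that one value.
const sparseMinute = [120];
const sparse = levels.map((p) => percentile(sparseMinute, p));

// A minute with many clicks (made-up values): the percentiles spread out.
const busyMinute = [80, 90, 100, 110, 120, 150, 200, 400, 900, 2500];
const busy = levels.map((p) => percentile(busyMinute, p));
```

As long as most one-minute buckets contain only one or two samples, the percentile lines will keep overlapping regardless of how the panel is configured.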

The link-listener attach time seems high indeed, but I think the log scale might be a bit confusing? On a linear scale, it looks clearer to me that the really high times are only short spikes:

Screenshot_2020-08-20 Wikidata Bridge - Grafana-log.png (296×910 px, 80 KB)

Screenshot_2020-08-20 Wikidata Bridge - Grafana.png (296×910 px, 37 KB)

Mh, true. I can't reproduce it by closing/restoring a tab or session. Guess we just have to accept it as a mystery for now. Moving to verification for @Lydia_Pintscher.

Change 621718 had a related patch set uploaded (by Michael Große; owner: Michael Große):
[mediawiki/extensions/Wikibase@master] bridge: Don't record performance in background tabs

https://gerrit.wikimedia.org/r/621718

Change 621718 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] bridge: Don't record performance in background tabs

https://gerrit.wikimedia.org/r/621718
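
As a rough illustration of the idea in the patch title, the sketch below skips recording when the page is hidden. It relies on the Page Visibility API, where `document.hidden` is true for background tabs. Browsers throttle timers in background tabs, so timings measured there can be inflated; whether that caused the spikes discussed above is an assumption. The names `VisibilitySource`, `Recorder`, and `recordIfVisible` are hypothetical, and the merged change may be implemented differently.

```typescript
// Minimal shape of the part of `document` this sketch reads.
interface VisibilitySource {
  hidden: boolean;
}

// Hypothetical signature for whatever sends a timing to the metrics backend.
type Recorder = (metricName: string, valueMs: number) => void;

// Records the timing only when the tab is in the foreground.
// Returns true if the value was recorded, false if it was skipped.
function recordIfVisible(
  doc: VisibilitySource,
  metricName: string,
  valueMs: number,
  record: Recorder,
): boolean {
  if (doc.hidden) {
    return false;
  }
  record(metricName, valueMs);
  return true;
}
```

In a browser, the global `document` can be passed as `doc`; taking it as a parameter keeps the function testable outside a browser.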