Check metrics and build dashboards for device-analytics
Closed, Resolved (Public)

Assigned To

Authored By

	hnowlan
	May 8 2023, 10:09 AM

Description

We currently have some metrics for device-analytics - find out whether they are fit for purpose, if we require additional ones and build useful dashboards from the information we have available.

Details

	Subject	Repo	Branch	Lines +/-
	Renaming + Prom metrics	generated-data-platform/aqs/device-analytics	main	+261 -2
	Changing as per agreed naming conv Added prometheus metrics middleware for improved metrics	generated-data-platform/aqs/device-analytics	main	+0 -0


Related Objects

Status	Assigned	Task
Stalled	None	T324931 Clean up open RESTBase related tickets
In Progress	None	T262315 <CORE TECHNOLOGY> API Migration & RESTBase Sunset
In Progress	None	T263489 AQS 2.0
Resolved	SGupta-WMF	T288298 AQS 2.0: Device Analytics service
Resolved	Atieno	T335505 Figure out what's outstanding to have device-analytics serving 100% Production data
Resolved	hnowlan	T336158 Check metrics and build dashboards for device-analytics

Event Timeline

hnowlan created this task. May 8 2023, 10:09 AM

It appears that we currently only get the default Go application metrics: statistics about the binary itself and various internal execution metrics. We will need to instrument our handlers to get actual per-endpoint histograms and similar. I'm looking into how to do this with mux.

@hnowlan Noted, will also look into it.

Things I can see being of use:

counters for requests
histograms for latencies of requests by endpoint (in most cases this will be a single endpoint, I guess)
statistics around connections to Cassandra: failed connections, latencies on requests to Cassandra
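
A minimal sketch of what per-endpoint request counters and latency histograms could look like as handler middleware. It uses only the Go standard library rather than the Prometheus client or mux, so the Metrics type, the bucket bounds, and the /unique-devices endpoint label are all hypothetical and are not taken from the merged patch. Cassandra connection statistics are not covered here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"sync"
	"time"
)

// latencyBuckets are upper bounds in seconds (hypothetical values).
var latencyBuckets = []float64{0.01, 0.05, 0.1, 0.5, 1, 5}

// Metrics holds request counters and cumulative latency histograms.
type Metrics struct {
	mu       sync.Mutex
	requests map[string]int   // key: "METHOD endpoint code"
	buckets  map[string][]int // key: endpoint; last slot is the +Inf bucket
}

func NewMetrics() *Metrics {
	return &Metrics{requests: map[string]int{}, buckets: map[string][]int{}}
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	code int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.code = code
	r.ResponseWriter.WriteHeader(code)
}

// Middleware records one counter increment and one latency observation
// per request, labelled with the given endpoint name.
func (m *Metrics) Middleware(endpoint string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, code: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)
		m.observe(endpoint, r.Method, rec.code, time.Since(start).Seconds())
	})
}

func (m *Metrics) observe(endpoint, method string, code int, seconds float64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.requests[fmt.Sprintf("%s %s %d", method, endpoint, code)]++
	b, ok := m.buckets[endpoint]
	if !ok {
		b = make([]int, len(latencyBuckets)+1)
		m.buckets[endpoint] = b
	}
	for i, le := range latencyBuckets {
		if seconds <= le {
			b[i]++ // cumulative: every bucket at or above the value counts it
		}
	}
	b[len(latencyBuckets)]++ // +Inf bucket counts every observation
}

// Snapshot returns a copy of the request counters.
func (m *Metrics) Snapshot() map[string]int {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make(map[string]int, len(m.requests))
	for k, v := range m.requests {
		out[k] = v
	}
	return out
}

// Buckets returns a copy of the cumulative bucket counts for an endpoint.
func (m *Metrics) Buckets(endpoint string) []int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]int(nil), m.buckets[endpoint]...)
}

// demo sends three requests (two valid, one rejected) through an
// instrumented test handler and returns the collected metrics.
func demo() *Metrics {
	m := NewMetrics()
	h := m.Middleware("/unique-devices", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("bad") != "" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		fmt.Fprintln(w, "ok")
	}))
	for _, target := range []string{"/unique-devices", "/unique-devices", "/unique-devices?bad=1"} {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, target, nil))
	}
	return m
}

func main() {
	m := demo()
	counts := m.Snapshot()
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d\n", k, counts[k])
	}
	fmt.Println("cumulative latency buckets:", m.Buckets("/unique-devices"))
}
```

In a real service the counters and histograms would more likely come from the Prometheus Go client, with the middleware registered on the router; the wrapping pattern would stay the same.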

kamila subscribed. May 9 2023, 1:11 PM

Atieno added a parent task: T288298: AQS 2.0: Device Analytics service. May 10 2023, 1:46 PM

JArguello-WMF moved this task from Next Up to In Progress on the AQS2.0 (Sprint 10) board. May 11 2023, 6:34 PM

SGupta-WMF changed the task status from Open to In Progress. May 22 2023, 9:15 AM

SGupta-WMF claimed this task.

Making code changes to device-analytics as per the discussion.

Change 922075 had a related patch set uploaded (by Sg912; author: Sg912):

[generated-data-platform/aqs/device-analytics@main] Changing as per agreed naming conv Added prometheus metrics middleware for improved metrics

https://gerrit.wikimedia.org/r/922075

gerritbot added a project: Patch-For-Review. May 22 2023, 10:46 AM

SGupta-WMF reassigned this task from SGupta-WMF to BPirkle. May 22 2023, 10:47 AM

SGupta-WMF moved this task from In Progress to Ready for Code Review/ Ready for Tech input on the AQS2.0 (Sprint 10) board.

Change 922373 had a related patch set uploaded (by Sg912; author: Sg912):

[generated-data-platform/aqs/device-analytics@main] Renaming + Prom metrics

https://gerrit.wikimedia.org/r/922373

Change 922075 abandoned by Sg912:

[generated-data-platform/aqs/device-analytics@main] Changing as per agreed naming conv Added prometheus metrics middleware for improved metrics

Reason:

https://gerrit.wikimedia.org/r/922075

Change 922373 merged by BPirkle:

[generated-data-platform/aqs/device-analytics@main] Renaming + Prom metrics

https://gerrit.wikimedia.org/r/922373

I merged patchset 922373. If we decide we want/need more metrics info, we can add it in a separate change.

QA: tests should pass and /admin/metrics should return reasonable data.

Maintenance_bot removed a project: Patch-For-Review. May 23 2023, 7:30 PM

BPirkle reassigned this task from BPirkle to EChukwukere-WMF. May 26 2023, 2:54 AM

BPirkle moved this task from Ready for Code Review/ Ready for Tech input to Ready for Testing on the AQS2.0 (Sprint 10) board.

BPirkle subscribed.

Metrics verified.
Results:

Metrics are available on the new, updated path /metrics (see ticket T337428)
Request counts are visible per status code: 200, 400, 404 and 500
Requests are bucketed by time interval
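
A check like the one described above could be scripted roughly as follows. This is only a sketch: it parses text in the Prometheus exposition format and sums a counter per status code, but the metric name http_requests_total, its labels, and the sample values are hypothetical and are not the service's real output. The label parsing is deliberately naive and would not handle label values that contain commas.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample is hypothetical scrape output; names and values are illustrative.
const sample = `# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{code="200",method="GET"} 42
http_requests_total{code="400",method="GET"} 3
http_requests_total{code="404",method="GET"} 1
http_requests_total{code="500",method="GET"} 0
`

// countsByCode sums the values of the named counter per "code" label.
func countsByCode(text, metric string) (map[string]float64, error) {
	out := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip comments, blank lines, and samples of other metrics.
		if line == "" || strings.HasPrefix(line, "#") || !strings.HasPrefix(line, metric+"{") {
			continue
		}
		end := strings.LastIndex(line, "}")
		if end < 0 {
			continue
		}
		labels := line[len(metric)+1 : end]
		fields := strings.Fields(line[end+1:])
		if len(fields) == 0 {
			continue
		}
		v, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return nil, fmt.Errorf("parse %q: %w", line, err)
		}
		for _, pair := range strings.Split(labels, ",") {
			k, val, ok := strings.Cut(pair, "=")
			if ok && k == "code" {
				out[strings.Trim(val, `"`)] += v
			}
		}
	}
	return out, sc.Err()
}

func main() {
	counts, err := countsByCode(sample, "http_requests_total")
	if err != nil {
		panic(err)
	}
	// Report whether each status code seen during QA is exposed.
	for _, code := range []string{"200", "400", "404", "500"} {
		v, ok := counts[code]
		fmt.Printf("code %s: present=%t count=%g\n", code, ok, v)
	}
}
```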

Status: QA pass for metrics, pending dashboards

SGupta-WMF reassigned this task from EChukwukere-WMF to BPirkle. Jun 5 2023, 5:22 AM

SGupta-WMF added a subscriber: EChukwukere-WMF.

@BPirkle Assigning this task to you for dashboard creation.

SGupta-WMF moved this task from Ready for Testing to In Progress on the AQS2.0 (Sprint 10) board. Jun 5 2023, 9:00 AM

Dashboard created here:
https://grafana-rw.wikimedia.org/d/UWuaaNl4k/device-analytics-aqs-2-0?orgId=1

Relevant notes document here:
https://docs.google.com/document/d/1UmVbdrDVLQclNrfCzuGmI-bXYAsWv6bqSjgUqWISKbQ/edit#

This new device-analytics dashboard is a copy of the image-suggestion dashboard:
https://grafana-rw.wikimedia.org/d/SUZQ6rWVz/image-suggestion?orgId=1

Longer term, we may want to transform the new device-analytics dashboard into a full AQS 2.0 dashboard, with a dropdown for selecting which of the services to see details for. We didn't do that yet, because we only have one deployed AQS 2.0 service at this time.

I'm going to call this particular task ready for testing. We can do further enhancements under a separate task.

BPirkle reassigned this task from BPirkle to EChukwukere-WMF. Jun 5 2023, 5:06 PM

BPirkle triaged this task as Medium priority.

BPirkle moved this task from In Progress to Ready for Testing on the AQS2.0 (Sprint 10) board.

JArguello-WMF reassigned this task from EChukwukere-WMF to hnowlan. Jun 7 2023, 2:09 PM

JArguello-WMF moved this task from Ready for Testing to Sign off on the AQS2.0 (Sprint 10) board.

@hnowlan Can you help us with a sign-off, please?
Cc.: @FJoseph-WMF @VirginiaPoundstone @SGupta-WMF @BPirkle

Looks good to me for the purposes of this ticket, thank you!

JArguello-WMF moved this task from Sign off to Done on the AQS2.0 (Sprint 10) board. Jun 20 2023, 2:22 PM

JArguello-WMF closed this task as Resolved. Jun 30 2023, 11:02 PM
