Develop availability metrics for PAWS
Closed, Duplicate | Public | Feature

Description

Measure how often PAWS stays up / goes down.

Related Objects

Status | Subtype | Assigned | Task
Resolved | | rook |
Resolved | | None |
Duplicate | Feature | None |

Event Timeline

@Harej What exactly do you want to measure? The availability of the https://paws.wmflabs.org/paws/hub/login entry page, or something more complex like the ability of PAWS to spawn a new container as a particular Wikimedia user account?

> the ability of PAWS to spawn a new container as a particular Wikimedia user account?

More along those lines. Going to paws.wmflabs.org by itself seems to work consistently, but the Start My Server button works inconsistently and I'd like to begin tracking when it works and when it doesn't.

Chicocvenancio subscribed.

JupyterHub 0.9 will come with Prometheus metrics. See https://paws-beta.wmflabs.org/paws/hub/metrics

> More along those lines. Going to paws.wmflabs.org by itself seems to work consistently, but the Start My Server button works inconsistently and I'd like to begin tracking when it works and when it doesn't.

server_spawn_duration_seconds_count{status="failure"} might take care of that.
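As a rough illustration of how that counter could be turned into an availability figure, here is a minimal sketch that reads the text of a /hub/metrics page and computes the share of spawns that ended in failure. The metric name and the `status` label come from the comment above; the helper names (`spawn_counts`, `failure_ratio`) and the sample values in `SAMPLE_TEXT` are made up for illustration and are not real PAWS data.

```python
import re

# Matches sample lines such as:
#   server_spawn_duration_seconds_count{status="failure"} 3.0
_SAMPLE = re.compile(r'^server_spawn_duration_seconds_count\{([^}]*)\}\s+(\S+)')
_STATUS = re.compile(r'status="([^"]*)"')


def spawn_counts(metrics_text):
    """Return {status: count} for the spawn-duration counter (hypothetical helper)."""
    counts = {}
    for line in metrics_text.splitlines():
        sample = _SAMPLE.match(line.strip())
        if not sample:
            continue  # comments, other metrics, histogram buckets
        status = _STATUS.search(sample.group(1))
        if not status:
            continue
        key = status.group(1)
        counts[key] = counts.get(key, 0.0) + float(sample.group(2))
    return counts


def failure_ratio(counts):
    """Fraction of recorded spawns with status "failure", or None if there were none."""
    total = sum(counts.values())
    if total == 0:
        return None
    return counts.get("failure", 0.0) / total


# Illustrative input only; the numbers are invented.
SAMPLE_TEXT = """\
# TYPE server_spawn_duration_seconds histogram
server_spawn_duration_seconds_bucket{le="0.5",status="success"} 0.0
server_spawn_duration_seconds_count{status="success"} 9.0
server_spawn_duration_seconds_count{status="failure"} 1.0
"""

if __name__ == "__main__":
    print(failure_ratio(spawn_counts(SAMPLE_TEXT)))
```

Because the counter is cumulative since the hub last started, a long-running check would more likely compare two readings taken some time apart than read a single snapshot.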

Change 441514 had a related patch set uploaded (by Chico Venancio; owner: Chico Venancio):
[operations/puppet@production] prometheus: tools: scrape paws metrics into prometheus

https://gerrit.wikimedia.org/r/441514

CommunityTechBot renamed this task from tpcaaaaaaa to Develop availability metrics for PAWS. Jul 1 2018, 6:52 PM
CommunityTechBot raised the priority of this task from High to Needs Triage.
CommunityTechBot updated the task description.
CommunityTechBot added a subscriber: gerritbot.

Change 441514 merged by Andrew Bogott:
[operations/puppet@production] prometheus: tools: scrape paws metrics into prometheus

https://gerrit.wikimedia.org/r/441514

@Andrew Thanks for the merge. I can't verify the resulting file due to permissions, but puppet has run on tools-prometheus-01 and the log indicates that /srv/prometheus/tools/prometheus.yml should now have the new paws-hub target, so it should be set up correctly. However, https://tools-prometheus.wmflabs.org/tools/targets still does not list it. Perhaps a prometheus service reload/restart is needed?
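For anyone wanting to script the same check rather than eyeball the targets page, a minimal sketch follows. It assumes the server exposes the standard Prometheus HTTP API endpoint `/api/v1/targets` and that the new scrape config uses `paws-hub` as its job name; both are assumptions, and the sample response below is invented to show the shape of the data, not copied from tools-prometheus.

```python
def find_targets(targets_response, job):
    """Return (scrapeUrl, health) pairs for active targets with the given job label.

    `targets_response` is the parsed JSON body of GET /api/v1/targets.
    Hypothetical helper, not part of any Prometheus client library.
    """
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        (target.get("scrapeUrl"), target.get("health"))
        for target in active
        if target.get("labels", {}).get("job") == job
    ]


# Illustrative response only; hosts and values are placeholders.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "node", "instance": "example-host:9100"},
                "scrapeUrl": "http://example-host:9100/metrics",
                "health": "up",
            },
        ],
        "droppedTargets": [],
    },
}

if __name__ == "__main__":
    matches = find_targets(SAMPLE_RESPONSE, "paws-hub")
    # An empty list means the job is not loaded, which would be
    # consistent with the server needing a reload.
    print(matches or "paws-hub target not found")
```

An empty result after puppet has rewritten prometheus.yml would point to the running server not having re-read its configuration yet.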

Chicocvenancio changed the subtype of this task from "Task" to "Feature Request".

Since the metrics are available for most aspects of this in k8s now, I'm just merging this over to the enable metrics task for the new cluster.