Metrics from WikiFactMine labs project have disappeared from graphite
Closed, Invalid (Public)

Description

I had been pushing metrics to labmon1001.eqiad.wmnet once a day using

echo "wikifactmine.metrics.fact_count $(cat) $(date -u +%s)" | nc -q0 labmon1001.eqiad.wmnet 2003

i.e. a count at a single point in time.
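
For anyone reproducing this, here is a minimal sketch of the same push with the line formatting separated from the network send. It assumes Graphite's plaintext protocol ("path value timestamp", one metric per line); `graphite_line` is a hypothetical helper name, not part of any existing tool.

```shell
#!/bin/sh
# Build one line of Graphite's plaintext protocol: "<path> <value> <timestamp>".
# graphite_line is a hypothetical helper, introduced only for illustration.
graphite_line() {
    path=$1
    value=$2
    # Default to the current epoch time, as in the one-liner above.
    timestamp=${3:-$(date -u +%s)}
    printf '%s %s %s\n' "$path" "$value" "$timestamp"
}

# Usage (commented out because it needs network access to the Graphite host):
# graphite_line wikifactmine.metrics.fact_count "$(cat)" | nc -q0 labmon1001.eqiad.wmnet 2003
```

Keeping the formatting in its own function makes it possible to check the exact line being sent before piping it to nc.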

Until recently (within the last month) there was between one and two months of data, which could be seen in Grafana etc. The metrics don't seem to be there any more.

I don't think the data should have disappeared because of the aggregation that reduces the granularity of results as they get older, but perhaps I am mistaken.

Event Timeline

I agree that this is super odd and these things probably shouldn't just disappear.

What is odd is that many other metrics in graphite remain untouched; it just seems to be this one.

Perhaps it's because it is under a path that is autofilled with metrics describing the status of labs instances.

I think perhaps other metrics may have also gone. Almost all the ones I can see there are metrics of other projects' individual VM instances. I think when I set this up there were more custom metrics being supplied by people, but I might be imagining this.

The first step there

Gets list of hosts that have any metric defined

It simply looks at the metrics defined for each project, assuming they are all hosts.
Then a check is done to see whether each host exists.
No host with the name "metrics" exists, and thus the metric is "archived".
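
The behaviour described above can be sketched roughly as follows. This is not the real cleanup script, just an illustration under stated assumptions: each project has a directory with one subdirectory per "host", and a plain text file lists the instances that currently exist. `list_archivable`, the directory layout and the hosts file are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the archiving check; not the actual cleanup script.
# $1: directory containing one subdirectory per "host" of a project
# $2: file listing the instance names that currently exist, one per line
list_archivable() {
    project_dir=$1
    hosts_file=$2
    for entry in "$project_dir"/*/; do
        # Skip the unexpanded pattern when the project directory is empty.
        [ -d "$entry" ] || continue
        name=$(basename "$entry")
        # Every node directly under the project is assumed to be a host name,
        # so a custom node such as "metrics" fails this existence check.
        if ! grep -qxF -- "$name" "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

With a project directory containing `metrics` and one real instance, only `metrics` would be printed, which matches what happened to wikifactmine.metrics.fact_count.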

It looks like you will have to put the metric somewhere else, where the first part of the metric name is NOT the name of any project, and will not ever be used by any project.
This should probably be documented somewhere.
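
Until that is documented, a rough pre-flight check along these lines might help. It only compares the first component of a metric name against a list of project names known today, so it cannot guarantee the name will never clash with a future project. `metric_prefix_is_safe` and the projects file are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the first dotted component of the
# metric name is not one of the project names listed in the given file.
# $1: full metric name, e.g. wikifactmine.metrics.fact_count
# $2: file listing known project names, one per line
metric_prefix_is_safe() {
    metric=$1
    projects_file=$2
    # Strip everything from the first dot onwards to get the top-level node.
    prefix=${metric%%.*}
    if grep -qxF -- "$prefix" "$projects_file"; then
        return 1
    fi
    return 0
}

# Usage:
# metric_prefix_is_safe wikifactmine.metrics.fact_count projects.txt || echo "clashes with a project"
```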

Yep, that's exactly what's happening.

I can find all my old metrics in archived_metrics

This should probably be documented somewhere.

Great minds think alike.

I've popped a note in https://wikitech.wikimedia.org/wiki/Graphite but it could be good to have a specific guide for labs users.

It might be an idea to move all of these instance metrics 1 level deeper to avoid things like this happening.
There are a bunch of other top-level metrics with names that might one day clash with projects, and then poof, many metrics would be archived.

@bd808 @madhuvishy thoughts?

fnegri subscribed.

The WikiFactMine project has been deleted (see T236580) and the related toolforge tools have been disabled (https://toolsadmin.wikimedia.org/tools/id/wikifactmine-api and https://toolsadmin.wikimedia.org/tools/id/wikifactmine-pipeline).