While browsing the Kartotherian metrics, I see a few that I don't understand, or that seem to report aggregates that are too broad. For example, the kartotherian.req.* metrics seem to report an aggregate of all requests. They should probably be split by cluster (eqiad / codfw / maps-test) to be meaningful. Worse, kartotherian.heap.* also seems to be aggregated, whereas heap usage mostly makes sense when viewed for a single instance. Some metrics also do not seem to be sent with the correct type: we collect percentiles for heap, which does not seem to make sense. Heap should be a metric of type "gauge" and should not collect percentiles.
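To make the gauge vs. percentile distinction concrete, here is a minimal sketch. It assumes the metrics are emitted in the StatsD line format (`name:value|type`), which I have not confirmed for Kartotherian. The helper names, the `heap.used` and `req.tile` suffixes, and the `host1` instance name are all hypothetical.

```python
def metric_name(base, cluster, host=None):
    """Build a dotted metric name scoped to a cluster and, optionally, a host.

    Hypothetical naming scheme: kartotherian.<cluster>[.<host>].<base>
    """
    parts = ["kartotherian", cluster]
    if host is not None:
        # Dots separate path components, so flatten any dots in the host name.
        parts.append(host.replace(".", "_"))
    parts.append(base)
    return ".".join(parts)


def gauge_line(name, value):
    """Format a StatsD gauge: a point-in-time value, no percentiles derived."""
    return f"{name}:{value}|g"


def timing_line(name, millis):
    """Format a StatsD timer: the type for which percentiles are computed."""
    return f"{name}:{millis}|ms"


# Heap is a per-instance, point-in-time value, so it is sent as a gauge.
heap = gauge_line(metric_name("heap.used", "eqiad", "host1"), 123456)

# Request latency is aggregated per cluster, where percentiles are meaningful.
req = timing_line(metric_name("req.tile", "codfw"), 42)

print(heap)
print(req)
```

With names scoped like this, heap can be read per instance and request timings per cluster, instead of as one global aggregate.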
Some investigation is needed to understand how those metrics are published, which ones make sense and which don't. We need to document what we want to achieve with those metrics and check that the implementation matches.