Following up from https://phabricator.wikimedia.org/T372457#10408775
In Netops we make little use of the LibreNMS stats exported to Graphite, so it's fine if they go.
WMCS do use them, however, so we need to check on that side. I think everything that we need should be there, and if not we can add those paths to the gnmic collection. My only worry is that right now the gnmic stats have some problems, namely that we observe gaps in the graphs like this from time to time:
I'm not 100% sure what the issue is here. I'm fairly certain it is not related to the way those graphs are set up, or anything like rollovers in counter max values etc. But I've not had time to dig into the issue fully. When we first rolled out the gnmic stats we had similar gaps, but much bigger and more frequent. Increasing scraper timeouts and worker threads solved it for the most part, but we still see it sometimes. That makes me suspect the issue is still some sort of occasional performance bottleneck. The netflow VMs don't seem to be overly taxed, however (CPU hits max on them at certain points while scraping, but it's not constant, so there should be cycles for it to do whatever it needs).
Thank you for the explanation, that makes sense to me. What is the dashboard and the underlying expression in the graph above?
We also see another type of discrepancy, which I'm not so sure about (perhaps it is related to counter rollovers?). Here we do appear to have measurements but the counter goes to zero, even though it's pretty much impossible that was actually the case:
{F58025300 width=600}
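To make the rollover question concrete, here is a rough sketch of how a decrease between two counter samples could be classified as either a plausible wrap or a reset. It is not how gnmic or our dashboards compute rates; the function name `interval_rates`, the `counter_bits` parameter and the `max_plausible_rate` threshold are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, int]  # (unix timestamp in seconds, raw counter value)


def interval_rates(
    samples: List[Sample],
    max_plausible_rate: float,
    counter_bits: int = 64,
) -> List[Tuple[float, Optional[float], str]]:
    """Return (timestamp, rate_per_second, kind) for each pair of samples.

    kind is "normal", "wrap" (counter passed its max and continued), or
    "reset" (counter went backwards and a wrap would imply an impossible
    rate). Hypothetical helper, for illustration only.
    """
    modulus = 2 ** counter_bits
    out: List[Tuple[float, Optional[float], str]] = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        if v1 >= v0:
            out.append((t1, (v1 - v0) / dt, "normal"))
            continue
        # Counter decreased: assume a wrap only if the implied rate is sane.
        wrapped_rate = (v1 + modulus - v0) / dt
        if wrapped_rate <= max_plausible_rate:
            out.append((t1, wrapped_rate, "wrap"))
        else:
            out.append((t1, None, "reset"))
    return out
```

With 64-bit interface counters a genuine wrap should be extremely rare, so under these assumptions a drop like the one in the graph would more likely be classified as a reset (or a bad sample) than a rollover.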
The image doesn't show up for me. What are the dashboard and expression, so I can take a look?
