Fri, Dec 14
Thu, Dec 13
Tue, Dec 11
When you're ready to make firewall changes, I'm happy to help :)
It's serving again: https://grafana-old.wikimedia.org/dashboard/db/fundraising-database?orgId=1
Looks like this worked:
New config now puppetized; grafana has been restarted; the JSON consoles look good
I believe I've fixed this:
Ah, I think I know what happened. Hopefully will be an easy fix...
Mon, Dec 10
Need to create some other tasks to track work that should be done with new 5.x features but marking this as done :)
One part of the bug is fixed. The other (typing in the tag filter dropdown box) is not.
This bug may have been fixed in 5.4.1. Going to grab that version into wikimedia-stretch.
Cool, looks like mtail is happy now as well:
I've heard of no issues with Grafana 5, and will be upgrading today.
Sat, Dec 8
Fri, Dec 7
Thu, Dec 6
Okay, grafana1001.eqiad is up and running, with the database (and the plugins tree) copied over from krypton. At first glance it seems to work.
Wed, Dec 5
Haven't had too much time to look at this. FWIW I do see temperature sensors exported by the kernel on cp3007:
Tue, Dec 4
The first Puppet run with the new configuration on grafana1001 failed. I invoked puppet again and it immediately succeeded.
Mon, Dec 3
Please give this a good look! I've never done this before...
Fri, Nov 30
@fgiunchedi does the above sound good to you?
Talked some with volans. Seems like the best thing to do is probably to make a VM in Ganeti running stretch, set it up with a new puppet role just for grafana, copy the DB over, verify that it works well, and then switch over grafana.wikimedia.org to point there.
My plan from here had been to try naively copying grafana's database from krypton to my VM.
Fixed above by manually adding a pile of hiera in Horizon.
However, I do think we should consider a multi-DC-capable solution in this case. We don't want DC switches, outages, and other concerns to influence, compromise, or complicate the security aspects of this mechanism.
Thu, Nov 29
Things I have learned today:
but I think we should aim for having the data in memory as much as possible to achieve good performance, no matter the choice of data storage
22GB of data in memory?
Wed, Nov 28
So I thought it would be simple to create a VM, use Horizon to enable the Puppet role used for grafana on it (role::webserver_misc_apps), and then install the updated deb manually (since messing with reprepro for this seems both unnecessary and scary).
Tue, Nov 27
cdanis-test-grafana5-stretch1.monitoring.eqiad.wmflabs is alive!
Late last week I figured out scraping aggregated data from Prometheus as a CSV and fed that into Plotly:
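A minimal sketch of the Prometheus-to-CSV step, not the script that was actually used. It assumes the data came from an instant query against the Prometheus HTTP API (`/api/v1/query`), which returns JSON with `data.result` as a list of `{"metric": {...}, "value": [timestamp, "value"]}` entries. The function name `vector_to_csv`, the `SAMPLE` response, and its label values are hypothetical placeholders; fetching the response and plotting the CSV are left out.

```python
import csv
import io


def vector_to_csv(response):
    """Flatten a Prometheus instant-vector API response into CSV text.

    One row per series: one column per label name (sorted), followed by
    the sample timestamp and value.
    """
    if response.get("status") != "success":
        raise ValueError("query failed: %r" % (response.get("error"),))
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector, got %r" % (data["resultType"],))

    samples = data["result"]
    # Series may carry different label sets, so take the union of names.
    labels = sorted({name for s in samples for name in s["metric"]})

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(labels + ["timestamp", "value"])
    for s in samples:
        timestamp, value = s["value"]
        row = [s["metric"].get(name, "") for name in labels]
        writer.writerow(row + [timestamp, value])
    return out.getvalue()


# Hypothetical response shaped like the output of GET /api/v1/query.
SAMPLE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"cluster": "example", "instance": "host1:9100"},
             "value": [1543276800, "61"]},
            {"metric": {"cluster": "example", "instance": "host2:9100"},
             "value": [1543276800, "58.5"]},
        ],
    },
}

if __name__ == "__main__":
    print(vector_to_csv(SAMPLE), end="")
```

The resulting text can be written to a file and loaded by any CSV-aware plotting tool.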
Mon, Nov 26
Mon, Nov 19
Added some basic graphs to the prometheus-cluster-breakdown console: percentiles of server temperature for a given cluster, and plots for each server of a single, arbitrarily-selected temperature graph (sensor="temp1",chip="platform_coretemp_0")
Nov 12 2018
Many of these machines are always running hot -- ambient temps of 85C or more, even when only lightly loaded.
We observed overheating symptoms on the following machines today:
Nov 9 2018
Nov 7 2018
Need one more signature on my GPG key before pwstore access can be granted