Hi there! I saw @Ladsgroup's email to the ops list (thanks for bringing that to our attention!), so I'll respond to some of the questions he raised there -- sorry if it sounds a bit incoherent with regards to the context above :)
I don't have a strong preference for either. I think the post-processing approach makes sense overall, and without having looked at it very closely, it seems to me that Electron (and headless Chrome) would be a better bet than wkhtmltopdf with regard to maintainability, compatibility, security, etc.
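For what it's worth, headless Chrome can already do the HTML-to-PDF step straight from the command line; a minimal sketch (the binary name, output path, and URL are placeholders, not anything from this thread):

```shell
# Render a page to PDF with headless Chrome/Chromium.
# The binary may be "chromium", "chromium-browser", or "google-chrome"
# depending on the distro; path and URL are placeholders.
chromium --headless --disable-gpu \
  --print-to-pdf=/tmp/article.pdf \
  'https://en.wikipedia.org/wiki/PDF'
```

Anything fancier (custom headers/footers, post-processing) would go through the DevTools protocol instead of these flags.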
Thu, Jul 20
I think we can safely decline this ahead of time by 2½ months :)
If you've backported it already, yeah, we can go forward I'd say :) We can leave trusty behind too, I don't see this as a big deal at all.
gdash has been retired since ~February 2016, having been replaced with Grafana.
gdash was retired a year and a half ago, so…
So @godog mentioned today that we can't actually recover the Torrus data from Bacula, as it was lost forever :(
We run 4.0 on stretch systems nowadays. Would it be worthwhile to backport it to jessie and trusty? Anything that we're missing from 3.5?
What's needed to be done here, from whom and with what priority? (Asking because it shows up in our monitoring workboard)
Wed, Jul 19
Fri, Jul 14
Thu, Jul 13
Given that we're phasing out Ganglia, is that task moot now?
The updated list of devices missing model/number can be found below.
Wed, Jul 12
@ema, it seems like the task as described has been completed (awesome work and great presentation btw!). Is there anything left to be done or shall we resolve this task?
Everything listed here and most of the T169360 ones are fixed now. What isn't fixed is due to hardware troubles that are tracked separately (and it's just 5 now, instead of ~2% :). Resolving!
So it seems like the remaining ones are:
Tue, Jul 11
Chris fixed the cables for conf1003, kafka1018, kafka1020 and db1063. All fixed!
Looks like it expires in September:
Validity
    Not Before: Jul 18 18:16:03 2016 GMT
    Not After : Sep 4 12:10:02 2017 GMT
Subject: C = US, ST = California, L = San Francisco, O = "Wikimedia Foundation, Inc.", CN = eventdonations.wikimedia.org
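For future reference, those dates can be pulled with openssl's -dates option; a self-contained sketch against a throwaway self-signed cert (the paths and CN are placeholders — the same invocation works on the real certificate file):

```shell
# Generate a throwaway self-signed cert, then print its validity window.
# "openssl x509 -noout -dates" emits the notBefore=/notAfter= lines.
openssl req -x509 -newkey rsa:2048 -nodes -subj '/CN=example.test' \
  -keyout /tmp/key.pem -out /tmp/cert.pem -days 30 2>/dev/null
openssl x509 -in /tmp/cert.pem -noout -dates
```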
I racreset all of the ones in the list whose IP configuration had a discrepancy with the output (showing 192.168.0.1 as the gateway), and they're all fixed now.
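For the record, the reset can be issued remotely with racadm rather than per host from the web UI; a sketch, where mgmt.txt and the password variable are placeholders, not from this task:

```shell
# racreset reboots the management controller itself (not the host OS).
# mgmt.txt is a placeholder list of iDRAC addresses; $IPMI_PASSWORD is
# an assumed environment variable holding the root password.
while read -r host; do
  racadm -r "$host" -u root -p "$IPMI_PASSWORD" racreset
done < mgmt.txt
```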
Mon, Jul 10
So I did the following:
- mw1302: had Volatile_Channel_Privilege_Limit and Non_Volatile_Channel_Privilege_Limit set to Operator instead of Administrator; fixed with bmc-config
- stat1003: had wrong DNS, fixed that
- a bunch of the rest had the issue that I described in T160392 (IPMI password had gotten out of sync with iDRAC password); fixed with sshpass -e ssh root@$hostname racadm config -g cfgUserAdmin -o cfgUserAdminPassword -i 2 $password
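The last fix above can be looped over all the affected hosts; a sketch, assuming the shared password is exported and hosts.txt lists the affected management interfaces (both of those are assumptions):

```shell
# Re-sync the IPMI password by rewriting user slot 2's password via racadm,
# same one-liner as above. sshpass -e reads the ssh password from $SSHPASS;
# hosts.txt is a placeholder list of affected iDRACs.
export SSHPASS="$IPMI_PASSWORD"
while read -r hostname; do
  sshpass -e ssh "root@$hostname" \
    racadm config -g cfgUserAdmin -o cfgUserAdminPassword -i 2 "$IPMI_PASSWORD"
done < hosts.txt
```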
Same issue as T160392. From the iDRAC web interface, I set the password to something random then back to our password and this seems to have done the trick.
OK, so I noticed that the "Error: Unable to establish IPMI v2 / RMCP+ session" response was immediate, as if the password was wrong. So I tried changing the password to something else from the iDRAC web interface and then changing it back to our regular one, and this seems to have done the trick for both db1070 and db1071.
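The symptom is easy to reproduce from any host with ipmitool; a sketch, with $host standing in for the management interface (an assumption, not a hostname from this task):

```shell
# Probe the BMC over RMCP+ (the lanplus interface). An instant
# "Unable to establish IPMI v2 / RMCP+ session" error usually points at
# bad credentials rather than a network problem, since a network issue
# would time out instead. -E reads the password from $IPMI_PASSWORD.
ipmitool -I lanplus -H "$host" -U root -E chassis power status
```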
FYI, db1071 is in a similar state, I'm not sure why.
ms-be2010 is decom'ed now, resolving.
Long resolved, geoiplookup doesn't exist anymore (T100902).
What's left to be done here, @Dzahn?
So the IPMI checks have been deployed for a while. Quite a few hosts had BMC issues (some of them are fixed), and it remains to be seen whether the IPMI checks are going to be reliable enough for our uses.
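For context, the check boils down to something like the following FreeIPMI query (a sketch of the idea, not the exact plugin invocation used in production; $host and the password variable are placeholders):

```shell
# Query all sensors from the BMC over RMCP+; the monitoring side then
# alerts on any sensor not in a nominal state. LAN_2_0 is the IPMI 2.0
# driver, matching the lanplus sessions used elsewhere in this task.
ipmi-sensors -h "$host" -u root -p "$IPMI_PASSWORD" -D LAN_2_0 \
  --quiet-cache --ignore-not-available-sensors
```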