Diamond 4.0 was released in December, but we're still running 3.5.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | fgiunchedi | T97635 Update diamond to latest upstream version
Resolved | | fgiunchedi | T171580 Diamond log level set to DEBUG spams syslog
Resolved | | chasemp | T171583 Diamond collectors collects NFS statistics on Cloud-VPS
Event Timeline
We run 4.0 on stretch systems nowadays. Would it be worthwhile to backport it to jessie and trusty? Anything that we're missing from 3.5?
There is a lot of development history between the two releases (https://github.com/python-diamond/Diamond/compare/v3.5...v4.0.515); I'd say it is essentially some updated/improved collectors, nothing we're badly missing AFAICT.
I tried a quick backport: on jessie it works as is, on trusty it doesn't (the package wants debhelper >= 10, and even after lowering that to >= 9 it still FTBFS, so more fiddling is needed there). If we really wanted, we could go with the jessie backport and leave trusty behind. Once trusty is out of the door we can deprecate our diamond package altogether and use Debian's.
If you've backported it already, yeah, we can go forward I'd say :) We can leave trusty behind too, I don't see this as a big deal at all.
Mentioned in SAL (#wikimedia-operations) [2017-07-20T14:23:37Z] <godog> upload diamond 4.0.515-4~bpo8+1 to jessie-wikimedia - T97635
Mentioned in SAL (#wikimedia-operations) [2017-07-20T14:41:20Z] <godog> upload diamond 4.0.515-4~bpo8+2 to jessie-wikimedia - T97635
I tried it on cp1008 and a couple of thumbor machines and diamond seems to work just fine. The package is uploaded and pending rollout to jessie machines.
Mentioned in SAL (#wikimedia-operations) [2017-07-25T09:14:19Z] <godog> upgrade diamond to 4.0.515 in ulsfo and esams - T97635
Mentioned in SAL (#wikimedia-operations) [2017-07-25T09:20:57Z] <godog> upgrade diamond to 4.0.515 in codfw - T97635
Mentioned in SAL (#wikimedia-operations) [2017-07-25T12:09:05Z] <godog> upgrade diamond to 4.0.515 in eqiad - T97635
It looks like diamond still takes a long time to stop. This was reported by @faidon at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=854842 and https://github.com/python-diamond/Diamond/issues/595, though it doesn't seem to be fixed (~40s on jessie, ~20s on stretch):
root@lithium:/srv/syslog# time systemctl stop diamond
real 0m40.043s
user 0m0.000s
sys 0m0.000s

root@ms-be2020:~# time systemctl stop diamond
real 0m20.070s
user 0m0.000s
sys 0m0.000s
Applying https://github.com/Ssawa/Diamond/commit/8b58d7a7dd2a1249731b0642b35e7d7cbdcf611f from the GitHub issue fixes it and stop is fast again. The patch isn't applied upstream yet, though.
# time systemctl start diamond && sleep 30 && time systemctl stop diamond
real 0m0.007s
user 0m0.000s
sys 0m0.000s

real 0m0.037s
user 0m0.000s
sys 0m0.000s
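To re-check the stop time after a future upgrade without reading `time` output by eye, something like the following could work. This is only a rough sketch: `elapsed_seconds` and `finishes_within` are hypothetical helper names, not part of diamond or systemd, and the 5 second limit in the example is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and print how many whole seconds it took.
# The command's own output and exit status are deliberately ignored.
elapsed_seconds() {
    local start end
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $(( end - start ))
}

# Hypothetical helper: succeed only if the command finished within LIMIT seconds.
# Usage: finishes_within LIMIT command [args...]
finishes_within() {
    local limit=$1
    shift
    local took
    took=$(elapsed_seconds "$@")
    [ "$took" -le "$limit" ]
}

# Example (needs root and a running diamond unit, so it is left commented out;
# the 5 second threshold is an assumption, pick whatever counts as "fast"):
# finishes_within 5 systemctl stop diamond || echo "diamond stop is still slow"
```

Resolution is whole seconds only, which is enough to tell a ~20-40s stop from a sub-second one.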
The --log-stdout issue has been filed as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=869970
As for the slow shutdown I've reopened https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=854842
Both issues (debug log and slow stop) have been bandaided in our puppet in the meantime
Fixed for our purposes; we can follow up on upstream's/Debian's bug reports for the long-term fixes.