Fri, Oct 13
Yup all done @thcipriani !
Last change merged and deployed, thumbor hostname is now in logstash \o/
I'm investigating this and the issue seems to lie with the lvs boxes: according to ipvsadm, the gelf service logstash-gelf_12201_udp isn't routed anywhere on lvs1003
Wed, Oct 11
Nice find indeed @hashar !
I've re-read the thread and I think I have a proposal to move things forward.
I'm resolving this task as all major use cases have been covered.
The eqiad change was reverted yesterday due to (among other things, the problem above) labservices machines hanging and not being able to talk TLS with the syslog servers. I'll conduct more tests and apply the change in eqiad more gradually.
Tue, Oct 10
syslog-tls is deployed everywhere but esams (coming shortly)
Sat, Oct 7
Fri, Oct 6
IMO we could approach the problem of getting the stats above to Prometheus in at least two ways:
Thanks @ayounsi ! Looks good to me, some things I found:
@chasemp I bet that's a side effect of T166561: Rollout prometheus-node-exporter 0.14 in labs, is it persisting or was it transient during the package upgrades?
Thu, Oct 5
Resolving as the work will be completed in T177196 by porting the missing Diamond collectors.
Will do as part of T177196
(apologies about the delay, I completely missed this!)
All done! I've run the upgrade with cumin, using the command below (see also
I'll take care of this since we'll need some new collectors from node-exporter 0.14 as part of T177196
Wed, Oct 4
Tue, Oct 3
Mon, Oct 2
Fri, Sep 29
Tue, Sep 26
We're now explicitly excluding statsite traffic and swift clients running on the proxy that talk to backend swift:
Mon, Sep 25
Wed, Sep 20
Ganglia is indeed going away
@Jgreen that's awesome news! I think we can finally shut down ganglia for good !
Tue, Sep 19
Mon, Sep 18
@BBlack your patch to add meminfo_numa seems to be working! Anything left to do ?
Update: with latest upstream git of mtail things seem stable so far.
Sep 17 2017
Sep 16 2017
The growth in used inodes over the last few hours was pretty steep, so I compressed and removed the older otrs versions:
Sep 15 2017
I poked at this some more this week but got nowhere either in beta or production, to recap:
One way would be to generate grafana dashboards' JSON from python and a list of metrics, namely with something like grafanalib as outlined in T171482: Programmatic generation of grafana dashboards
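To make the idea concrete, here is a rough sketch of what "JSON from python and a list of metrics" could look like using only the standard library. It deliberately does not use grafanalib; the function name `build_dashboard`, the panel fields and the example metric names are illustrative assumptions, and the field layout has not been checked against any particular Grafana version.

```python
import json


def build_dashboard(title, metrics, datasource="prometheus"):
    """Build a dashboard-like dict with one graph panel per metric.

    The field names below are a simplified, assumed layout for
    illustration, not a verified Grafana dashboard schema.
    """
    panels = []
    for index, metric in enumerate(metrics, start=1):
        panels.append({
            "id": index,                 # panel ids start at 1
            "type": "graph",
            "title": metric,             # reuse the metric name as the panel title
            "datasource": datasource,
            "targets": [{"expr": metric, "refId": "A"}],
        })
    return {"title": title, "panels": panels}


if __name__ == "__main__":
    # Example metric names, picked only for the demo.
    dashboard = build_dashboard("node overview", ["node_load1", "node_filefd_allocated"])
    print(json.dumps(dashboard, indent=2, sort_keys=True))
```

The point is that the list of metrics stays the single source of truth and the dashboards are regenerated from it, rather than edited by hand in the UI.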
Sep 14 2017
@Nick could you try again to delete both files? thanks!
Sep 13 2017
Sep 12 2017
I've downloaded the file Literature_II,_Harutyun_Surkhatian.djvu to check for corruption just in case, @Nick, though it might be valid and just have pathological per-page dimensions.