Description
Now that fundraising is no longer using ganglia, we can uninstall it from the fleet (gmetad / gmond / etc.) and remove the relevant puppet bits.
Details
Status | Assigned | Task
---|---|---
Resolved | fgiunchedi | T177195 Reduce technical debt in metrics monitoring
Resolved | Dzahn | T177225 Uninstall ganglia from the fleet
Resolved | Dzahn | T183873 Update ganglia mentions in prominent documentation
Resolved | Andrew | T183917 remove cloud VPS project 'ganglia'
Event Timeline
Change 396086 merged by Dzahn:
[operations/puppet@production] logging/kafkatee: remove ganglia monitoring
Change 396088 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet/kafkatee@master] kafkatee: remove Ganglia monitoring class and script
Change 396101 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add ores200[19] as spare systems
Change 396101 merged by Dzahn:
[operations/puppet@production] site: add ores200[19] as spare systems
Change 396104 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add eventlog2001 as spare::system
Change 396104 merged by Dzahn:
[operations/puppet@production] site: add eventlog2001 as spare::system
Change 396106 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] bastionhost, mw_rc_irc,backup::offsite,pybaltest: rm ganglia
Change 396106 merged by Dzahn:
[operations/puppet@production] bastionhost, mw_rc_irc,backup::offsite,pybaltest: rm ganglia
Change 396129 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] lvs::balancer: remove ganglia
Change 396129 merged by Dzahn:
[operations/puppet@production] lvs::balancer: remove ganglia
Change 396290 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] db2011: remove ganglia
Change 396291 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mariadb::labs_deprecated: remove ganglia
Change 396291 merged by Dzahn:
[operations/puppet@production] mariadb::labs_deprecated: remove ganglia
Change 396292 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] labsdb: remove ganglia
Change 396294 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] labsdb::slave: keep ganglia because postgresql
Change 396294 merged by Dzahn:
[operations/puppet@production] labsdb::slave: keep ganglia because postgresql
Change 396088 merged by Dzahn:
[operations/puppet/kafkatee@master] kafkatee: remove Ganglia monitoring class and script
Change 397986 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mail::mx: remove ganglia
Change 397986 merged by Dzahn:
[operations/puppet@production] mail::mx: remove ganglia
Change 397990 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mariadb::parsercache: remove ganglia
Change 382916 merged by Dzahn:
[operations/puppet@production] exim4/ganglia: mx,otrs,lists,phab: rm Ganglia exim stats
Change 397990 merged by Dzahn:
[operations/puppet@production] mariadb::parsercache: remove ganglia
Change 398186 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] parsercache: remove ganglia from parsercache nodes
Change 398186 merged by Dzahn:
[operations/puppet@production] parsercache: remove ganglia from parsercache nodes
Change 398390 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] dbstore: remove ganglia
Change 398390 merged by Dzahn:
[operations/puppet@production] dbstore: remove ganglia
Change 398398 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] external storage codfw: remove ganglia
Change 398398 merged by Dzahn:
[operations/puppet@production] external storage codfw: remove ganglia
Change 398413 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mysql codfw: remove ganglia
Change 398413 merged by Dzahn:
[operations/puppet@production] mysql codfw: remove ganglia
Mentioned in SAL (#wikimedia-operations) [2017-12-15T03:01:15Z] <mutante> db2016 thru db2019 - had to manually kill gmond process to decom ganglia, other db codfw hosts: didnt need it | running puppet on db205* and others in codfw to remove all ganglia (T177225)
Change 398526 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] external storage eqiad: remove ganglia
Change 398526 merged by Dzahn:
[operations/puppet@production] external storage eqiad: remove ganglia
Change 398528 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] dbproxy eqiad: remove ganglia
Change 398528 merged by Dzahn:
[operations/puppet@production] dbproxy eqiad: remove ganglia
Change 398531 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] db eqiad: remove ganglia
Change 398531 merged by Dzahn:
[operations/puppet@production] db eqiad: remove ganglia
Alright, Ganglia is now purged from everything across the board except 17 hosts! :) They are:
4 x maps codfw (osm/postgres)
4 x maps eqiad (osm/postgres)
3 x maps-test codfw (osm/postgres)
3 x labsdb eqiad (postgres)
2 x install (aggregators eqiad/codfw)
1 x ganglia-web (uranium)
All else DONE
We are not actively using ganglia for maps, so we can remove those without any issue.
Change 398899 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] maps: remove ganglia
Cool! Thanks for confirming. I had only kept it because there was no replacement for postgres stats yet, but that is also in progress and coming up. Removed from the maps and maps-test clusters just now :)
Change 394518 abandoned by Dzahn:
mysql eqiad: remove ganglia
Reason:
not needed anymore, done in multiple other changes
Change 398903 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] labsdb100[467]: remove ganglia
Change 398903 merged by Dzahn:
[operations/puppet@production] labsdb100[467]: remove ganglia
Change 382905 merged by Dzahn:
[operations/puppet@production] osm: remove all ganglia support
Change 382906 merged by Dzahn:
[operations/puppet@production] postgresql: remove all ganglia support
Change 398904 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] osm/postgres: remove ganglia diskstat plugin inclusion
Change 398904 merged by Dzahn:
[operations/puppet@production] osm/postgres: remove ganglia diskstat plugin inclusion
Change 382904 merged by Dzahn:
[operations/puppet@production] ganglia/site: decom ganglia-web node, rm eqiad/codfw aggregators
Mentioned in SAL (#wikimedia-operations) [2017-12-18T20:53:35Z] <mutante> ganglia.wikimedia.org shut down just now after a deprecation period - service is out of commission - T177225
Change 382923 merged by Dzahn:
[operations/puppet@production] statsd: remove ganglia backend support
Change 382924 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] standard: decom ganglia plugin everywhere by default
Change 382924 merged by Dzahn:
[operations/puppet@production] standard: decom ganglia plugin everywhere by default
Change 382926 merged by Dzahn:
[operations/puppet@production] standard: actually drop 'has_ganglia' param entirely
Change 382932 merged by Dzahn:
[operations/puppet@production] ganglia: delete ganglia-web classes and role
Change 399119 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] network::constants: drop uranium from monitoring hosts
Change 399120 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] remove ganglia_aggregators settings from hiera
Change 399121 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] rm role/manifests/ganglia/config
Change 399121 merged by Dzahn:
[operations/puppet@production] rm role/manifests/ganglia/config
Change 399124 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove ganglia.wikimedia.org
Change 399119 merged by Alexandros Kosiaris:
[operations/puppet@production] network::constants: drop uranium from monitoring hosts
Change 399120 merged by Dzahn:
[operations/puppet@production] remove ganglia_aggregators settings from hiera
Mentioned in SAL (#wikimedia-operations) [2017-12-19T19:49:23Z] <mutante> deleted ganglia.wikimedia.org from DNS - webserver was already down since yesterday - not used anymore (T177225)
Change 399248 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] redis: delete ganglia monitoring script
Change 382933 merged by Dzahn:
[operations/puppet@production] ganglia: delete the module
Change 399326 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] redis: delete ganglia monitoring script
Ganglia has been uninstalled from the fleet, the aggregators are gone, the roles and the module are deleted, and the DNS name is removed; for all practical purposes it's gone. The remaining hits when grepping for "ganglia" across the repo are mostly related to the "ganglia_clusters" variable in Hiera, which we should either replace with LVS config or rename:
https://gerrit.wikimedia.org/r/#/c/382930/
(WIP) https://gerrit.wikimedia.org/r/#/c/382931/
And then it appears a couple times in modules/confluent/manifests/kafka/mirror/jmxtrans.pp and in an example in wmflib.
But none of this means Ganglia is still running at any capacity, so this ticket is resolved.
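For anyone who wants to re-check the leftover mentions later, here is a minimal sketch of that kind of repo-wide search. It is not the command that was actually run for this task; the function name `count_mentions` and the choice to skip `.git` are assumptions made for illustration, and a plain case-insensitive grep over a checkout would give much the same picture.

```python
import os
from collections import Counter


def count_mentions(root, needle="ganglia"):
    """Count case-insensitive occurrences of `needle` per file under `root`.

    Hypothetical helper for illustration; returns a Counter mapping
    each matching file's path (relative to root) to its hit count.
    """
    needle = needle.lower()
    hits = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata so it is not searched.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets binary files pass through harmlessly.
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    n = sum(line.lower().count(needle) for line in fh)
            except OSError:
                continue  # unreadable file: skip it
            if n:
                hits[os.path.relpath(path, root)] = n
    return hits


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    # List files with the most mentions first.
    for path, n in count_mentions(root).most_common():
        print(f"{n:4d}  {path}")
```

Run against a checkout of the puppet repo, the files at the top of the list should be the Hiera data carrying "ganglia_clusters", if the summary above still holds.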
Change 399326 merged by Dzahn:
[operations/puppet@production] redis: delete ganglia monitoring script
Change 399686 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] confluent:kafka:jmxtrans: remove Ganglia support
Change 399686 abandoned by Dzahn:
confluent:kafka:jmxtrans: remove Ganglia support
Reason:
depends on a submodule
Change 399689 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet/varnishkafka@master] Parameterize kafka.ssl.cipher.suites
Change 399691 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] apache: remove comments about Ganglia monitoring
Change 399691 merged by Dzahn:
[operations/puppet@production] apache: remove comments about Ganglia monitoring
Change 399699 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet/jmxtrans@master] drop optional Ganglia params from metrics::jvm
Change 399689 merged by Ottomata:
[operations/puppet/varnishkafka@master] Parameterize kafka.ssl.cipher.suites
Change 399686 merged by Dzahn:
[operations/puppet@production] confluent:kafka:jmxtrans: remove Ganglia support
Change 399699 abandoned by Dzahn:
drop optional Ganglia params from metrics::jvm
Reason:
not needed for the other change to work
Change 399248 abandoned by Dzahn:
redis: delete ganglia monitoring script
Reason:
already done
Change 406794 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] hiera/wmflib/pybal: rename ganglia_clusters to wikimedia_clusters
Change 406794 merged by Dzahn:
[operations/puppet@production] hiera/wmflib/pybal: rename ganglia_clusters to wikimedia_clusters
Change 382931 abandoned by Dzahn:
hiera/wmflib: drop ganglia_clusters variable entirely?
Reason:
superseded by https://gerrit.wikimedia.org/r/#/c/406794/
Change 382930 abandoned by Dzahn:
pybal: use lvs::config not ganglia_clusters to determine if appserver
Reason:
superseded by https://gerrit.wikimedia.org/r/#/c/406794/
Change 409384 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] wmflib/prometheus: get_clusters, update Ganglia related comments
Change 409384 merged by Dzahn:
[operations/puppet@production] wmflib/prometheus: get_clusters, update Ganglia related comments