
stat1004 doesn't show up in ganglia
Closed, Resolved (Public)

Description

stat1002 is visible in Ganglia as part of "Analytics cluster eqiad" - stat1004 is not there.

Event Timeline

Hi @JAllemandou. Please associate at least one project with this task to allow others to find this task when searching in the corresponding project(s). Thanks!

Dzahn triaged this task as Medium priority.

Change 302283 had a related patch set uploaded (by Dzahn):
statistics: set cluster for stat1004

https://gerrit.wikimedia.org/r/302283

Change 302283 merged by Dzahn:
statistics: set cluster for stat1004

https://gerrit.wikimedia.org/r/302283

Investigated a bit. I could confirm outgoing packets from stat1004 towards carbon (the aggregator for eqiad), but could NOT confirm incoming packets on carbon (unlike packets from stat1003 and others).

Looks like the firewalling is different and related traffic is not allowed.

stat1003 for example has:

ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED

stat1004 does not have this rule.
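A minimal Python sketch of that comparison, assuming the rule listing of each host has been saved as text in the format quoted above for stat1003. The function name `has_related_established_accept` is hypothetical and only for illustration; it is not part of any tooling mentioned in this task.

```python
import re


def has_related_established_accept(listing: str) -> bool:
    """Return True if the listing contains an ACCEPT rule for
    RELATED,ESTABLISHED connection state."""
    for line in listing.splitlines():
        fields = line.split()
        # Only consider rule lines whose target is ACCEPT.
        if not fields or fields[0] != "ACCEPT":
            continue
        # Accept either ordering of the two states, as a precaution.
        if re.search(r"\b(RELATED,ESTABLISHED|ESTABLISHED,RELATED)\b", line):
            return True
    return False


# The rule quoted above for stat1003, and an empty listing for comparison.
stat1003_rules = "ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED"
print(has_related_established_accept(stat1003_rules))  # True
print(has_related_established_accept(""))              # False
```

Running the same check against the saved listings of two hosts makes it easy to see which one lacks the rule.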

How come? We should be able to assume that both use base::firewall and are fully puppetized.

I don't understand how the analytics roles are set up. "role::analytics_cluster::client" includes a bunch of other things, and neither "firewall" nor "base::firewall" shows up in any of this:

/puppet/modules/role/manifests/analytics_cluster$ ls
client.pp druid hive java.pp oozie refinery rsyncd.pp
database hadoop hue.pp monitoring README.md refinery.pp users.pp
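The search for "firewall" across those manifests could be sketched as follows. This is only an illustration: the helper `find_term` is hypothetical, and it assumes a local checkout of the puppet repository containing the directory listed above.

```python
from pathlib import Path


def find_term(root: str, term: str = "firewall") -> list[tuple[str, int, str]]:
    """Recursively search *.pp files under root for term.

    Returns (path, line number, stripped line) for every matching line.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.pp")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if term in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

For example, `find_term("modules/role/manifests/analytics_cluster")` (path assumed relative to the repository root) returns an empty list when no manifest in that tree mentions the term. Matching on "firewall" also covers "base::firewall", since the latter contains the former.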

Dzahn removed Dzahn as the assignee of this task. (Aug 1 2016, 7:49 PM)
Dzahn subscribed.
Dzahn claimed this task.

Eh.. yeah.. after looking more, I restarted all aggregators on carbon (as in: "kill" them and run puppet).

stat1004 showed up:

https://ganglia.wikimedia.org/latest/?c=Analytics%20cluster%20eqiad&h=stat1004.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2

weird.. but that's happened before
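Since this has happened before, a quick check for whether an aggregator knows about a host might help next time. The sketch below is an assumption-laden illustration, not existing tooling: it assumes the aggregator's XML dump has already been saved to a string and that it uses the usual Ganglia layout of `CLUSTER` elements containing `HOST` elements with a `NAME` attribute. The function name `hosts_by_cluster` and the sample XML are made up.

```python
import xml.etree.ElementTree as ET


def hosts_by_cluster(xml_text: str) -> dict[str, list[str]]:
    """Map each CLUSTER name to the sorted list of HOST names it reports."""
    root = ET.fromstring(xml_text)
    result = {}
    for cluster in root.iter("CLUSTER"):
        names = sorted(host.get("NAME", "") for host in cluster.iter("HOST"))
        result[cluster.get("NAME", "")] = names
    return result


# Made-up sample in the assumed layout; a real dump carries many more attributes.
sample = """<GANGLIA_XML VERSION="3.6.0" SOURCE="gmetad">
  <GRID NAME="example">
    <CLUSTER NAME="Analytics cluster eqiad">
      <HOST NAME="stat1004.eqiad.wmnet"/>
    </CLUSTER>
  </GRID>
</GANGLIA_XML>"""

hosts = hosts_by_cluster(sample)
print("stat1004.eqiad.wmnet" in hosts["Analytics cluster eqiad"])  # True
```

If the host is missing from the aggregator's view while it is still sending packets, that points at the aggregator side rather than at the host.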