Find tools.tools-services-01.sge.hosts.* metrics replacement
Closed, Resolved · Public

Event Timeline

bd808 triaged this task as High priority.

The data is collected by modules/toollabs/files/monitoring/sge.py, a diamond collector provisioned by ::toollabs::services and profile::toolforge::services::basic. Some of the data comes from the NFS data in /data/project/.system/gridengine (this now needs to vary based on the grid) and some comes from qstat and qconf calls. qconf will only work from a grid admin node in the new grid, but the call that is being made can be replaced with qstat. I think this collector can move to the cron hosts on both grids and continue to work with a few code changes.
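As a rough illustration of relying on qstat alone, here is a minimal sketch that counts jobs by state from qstat's XML output. It is not the code in sge.py. The function names, the sample document, and the assumption that `qstat -u '*' -xml` emits `job_list` elements carrying a `state` attribute are all illustrative and should be checked against the real grid output.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter


def count_jobs_by_state(qstat_xml):
    """Count jobs per state in a qstat XML document.

    Assumes (unverified here) that every job appears as a <job_list>
    element with a "state" attribute such as "running" or "pending".
    """
    root = ET.fromstring(qstat_xml)
    return Counter(job.get("state", "unknown") for job in root.iter("job_list"))


def fetch_qstat_xml():
    """Hypothetical helper: run qstat for all users and return its XML output."""
    return subprocess.check_output(
        ["qstat", "-u", "*", "-xml"], universal_newlines=True
    )


# Hand-written sample in the assumed layout, not captured from a real grid.
SAMPLE = """
<job_info>
  <queue_info>
    <job_list state="running"><JB_job_number>101</JB_job_number></job_list>
    <job_list state="running"><JB_job_number>102</JB_job_number></job_list>
  </queue_info>
  <job_info>
    <job_list state="pending"><JB_job_number>103</JB_job_number></job_list>
  </job_info>
</job_info>
"""

if __name__ == "__main__":
    # On a host with qstat available, pass fetch_qstat_xml() instead of SAMPLE.
    for state, count in sorted(count_jobs_by_state(SAMPLE).items()):
        print(state, count)
```

Keeping the parsing separate from the qstat call means the counting logic can be exercised on any host, including ones that are not grid admin nodes.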

See T211684: Toolforge: Port sge.py stats to Prometheus for a different approach that we should probably take for the Stretch grid.

Change 485229 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: move SGE diamond collector to cronrunner on Trusty grid

https://gerrit.wikimedia.org/r/485229

Change 485229 merged by Bstorm:
[operations/puppet@production] toolforge: move SGE diamond collector to cronrunner on Trusty grid

https://gerrit.wikimedia.org/r/485229