Data stopped, unsurprisingly, when tools-services-01 was shut down.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
toolforge: move SGE diamond collector to cronrunner on Trusty grid | operations/puppet | production | +6 -2
Related Objects
Event Timeline
Data is collected by modules/toollabs/files/monitoring/sge.py, a diamond collector provisioned by ::toollabs::services and profile::toolforge::services::basic. Some of the data comes from the NFS data in /data/project/.system/gridengine (this path now needs to vary by grid) and some comes from qstat and qconf calls. qconf will only work from a grid admin node in the new grid, but the call that is being made can be replaced with qstat. I think this collector can move to the cron hosts on both grids and continue to work with a few code changes.
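As a rough illustration of gathering job statistics from qstat alone, here is a minimal sketch. It is not the actual sge.py logic: the function names `count_job_states` and `collect` are hypothetical, and it assumes the default plain-text `qstat` column layout, where the job state is the fifth whitespace-separated field of each job row.

```python
import subprocess
from collections import Counter


def count_job_states(qstat_output):
    """Count jobs per state from plain-text qstat output.

    Assumes the default column layout:
    job-ID, prior, name, user, state, ...
    """
    counts = Counter()
    for line in qstat_output.splitlines():
        fields = line.split()
        # Job rows start with a numeric job ID; this skips the
        # header row and the dashed separator line.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        counts[fields[4]] += 1
    return counts


def collect():
    """Run qstat for all users and return per-state job counts."""
    output = subprocess.check_output(
        ["qstat", "-u", "*"], universal_newlines=True
    )
    return count_job_states(output)
```

Keeping the parsing separate from the subprocess call means the parser can be exercised on captured output from a host that has no access to the grid.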
See T211684: Toolforge: Port sge.py stats to Prometheus for a different approach that we should probably take for the Stretch grid.
Change 485229 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] toolforge: move SGE diamond collector to cronrunner on Trusty grid
Change 485229 merged by Bstorm:
[operations/puppet@production] toolforge: move SGE diamond collector to cronrunner on Trusty grid