
Collect and display basic metrics for all tools (service groups)
Open, Medium, Public


Track basic metrics for each service group:

  • cpu hours used (I think we can get this from qacct)
  • disk space used (du -s)
  • database usage (number of rows in service group db)
  • number of raw hits to

Provide aggregate reports and reports per service group with daily granularity.
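As a rough illustration of the disk-space metric, here is a minimal sketch of a per-tool collector. It walks each service group's home directory and sums file sizes, which approximates `du -s` (the `/data/project` root, and walking instead of shelling out to `du`, are assumptions for the sketch; `du` counts allocated blocks, so the numbers will differ slightly):

```python
#!/usr/bin/env python3
"""Sketch: collect approximate disk usage per service group home dir."""
import os


def disk_usage_bytes(path):
    """Sum sizes of all regular files under `path` (roughly `du -s`)."""
    total = 0
    for root, dirs, files in os.walk(path, onerror=lambda e: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or unreadable; skip it
    return total


def collect(tools_root="/data/project"):
    """Return {tool_name: bytes_used} for each directory under
    `tools_root` (the root path is an assumption)."""
    usage = {}
    for entry in sorted(os.listdir(tools_root)):
        home = os.path.join(tools_root, entry)
        if os.path.isdir(home):
            usage[entry] = disk_usage_bytes(home)
    return usage
```

A cron job could run `collect()` daily and append the results to whatever store backs the reports.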

Event Timeline

Restricted Application added a subscriber: Aklapper.
chasemp triaged this task as Medium priority. Apr 4 2016, 2:00 PM

One use of this would be proactively monitoring for large databases like the ones that are being looked at in T132431: labsdb1001 and labsdb1003 short on available space.

A big hammer method for checking user/tool database sizes:

  SELECT table_schema
  , sum( data_length ) as data_bytes
  , sum( index_length ) as index_bytes
  , sum( table_rows ) as row_count
  , count(1) as tables
  FROM information_schema.TABLES
  WHERE table_schema regexp '^[psu][0-9]'
  GROUP BY table_schema;

Something could run that once per day (or maybe even once a week) on each distinct database host for labs and log the data in a way that could be used to produce nice timeseries reports.
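The logging half of that could be as simple as appending date-stamped TSV lines, one per schema per day, which is easy to turn into a timeseries later. A minimal sketch (the column names follow the query's aliases; the file layout and function names are assumptions):

```python
#!/usr/bin/env python3
"""Sketch: append per-schema size snapshots as date-stamped TSV lines."""
import datetime

# Columns match the aliases in the information_schema query above.
COLUMNS = ("table_schema", "data_bytes", "index_bytes", "row_count", "tables")


def snapshot_lines(rows, day=None):
    """Format query result rows (tuples matching COLUMNS) as
    date-prefixed TSV lines suitable for appending to a log."""
    day = day or datetime.date.today().isoformat()
    return ["\t".join([day] + [str(v) for v in row]) for row in rows]


def append_snapshot(rows, path, day=None):
    """Append one snapshot's worth of lines to the log file at `path`."""
    with open(path, "a") as fh:
        for line in snapshot_lines(rows, day):
            fh.write(line + "\n")
```

With one file per database host, a plotting script can then group by `table_schema` and chart growth over time.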

On 2016-04-21 there were 223 distinct user/tool schemas on tools.labsdb. Other hosts have far fewer (c1=58, c3=35). now provides point-in-time database usage information.