
Collect and display basic metrics for all tools (service groups)
Open, NormalPublic


Track basic metrics for each service group:

  • cpu hours used (I think we can get this from qacct)
  • disk space used (du -s)
  • database usage (number of rows in service group db)
  • number of raw hits to

Provide aggregate reports and reports per service group with daily granularity.
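The per-tool collection steps above could be sketched as follows. This is a minimal sketch, not the task's actual implementation; the `/data/project/<tool>` home-directory layout and the function names are assumptions.

```python
import subprocess
from datetime import date

def parse_du_output(output):
    """Parse `du -s` output ("<kbytes>\t<path>") into an integer KiB count."""
    kbytes, _path = output.split("\t", 1)
    return int(kbytes)

def disk_usage_kib(tool_home):
    """Run `du -s` on a tool's home directory and return KiB used.

    Assumes tool homes live under /data/project/<tool> (hypothetical layout).
    """
    out = subprocess.check_output(["du", "-s", tool_home], text=True)
    return parse_du_output(out)

def daily_record(tool, kib):
    """One timeseries row per day: ISO date, tool name, disk usage in KiB."""
    return f"{date.today().isoformat()}\t{tool}\t{kib}"
```

A daily cron job could call `disk_usage_kib` for each service group and append the resulting `daily_record` lines to a log, giving the per-day granularity the task asks for.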

Event Timeline

bd808 created this task. · Mar 11 2016, 6:23 AM
Restricted Application added a project: Cloud-Services. · Mar 11 2016, 6:23 AM
Restricted Application added a subscriber: Aklapper.
bd808 moved this task from Backlog to Ready on the Community-Tech-Tool-Labs board. · Mar 25 2016, 2:14 AM
chasemp triaged this task as Normal priority. · Apr 4 2016, 2:00 PM
bd808 added a comment. · Apr 21 2016, 8:09 PM

One use of this would be proactively monitoring for large databases like the ones that are being looked at in T132431: labsdb1001 and labsdb1003 short on available space.

bd808 added a comment (edited). · Apr 22 2016, 12:12 AM

A big hammer method for checking user/tool database sizes:

  SELECT table_schema
  , sum( data_length ) as data_bytes
  , sum( index_length ) as index_bytes
  , sum( table_rows ) as row_count
  , count(1) as tables
FROM information_schema.TABLES
WHERE table_schema regexp '^[psu][0-9]'
GROUP BY table_schema;

Something could run that once per day (or maybe even once a week) on each distinct database host for labs and log the data in a way that could be used to produce nice timeseries reports.
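The daily logging step could look something like the sketch below: take the per-schema rows produced by the query above and append them as dated TSV lines, so repeated runs accumulate a timeseries. The function name and TSV layout are assumptions, not anything specified in the task.

```python
import csv
import io
from datetime import date

# Rows as returned by the information_schema query above:
# (table_schema, data_bytes, index_bytes, row_count, tables)
def log_schema_sizes(rows, out):
    """Append one dated TSV line per schema; daily runs build a timeseries."""
    writer = csv.writer(out, delimiter="\t")
    today = date.today().isoformat()
    for schema, data_bytes, index_bytes, row_count, tables in rows:
        writer.writerow([today, schema, data_bytes, index_bytes, row_count, tables])

# Example with one hypothetical schema row:
rows = [("s51234__wiki", 1048576, 65536, 4200, 7)]
buf = io.StringIO()
log_schema_sizes(rows, buf)
```

Writing append-only TSV keyed by date keeps the collector trivial; any reporting tool can later group by schema or by day to chart growth.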

On 2016-04-21 there are 223 distinct user/tool schemas on tools.labsdb. Other hosts have far fewer (c1=58, c3=35).

scfc moved this task from Triage to Backlog on the Toolforge board. · Dec 4 2016, 8:57 PM
bd808 added a comment. · Oct 14 2017, 5:35 AM

now provides point-in-time database usage information.

Harej added a subscriber: Harej. · Jan 24 2018, 11:30 PM