Change retention values for Graphite metrics
Closed, Resolved (Public)

Assigned To

Authored By

	Peter
	Oct 6 2021, 6:34 AM

Description

We should store synthetic testing metrics longer than we do today. We keep the CrUX metrics we collect for 2 years, but we keep our own metrics for only 33 days. That should be at least one quarter (so we can compare the beginning of the quarter with the end), or even better one full year.

We also have a minimum retention of 10 minutes, but we do not run tests that often. I think we should change that to 1 hour, and then we can set up specific rules if we need to run tests more often.
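To get a feel for what the change means on disk, here is a rough sizing sketch. It assumes the current setup corresponds to a single Whisper archive of `10m:33d` and the proposed one to `1h:1y` (the task only states "10 minutes" and "33 days", so the exact retention strings are an assumption), and it uses the Whisper file layout as I understand it: a 16-byte header, 12 bytes of archive info, and 12 bytes per point.

```python
# Whisper on-disk layout (as I understand it): 16-byte metadata header,
# 12 bytes of archive info per archive, 12 bytes per stored point
# (4-byte timestamp + 8-byte value).
HEADER_BYTES = 16
ARCHIVE_INFO_BYTES = 12
POINT_BYTES = 12

SECONDS = {"m": 60, "h": 3600, "d": 86400, "y": 86400 * 365}


def to_seconds(value: str) -> int:
    """Convert a short duration such as '10m' or '1y' to seconds."""
    return int(value[:-1]) * SECONDS[value[-1]]


def archive_points(retention: str) -> int:
    """Number of points in one 'precision:duration' archive, e.g. '1h:1y'."""
    precision, duration = retention.split(":")
    return to_seconds(duration) // to_seconds(precision)


def file_bytes(retention: str) -> int:
    """Approximate size of a .wsp file holding a single archive."""
    return HEADER_BYTES + ARCHIVE_INFO_BYTES + archive_points(retention) * POINT_BYTES


# Retention strings below are assumptions, not copied from the real config.
for retention in ("10m:33d", "1h:1y"):
    print(retention, archive_points(retention), file_bytes(retention))
```

Under those assumptions a metric goes from 4752 points (about 57 KB) to 8760 points (about 105 KB), so a full year at hourly resolution costs a bit less than twice the space per metric.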

Details

Subject	Repo	Branch	Lines +/-
Set retention to 1 hour for all tests.	performance/synthetic-monitoring-tests	master	+8 -8
Run tests once an hour.	performance/synthetic-monitoring-tests	master	+1 -1
Increase retention time to 1 hour.	performance/synthetic-monitoring-tests	master	+6 -6

Related Objects

Mentioned In: rPSMT6c690115b441: Set retention to 1 hour for all tests.
rPSMT59c201370281: Run tests once an hour.
T293327: Remove unused WebPageTest and WebPageReplay tests

Event Timeline

Peter created this task. Oct 6 2021, 6:34 AM

Peter moved this task from "Inbox, needs triage" to "To-do: Goals, prioritized next 4 Quarters" on the Performance-Team board. Oct 12 2021, 11:29 AM

Peter mentioned this in T293327: Remove unused WebPageTest and WebPageReplay tests. Oct 14 2021, 7:34 AM

I've cleaned up all old data and the disk is 8% full at the moment. I'll add some new rules and do the change tool by tool to see what the total size will be.

Change 732281 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/synthetic-monitoring-tests@master] Increase retention time to 1 hour.

https://gerrit.wikimedia.org/r/732281

gerritbot added a project: Patch-For-Review. Oct 20 2021, 10:54 AM

I've made a test run on the server by copying one of the nodes of data into another structure and then running:

find . -type f -name "*.wsp" -exec whisper-resize.py --aggregate --nobackup {} 1h:1y \;
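Before touching the real data it can help to list exactly what would be run. The sketch below is a dry run under my own assumptions: it walks a directory, finds the `.wsp` files, and prints the same `whisper-resize.py` invocation as the command above without executing anything. The helper name `resize_commands` is hypothetical.

```python
import os
import shlex
from typing import Iterator, List


def resize_commands(root: str, retention: str = "1h:1y") -> Iterator[List[str]]:
    """Yield one whisper-resize.py argument list per .wsp file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".wsp"):
                yield [
                    "whisper-resize.py",
                    "--aggregate",
                    "--nobackup",
                    os.path.join(dirpath, name),
                    retention,
                ]


if __name__ == "__main__":
    # Dry run only: print the commands, do not execute them.
    for command in resize_commands("."):
        print(shlex.join(command))
```

Printing first makes it easy to count the files and check the paths; the list could later be passed to `subprocess.run` one entry at a time if the output looks right.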

I then moved it back to a temp folder and looked at the data in Grafana. The data looks good. One caveat: old annotations that link to a result will not match the exact Graphite data point, but I think that is OK as long as the new ones are perfect.
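The annotation mismatch follows from how the points are re-bucketed. As a minimal sketch, assuming the archive uses the default average aggregation and that timestamps are aligned down to the start of their interval, six 10-minute points collapse into one hourly point, and an annotation stamped at half past now lands in the bucket that starts on the hour. The sample values are made up.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple

HOUR = 3600


def bucket_start(timestamp: int, seconds_per_point: int = HOUR) -> int:
    """Align a timestamp down to the start of its interval."""
    return timestamp - (timestamp % seconds_per_point)


def aggregate_hourly(points: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average all points that fall into the same hourly bucket."""
    buckets: Dict[int, list] = {}
    for timestamp, value in points:
        buckets.setdefault(bucket_start(timestamp), []).append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


base = 1634720400  # a whole hour in epoch seconds
# Hypothetical 10-minute samples: 100, 110, ..., 150.
points = [(base + i * 600, float(100 + i * 10)) for i in range(6)]

print(aggregate_hourly(points))        # one point per hour
print(bucket_start(base + 1800) - base)  # offset of the bucket from the hour
```

An old annotation at `base + 1800` used to sit on its own 10-minute point; after the resize the nearest stored point is the hourly average at `base`.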

It takes some time to run, so I'm going to do it in many small steps: stop whisper, change the data, change the config, turn everything on again, verify, and then repeat.

Two more things: we only keep the result HTML data for a month, but maybe that is OK (at least for now). Also, when I tested I configured it to store data for 1 year, but it should probably be 13 months (or 14) so you can go back and compare January with January, etc.
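For the 13 or 14 month option, one way to write the retention is in days, since I'm not aware of a month unit in Whisper retention strings. A rough sketch, assuming hourly points and counting every month as 31 days so the window always covers the same month a year earlier:

```python
MAX_DAYS_PER_MONTH = 31  # upper bound, so the window is never too short
POINTS_PER_DAY = 24      # one point per hour


def retention_for_months(months: int) -> str:
    """Build an hourly retention string that covers at least `months` months."""
    return f"1h:{months * MAX_DAYS_PER_MONTH}d"


def points_for_months(months: int) -> int:
    """Number of hourly points needed for that retention."""
    return months * MAX_DAYS_PER_MONTH * POINTS_PER_DAY


for months in (13, 14):
    print(months, retention_for_months(months), points_for_months(months))
```

That gives `1h:403d` (9672 points) for 13 months and `1h:434d` (10416 points) for 14 months, compared with 8760 points for one year.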

New day, new plan. Let's start with all the desktop tests, deploy that change, and see how long it takes.

Change 732281 abandoned by Phedenskog:

[performance/synthetic-monitoring-tests@master] Increase retention time to 1 hour.

Reason:

Let's do this step by step instead.

https://gerrit.wikimedia.org/r/732281

Change 732907 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/synthetic-monitoring-tests@master] Run tests once an hour.

https://gerrit.wikimedia.org/r/732907

Change 732907 merged by jenkins-bot:

[performance/synthetic-monitoring-tests@master] Run tests once an hour.

https://gerrit.wikimedia.org/r/732907

Maintenance_bot removed a project: Patch-For-Review. Oct 22 2021, 7:10 AM

Change 732929 had a related patch set uploaded (by Phedenskog; author: Phedenskog):

[performance/synthetic-monitoring-tests@master] Set retention to 1 hour for all tests.

https://gerrit.wikimedia.org/r/732929

gerritbot added a project: Patch-For-Review. Oct 22 2021, 9:41 AM

Change 732929 merged by jenkins-bot:

[performance/synthetic-monitoring-tests@master] Set retention to 1 hour for all tests.

https://gerrit.wikimedia.org/r/732929

Maintenance_bot removed a project: Patch-For-Review. Oct 22 2021, 10:10 AM

This is done, but I need to update the documentation and clean up the backup files.

Documentation updated and backup files removed.

Peter mentioned this in rPSMT59c201370281: Run tests once an hour. Mon, Apr 8, 3:15 AM

Peter mentioned this in rPSMT6c690115b441: Set retention to 1 hour for all tests.
