cloud-vps puppetservers filling up / with puppetserver reports
Closed, Resolved · Public

Description

When looking at a failed fullstack VM, I see this error in the logs:

Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: No space left on device - /var/lib/puppetserver/server_data/facts/fullstackd-20240530223742.admin-monitoring.eqiad1.wikimedia.cloud.json

Indeed, / is full on the puppetserver, due to a huge number of files in /var/lib/puppetserver/reports.
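
For anyone reproducing the diagnosis, here is a minimal sketch that reports how full a filesystem is and how many files sit under a directory. The function names `count_files` and `disk_report` are hypothetical and not part of any Wikimedia tooling. It also reports inode usage, because "No space left on device" can be raised when inodes run out as well as when blocks do, and this task does not say which limit was hit.

```python
import os
import shutil


def count_files(directory):
    """Count regular directory entries (files) under `directory`, recursively."""
    total = 0
    for _root, _dirs, files in os.walk(directory):
        total += len(files)
    return total


def disk_report(mount_point, directory):
    """Summarise block usage, inode usage and file count (hypothetical helper)."""
    usage = shutil.disk_usage(mount_point)
    vfs = os.statvfs(mount_point)  # Unix only
    inodes_total = vfs.f_files
    inodes_used = inodes_total - vfs.f_ffree
    return {
        "percent_used": round(100 * usage.used / usage.total, 1) if usage.total else 0.0,
        "free_bytes": usage.free,
        # Some filesystems report 0 total inodes; avoid dividing by zero.
        "inode_percent_used": (
            round(100 * inodes_used / inodes_total, 1) if inodes_total else None
        ),
        "file_count": count_files(directory),
    }
```

On an affected puppetserver this could be called as `disk_report("/", "/var/lib/puppetserver/reports")`.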

We need to either stop generating those reports entirely, or purge them periodically to keep our puppetservers from degrading.
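
The "purge them periodically" option could look roughly like the sketch below. It is only an illustration: the function name `purge_old_reports`, its parameters, and the age threshold are assumptions, not the cleanup job that actually runs on the puppetservers. It deletes regular files older than a cutoff and defaults to a dry run so that nothing is removed by accident.

```python
import time
from pathlib import Path
import os


def purge_old_reports(report_dir, max_age_hours, now=None, dry_run=True):
    """Remove regular files under `report_dir` older than `max_age_hours`.

    Returns the list of paths that were (or, in a dry run, would be) removed.
    Hypothetical helper; the age threshold is an assumption of this sketch.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600

    # Collect candidates first so we never delete while walking the tree.
    candidates = [
        path
        for path in Path(report_dir).rglob("*")
        if not path.is_symlink() and path.is_file()
    ]

    removed = []
    for path in sorted(candidates):
        if path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed
```

A call such as `purge_old_reports("/var/lib/puppetserver/reports", max_age_hours=24)` lists what would be deleted; passing `dry_run=False` performs the deletion. The 24-hour value is an arbitrary placeholder. Anything that still reads these files would need to be checked before choosing a retention period.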

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change #1037812 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] cloud-vps: turn off report storage on cloud-vps puppetservers

https://gerrit.wikimedia.org/r/1037812

Change #1037812 merged by Andrew Bogott:

[operations/puppet@production] cloud-vps: turn off puppet report storage on cloud-vps puppetservers

https://gerrit.wikimedia.org/r/1037812

Change #1038381 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] puppetserver 'report' enum: allow 'none' as a value

https://gerrit.wikimedia.org/r/1038381

Change #1038381 merged by Andrew Bogott:

[operations/puppet@production] puppetserver 'report' enum: allow 'none' as a value

https://gerrit.wikimedia.org/r/1038381

Change #1038387 had a related patch set uploaded (by Andrew Bogott; author: Andrew Bogott):

[operations/puppet@production] puppetdb: Remove 'none' from puppet reports config when adding 'puppetdb'

https://gerrit.wikimedia.org/r/1038387

Change #1038387 merged by Andrew Bogott:

[operations/puppet@production] puppetdb: Remove 'none' from puppet reports config when adding 'puppetdb'

https://gerrit.wikimedia.org/r/1038387

@Andrew Doesn't the PCC updater rely on the facts report directory? Also there should be a systemd timer cleaning up old reports to free up space.

The merged duplicate task was changing that timer to run every hour instead of the default of every 8 hours (in case the reports are still needed).

And https://gerrit.wikimedia.org/r/c/operations/puppet/+/1037812 was merged yesterday disabling the reports entirely?

Yes, because we were not aware that they are needed for PCC (at least I wasn't).