Page MenuHomePhabricator

puppetdb Investigate the expected bahaviour of the edges table
Closed, ResolvedPublic

Description

the puppetdb table has almost doubled in the last ~10 months. We should investigate if this is expected or if we need to do some maintances. As part of this task it would also be useful to create some grafana metrics so we can track and plan for the growth of the puppetdb database

Event Timeline

jbond triaged this task as Medium priority.Jul 29 2021, 1:06 PM

Seems that th standard vacuum preformed by the autovacuum daemon dosn't reclaim space. Although I'm not sure how this is presented in postgresql

The standard form of VACUUM removes dead row versions in tables and indexes and marks the space available for future reuse. However, it will not return the space to the operating system, except in the special case where one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained.

Further it may be worth preforming a full vacuum due to the heavy delete update behaviour on this table

Tip: Plain VACUUM may not be satisfactory when a table contains large numbers of dead row versions as a result of massive update or delete activity. If you have such a table and you need to reclaim the excess disk space it occupies, you will need to use VACUUM FULL, or alternatively CLUSTER or one of the table-rewriting variants of ALTER TABLE. These commands rewrite an entire new copy of the table and build new indexes for it. All these options require exclusive lock. Note that they also temporarily use extra disk space approximately equal to the size of the table, since the old copies of the table and indexes can't be released until the new ones are complete.

https://www.postgresql.org/docs/9.2/routine-vacuuming.html

This would require downtime and likely more disk space

jbond renamed this task from puppetdb Investigate the expected bahaviour of the faces table to puppetdb Investigate the expected bahaviour of the edges table.Jul 29 2021, 2:49 PM

i have done some digging on the on disk size vs the database size which i think shows how much data we could potentially reclaim by doing a full vacume. however please be warned at this point most of what i know of postgresql i learnt over the last two days :). See below for full results but i think the following are worth pointing out

relationdisk sizesize
public.idx_factsets_jsonb_merged9095 MB9097 MB
public.catalog_resources2154 MB5301 MB
public.edges2135 MB7931 MB
  • idx_factsets_jsonb_merged: from the name this sounds like its and index and i suspect it is not very good for performance if the index cant fit in ram (psql only has 8G effective_cache_size)
  • catalog_resources and edges. both of theses have a large discrepancy between the on-disk size and the db size. if i read this right it means we could reclaim ~9G by doing a full vacum on these two tables
    • resource_params, fact_paths and resource_events also seem to have similar discrepancies
SELECT nspname || '.' || relname AS "relation",
      pg_size_pretty(pg_relation_size(C.oid)) AS "disk size", 
      pg_size_pretty( pg_total_relation_size(nspname || '.' || relname)) AS "size" 
    FROM pg_class C
    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
   WHERE nspname IN ('public')
    ORDER BY pg_relation_size(C.oid) DESC;
relationdisk sizesize
public.idx_factsets_jsonb_merged9095 MB9097 MB
public.edges_certname_source_target_type_unique_key5794 MB5795 MB
public.catalog_resources2154 MB5301 MB
public.edges2135 MB7931 MB
public.fact_paths_path_trgm1311 MB1311 MB
public.catalog_resources_encode_idx922 MB922 MB
public.resource_params_hash_expr_idx748 MB748 MB
public.catalog_resources_type_title_idx649 MB649 MB
public.catalog_resources_resource_idx559 MB559 MB
public.resource_params547 MB2761 MB
public.catalog_resources_type_idx521 MB521 MB
public.resource_params_pkey511 MB511 MB
public.catalog_resources_pkey1490 MB491 MB
public.idx_resources_params_name480 MB480 MB
public.idx_resources_params_resource421 MB421 MB
public.resource_params_cache305 MB514 MB
public.reports174 MB281 MB
public.resource_events_resource_timestamp113 MB113 MB
public.resource_events_unique93 MB93 MB
public.fact_paths83 MB1533 MB
public.resource_params_cache_parameters_idx78 MB78 MB
public.fact_paths_path_type_unique78 MB78 MB
public.resource_events_resource_title_idx62 MB62 MB
public.rpc_hash_expr_idx51 MB51 MB
public.fact_paths_name32 MB32 MB
public.fact_paths_pkey29 MB29 MB
public.resource_events27 MB382 MB
public.resource_params_cache_pkey25 MB25 MB
public.resource_events_containing_class_idx19 MB19 MB
public.resource_events_timestamp_idx17 MB17 MB
public.resource_events_reports_id_idx17 MB17 MB
public.resource_events_property_idx17 MB17 MB
public.resource_events_status_idx17 MB17 MB
public.reports_hash_expr_idx11 MB11 MB
public.reports_tx_uuid_expr_idx9608 kB9608 kB
public.idx_reports_compound_id9056 kB9080 kB
public.idx_reports_producer_timestamp_by_hour_certname7472 kB7496 kB
public.reports_certname_idx6808 kB6832 kB
public.reports_catalog_uuid_idx4808 kB4832 kB
public.catalog_resources_exported_idx4552 kB4576 kB
public.reports_status_id_idx3456 kB3480 kB
public.idx_reports_prod3432 kB3456 kB
public.reports_environment_id_idx3384 kB3408 kB
public.idx_reports_producer_timestamp2720 kB2744 kB
public.reports_end_time_idx2720 kB2744 kB
public.factsets2664 kB9146 MB
public.reports_pkey2648 kB2672 kB
public.catalogs1040 kB3688 kB
public.factsets_hash_expr_idx584 kB584 kB
public.idx_factsets_prod520 kB544 kB
public.idx_certnames_latest_report_timestamp512 kB536 kB
public.idx_certnames_latest_report_id472 kB496 kB
public.idx_catalogs_env464 kB488 kB
public.idx_catalogs_prod464 kB488 kB
public.catalogs_hash_expr_idx408 kB408 kB
public.idx_catalogs_producer_timestamp400 kB424 kB
public.certnames392 kB2016 kB
public.catalogs_tx_uuid_expr_idx328 kB328 kB
public.certnames_transform_certname_key320 kB344 kB
public.catalogs_certname_idx272 kB296 kB
public.factsets_certname_idx264 kB264 kB
public.certnames_transform_pkey208 kB208 kB
public.catalogs_pkey168 kB168 kB
public.factsets_pkey160 kB160 kB
public.producers_pkey16 kB16 kB
public.idx_reports_noop_pending16 kB16 kB
public.producers_name_key16 kB16 kB
public.report_statuses_status_key16 kB16 kB
public.schema_migrations_pkey16 kB16 kB
public.report_statuses_pkey16 kB16 kB
public.value_types_pkey16 kB16 kB
public.reports_noop_idx16 kB16 kB
public.packages_name_trgm16 kB16 kB
public.environments_pkey16 kB16 kB
public.environments_name_key16 kB16 kB
public.package_hash_key8192 bytes8192 bytes
public.certname_package_reverse_idx8192 bytes8192 bytes
public.packages_name_idx8192 bytes8192 bytes
public.fact_paths_id_seq8192 bytes8192 bytes
public.report_statuses_id_seq8192 bytes8192 bytes
public.report_statuses8192 bytes48 kB
public.value_types8192 bytes24 kB
public.catalogs_transform_id_seq18192 bytes8192 bytes
public.reports_cached_catalog_status_on_fail8192 bytes8192 bytes
public.reports_job_id_idx8192 bytes8192 bytes
public.catalogs_job_id_idx8192 bytes8192 bytes
public.certname_id_seq8192 bytes8192 bytes
public.fact_values_id_seq8192 bytes8192 bytes
public.environments8192 bytes48 kB
public.schema_migrations8192 bytes24 kB
public.resource_events_status_for_corrective_change_idx8192 bytes8192 bytes
public.catalogs_id_seq8192 bytes8192 bytes
public.environments_id_seq8192 bytes8192 bytes
public.reports_id_seq8192 bytes8192 bytes
public.producers_id_seq8192 bytes8192 bytes
public.factsets_id_seq8192 bytes8192 bytes
public.producers8192 bytes48 kB
public.packages_pkey8192 bytes8192 bytes
public.certname_packages_pkey8192 bytes8192 bytes
public.package_id_seq8192 bytes8192 bytes
public.certname_packages0 bytes16 kB
public.packages0 bytes48 kB
fgiunchedi subscribed.

I'm optimistically pulling o11y since AFAICS there's no actionabled

jbond claimed this task.

closing this we have hopefully made it past the puppetdb issues