Page MenuHomePhabricator

Create prometheus metrics for Maps OSM data disk usage
Closed, ResolvedPublic

Description

In order to track OSM replication disk usage issues, it would be good to have some metrics regarding PostgreSQL disk usage. The metrics that I would like to display in grafana should reflect the following command and outputs:

Disk usage of postgres data folder

postgres@maps2004:/srv/postgresql/9.6/main$ du -sh *
902G    base
524K    global
7.3M    pg_clog
4.0K    pg_commit_ts
4.0K    pg_dynshmem
8.0K    pg_hba.conf
4.0K    pg_ident.conf
12K     pg_logical
28K     pg_multixact
12K     pg_notify
4.0K    pg_replslot
4.0K    pg_serial
4.0K    pg_snapshots
4.0K    pg_stat
52K     pg_stat_tmp
140K    pg_subtrans
4.0K    pg_tblspc
4.0K    pg_twophase
4.0K    PG_VERSION
13G     pg_xlog
4.0K    postgresql.auto.conf
24K     postgresql.conf
4.0K    postmaster.opts
4.0K    postmaster.pid

Querying DB relations and getting index and table sizes

postgres@maps2004:/srv/postgresql/9.6/main$ psql -d gis -c "SELECT TABLE_NAME, index_bytes, toast_bytes, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
>       SELECT relname AS TABLE_NAME
>               , pg_total_relation_size(c.oid) AS total_bytes
>               , pg_indexes_size(c.oid) AS index_bytes
>               , pg_total_relation_size(reltoastrelid) AS toast_bytes
>           FROM pg_class c
>           LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
>           WHERE relkind = 'r' AND nspname = 'public'
>   ) a;"

     table_name     | index_bytes  | toast_bytes | table_bytes
--------------------+--------------+-------------+--------------
 spatial_ref_sys    |       196608 |        8192 |      4456448
 planet_osm_nodes   |         8192 |             |            0
 planet_osm_polygon |  64203743232 | 15467462656 | 127622037504
 planet_osm_roads   |   2222784512 |  4028407808 |   8880103424
 planet_osm_ways    | 322910265344 |  1334394880 | 153855852544
 planet_osm_point   |  10463854592 |     3063808 |  19741638656
 water_polygons     |      1114112 |   902766592 |      7602176
 admin              |    102449152 |   601923584 |    804020224
 planet_osm_rels    |   2807840768 |   294150144 |   3507986432
 planet_osm_line    |  29798989824 | 11358765056 |  70170394624
(10 rows)

Open questions

  • Is it possible to get these metrics out of the box in the current Grafana dashboard?
  • What metrics are worth keeping?

Event Timeline

MSantos renamed this task from Create prometheus metrics for Postgres disk usage to Create prometheus metrics for Maps OSM data disk usage.Mar 30 2020, 1:54 PM

We have https://github.com/wrouesnel/postgres_exporter deployed on the maps hosts, I believe some/all of the metrics you are looking for are available in grafana/prometheus. You can get a preview of those from the host itself if you wish: curl -s localhost:9187/metrics or use Grafana's "explore" function (while logged in), the metrics will start with pg_, hope that helps!

MSantos triaged this task as High priority.Apr 1 2020, 2:58 PM

@fgiunchedi and @Gehel I couldn't find the specific metrics in the postgres exporter metrics so I'm thinking about scripts to collects the metrics above. You can see the new grafana dashboard I am creating for this ticket here.

I started a patch here but need some help to go beyond that.

LGoto lowered the priority of this task from High to Medium.Apr 21 2020, 3:37 PM
LGoto moved this task from Upcoming to Backlog on the Product-Infrastructure-Team-Backlog board.

I started a patch here but need some help to go beyond that.

Do you still need help here? I think you only need some tweaks over there. However, using prom files is usually slightly painful over time.

It's not particularly difficult to change the exporter upstream, I can volunteer to patch it if still needed.
https://github.com/prometheus-community/postgres_exporter/blob/master/queries.yaml
https://github.com/prometheus-community/postgres_exporter/blob/master/cmd/postgres_exporter/postgres_exporter.go

@Nemo_bis it would be great to have some help with it. I'm subscribing to more people that would be interested in evaluating the acceptance criteria of this ticket since it has been a long time since its creation.

Change 666888 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] prometheus::postgres_exporter: disk metrics and custom queries

https://gerrit.wikimedia.org/r/666888

Change 666888 merged by Hnowlan:
[operations/puppet@production] prometheus::postgres_exporter: disk metrics and custom queries

https://gerrit.wikimedia.org/r/666888

Change 667645 had a related patch set uploaded (by Hnowlan; owner: Hnowlan):
[operations/puppet@production] prometheus::postgres_exporter: Load additional rules on stretch

https://gerrit.wikimedia.org/r/667645

Change 667645 merged by Hnowlan:
[operations/puppet@production] prometheus::postgres_exporter: Load additional rules on stretch

https://gerrit.wikimedia.org/r/667645

All maps hosts are now returning these metrics:

hnowlan@maps1006:~$ curl -s localhost:9187/metrics| grep pg_database
# HELP pg_database_size_bytes Disk space used by the database
# TYPE pg_database_size_bytes gauge
pg_database_size_bytes{datname="gis"} 1.035251874328e+12
pg_database_size_bytes{datname="postgres"} 7.248408e+06
pg_database_size_bytes{datname="template0"} 7.135748e+06
pg_database_size_bytes{datname="template1"} 7.248408e+06