
Request for more compute and storage for the GLAMS dashboard project
Closed, Resolved (Public)

Description

Hello,

right now the GLAMS dashboard project is struggling, mainly due to lack of resources.
Is it possible to increase the quota for the following:

DB as a service (we use PostgreSQL): currently sits at 510 GB of disk space; please increase to 1TB.

Compute - we currently have a quota of 16GB of RAM and 8 vCPUs.
We have 2 machines:

  • Web server, uses 4GB of RAM and 2 vCPUs. Works fine and does not require any more resources.
  • "Services" instance, which is used for aggregating and processing the wikis' data, currently sits at 8GB of RAM and 4 vCPUs. I would love to have it increased to 8 vCPUs and 32GB of RAM, but any possible increase is welcome, since this machine is really struggling under the load.

To sum it up - the request is to increase the quota to a total of 36GB of RAM and 10 vCPUs.
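For reference, the totals above can be checked with a small sketch. The per-instance figures are the ones stated in this request; the dictionary layout and names are only illustrative.

```python
# Requested per-instance sizes, as stated in the request above.
requested = {
    "web": {"ram_gb": 4, "vcpus": 2},        # unchanged
    "services": {"ram_gb": 32, "vcpus": 8},  # requested resize
}

# Current project quota, as stated in the request above.
current_quota = {"ram_gb": 16, "vcpus": 8}

# Total quota needed is the sum over all instances.
total = {key: sum(inst[key] for inst in requested.values()) for key in current_quota}

# Increase over the current quota.
increase = {key: total[key] - current_quota[key] for key in current_quota}

print(total)     # {'ram_gb': 36, 'vcpus': 10}
print(increase)  # {'ram_gb': 20, 'vcpus': 2}
```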

Cinder blocks (volumes):

Currently we have 80GB. We would love to increase it to at least 1TB so we can back up the DB using pg_dumpall.

Thanks a lot.

Event Timeline

May I ask where the 1TB figure comes from? After T355138#9582807 the database has been using about 200-300G of the 500G disk space it's currently allocated.

Similarly, for the services instance, our monitoring is showing barely any load on the instance most of the time and peaks of up to a single CPU core of usage, far from the 4 vCPUs currently allocated:

image.png (317×1 px, 32 KB)

The DB's disk already got filled up once when it had 500GB. It's also growing every day and we want to add even more institutions to the dashboard. The more disk space we have, the more institutions we can add and the less cautious we have to be.

Regarding the CPU usage:

  1. We are at almost 100% RAM usage. In order to add RAM we have to resize the instance to a larger flavor. Flavors with more RAM also have more vCPUs, so both have to increase.
  2. We only use 1 core at the moment because we cannot run any other processes (we would run out of RAM), and we have paused all addition of new institutions to the dashboard. Once we have enough RAM we can run processes in parallel. We can then start adding institutions again, and do it faster (right now it usually takes 2 weeks to add a new batch).

The DB's disk already got filled up once when it had 500GB.

If I'm reading T355138 correctly, the disk filled 500GB only because of a Postgres WAL file that has since been deleted. I would suggest keeping the limit at 500GB for now and monitoring the increase over the coming days/weeks.

I think increasing volume space from 80GB to 500 GB is reasonable for backup purposes, and also increasing CPUs from 8 to 10 to allow for instance flavors with more RAM.

Alright, since the next compute flavor has 16GB of RAM, I'll need an increase of 4GB of memory as well.
Thanks a lot
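A quick check of where the 4GB figure comes from, as a sketch only. It assumes the web server stays at 4GB and the services instance moves to the 16GB flavor mentioned above; the variable names are illustrative.

```python
current_ram_quota_gb = 16  # current project RAM quota, from the request
web_ram_gb = 4             # web server, unchanged
services_ram_gb = 16       # services instance on the next flavor (assumed target)

# RAM needed once the services instance is resized.
needed_ram_gb = web_ram_gb + services_ram_gb

# Extra quota required on top of what the project already has.
extra_ram_gb = max(0, needed_ram_gb - current_ram_quota_gb)

print(needed_ram_gb, extra_ram_gb)  # 20 4
```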

Mentioned in SAL (#wikimedia-cloud-feed) [2024-03-05T16:41:11Z] <fnegri@cloudcumin1001> START - Cookbook wmcs.openstack.quota_increase (T358477)

Mentioned in SAL (#wikimedia-cloud-feed) [2024-03-05T16:41:20Z] <fnegri@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) (T358477)

fnegri claimed this task.