On the balance of probabilities, running out of heap space appears to be the limiting factor in scaling up the number of wikis managed by our Elasticsearch cluster.
If we know how much heap we are using, we can add more (by raising the pod's memory limit and the JVM heap allocation) before we exhaust our resources entirely.
During task breakdown we suggested the following implementation:
- Create a CronJob that logs the heap usage percentage
- Create a log-based metric that parses these log lines (prototype it in the UI first, then store the definition in Terraform in git)
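A minimal sketch of the first step, assuming a Kubernetes CronJob can reach the cluster at a service named `elasticsearch:9200` (the name, namespace, and schedule here are placeholders, not our actual config). It uses Elasticsearch's `_cat/nodes` API, which can return per-node heap usage as a percentage:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: es-heap-logger  # hypothetical name
spec:
  schedule: "*/5 * * * *"  # every five minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: log-heap
              image: curlimages/curl:latest
              # Logs one line per node: node name and heap.percent.
              # The log-based metric would then parse these lines.
              args:
                - sh
                - -c
                - curl -s 'http://elasticsearch:9200/_cat/nodes?h=name,heap.percent'
```

The container's stdout lands in the pod logs, which is what the log-based metric in the second step would parse.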