The incident documented at https://wikitech.wikimedia.org/wiki/Catalyst/Incidents/2025-01-29 was caused by the OOM killer locking up the instance.
We should investigate environment resource limits as a means to prevent the OOM killer from getting triggered in the first place.
- Determine the resource usage of a Catalyst wiki environment (possibly using ab to generate load)
- Try adding resource limits to an environment (docs)
- Ensure the resource limits are enforced and find the failure mode (what happens, from both the user and the admin perspective, when a limit is reached)
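
As a rough starting point for the first and last steps, the load could be generated with a small script instead of ab. The following is a minimal sketch, not an agreed approach: the function names `fetch`, `summarize` and `load_test` are hypothetical, and it only records what a user would see (HTTP status codes and latencies). Memory and CPU usage would still need to be observed on the host while it runs.

```python
import statistics
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10.0):
    """Request url once and return (status, seconds).

    A status of 0 means no HTTP response was received
    (connection refused, reset, or timed out).
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 502 or 503.
        status = err.code
    except (urllib.error.URLError, OSError):
        status = 0
    return status, time.monotonic() - start


def summarize(results):
    """Aggregate a non-empty list of (status, seconds) pairs."""
    counts = {}
    for status, _ in results:
        counts[status] = counts.get(status, 0) + 1
    latencies = sorted(seconds for _, seconds in results)
    return {
        "requests": len(results),
        "status_counts": counts,
        "median_s": statistics.median(latencies),
        "max_s": latencies[-1],
    }


def load_test(url, total=100, concurrency=10):
    """Send `total` GET requests, `concurrency` at a time, and summarize them."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return summarize(list(pool.map(fetch, [url] * total)))
```

For example, `load_test("http://localhost:8080/", total=200, concurrency=20)` (the URL is a placeholder for a real environment) returns a dict with per-status counts. Comparing those counts before and after a limit is applied is one way to see the user-facing failure mode, for instance whether requests start returning errors or stop getting a response at all.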