To prevent a repeat of the recent Nginx crash (caused by Docker running out of memory), we need to set up memory usage monitoring and alerts on the server.
Action Items:
- Configure memory usage alerts for Docker containers and the host system.
- Set thresholds for warning and critical levels.
- Integrate alerts with our monitoring/notification system (e.g., Telegram bot or email).
- Optionally, set up automatic cleanup of unused containers and images.
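As a starting point for the first two items, here is a minimal sketch of a threshold check that could run from cron. It is not a finished implementation. It assumes a Linux host (it reads /proc/meminfo) with the docker CLI on the PATH. The 80%/90% thresholds and all function names are hypothetical and should be tuned for the real server. Note that the per-container percentage reported by `docker stats` is relative to the container's memory limit, or to host memory when no limit is set.

```python
import subprocess

# Hypothetical thresholds (percent of memory used); tune for the real server.
WARNING_PCT = 80.0
CRITICAL_PCT = 90.0


def classify(used_pct, warning=WARNING_PCT, critical=CRITICAL_PCT):
    """Map a memory usage percentage to an alert level."""
    if used_pct >= critical:
        return "critical"
    if used_pct >= warning:
        return "warning"
    return "ok"


def host_used_pct(meminfo_text):
    """Compute host memory usage from the contents of /proc/meminfo (Linux)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def parse_docker_stats(output):
    """Parse lines like 'nginx 42.50%' into {container_name: used_pct}."""
    usage = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            usage[parts[0]] = float(parts[1].rstrip("%"))
        except ValueError:
            # Containers without a reading report a placeholder; skip them.
            continue
    return usage


def collect_alerts():
    """Return (source, used_pct, level) for every reading above 'ok'."""
    with open("/proc/meminfo") as f:
        readings = {"host": host_used_pct(f.read())}
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}} {{.MemPerc}}"],
        capture_output=True, text=True, check=True,
    )
    readings.update(parse_docker_stats(result.stdout))
    alerts = []
    for name, pct in readings.items():
        level = classify(pct)
        if level != "ok":
            alerts.append((name, pct, level))
    return alerts


if __name__ == "__main__":
    for name, pct, level in collect_alerts():
        # Placeholder: swap print for the real notification channel.
        print(f"[{level.upper()}] {name}: {pct:.1f}% memory used")
```

The parsing and classification functions are kept separate from the I/O so the thresholds can be tested without Docker installed. The print call is where the Telegram or email integration would hook in.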
This will help us detect memory-related issues early and handle them before they cause an outage.
Related INC: T396906: Error! × Server unavailable or you are offline