Part of making sure that everything in the MW pipeline is monitored in beta as well.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T53494 Use Beta cluster as a true canary for code deployments (epic)
Stalled | | None | T53497 Setup monitoring for Beta Cluster (tracking)
Stalled | | None | T87093 Setup monitoring for database servers in beta cluster
Event Timeline
From T97120
The beta cluster MySQL servers were down for a few hours (T96905), and there is no monitoring for them.
We would need a check on both instances (deployment-db1 and deployment-db2) to ensure the MySQL process is running.
The command line looks like:
/usr/sbin/mysqld --basedir=/usr --datadir=/mnt/sqldata \
  --plugin-dir=/usr/lib/mysql/plugin --user=mysql \
  --log-error=/mnt/sqldata/deployment-db1.err \
  --pid-file=/mnt/sqldata/deployment-db1.pid \
  --socket=/tmp/mysql.sock --port=3306
I guess we can just monitor whether a /usr/sbin/mysqld process is present.
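A minimal sketch of such a process check, assuming a Linux host where /proc is readable. The binary path /usr/sbin/mysqld is taken from the command line quoted above. The function names (parse_cmdline, mysqld_running, check_status) are hypothetical, and the exit codes follow the common Nagios-style plugin convention (0 for OK, 2 for CRITICAL), which is an assumption about the monitoring system rather than something stated in this task.

```python
import os

# Path taken from the command line quoted in the task.
MYSQLD_PATH = "/usr/sbin/mysqld"


def parse_cmdline(raw):
    """Split a /proc/<pid>/cmdline blob (NUL-separated bytes) into arguments."""
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def mysqld_running(proc_root="/proc", binary=MYSQLD_PATH):
    """Return True if any process under proc_root was started as `binary`."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are process IDs
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                argv = parse_cmdline(fh.read())
        except OSError:
            continue  # process exited meanwhile, or is not readable
        if argv and argv[0] == binary:
            return True
    return False


def check_status(running):
    """Map the result to (exit_code, message).

    Assumption: Nagios-style plugin convention, 0 = OK, 2 = CRITICAL.
    """
    if running:
        return 0, "OK: mysqld process is running"
    return 2, "CRITICAL: no mysqld process found"
```

A wrapper script on each instance could print the message from check_status(mysqld_running()) and exit with the returned code. Note that this only shows the process exists, not that the server accepts connections.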
Per beta cluster weekly triage:
The MySQL databases only went down a couple of times over 4 years, and we quickly noticed each time. The lack of monitoring is certainly annoying but not a big deal, hence lowering priority.
The previous comments don't explain what/who exactly this task is stalled on ("If a report is waiting for further input (e.g. from its reporter or a third party) and can currently not be acted on"). Hence resetting task status.
If this task should not be worked on and fixing this is not worth the effort, then the task should have the "Declined" status.