Project Name: wmgmc_monitoring
Developer account usernames of requestors: @XtexChooser @Yiming
Purpose: It will host the observability infrastructure of the WMGMC Technical Group.
Brief description:
As WMGMC Tech runs more and more services, it has become important that we can monitor them easily and get notified when any of them break.
After an internal tech RFC (https://issues.cnuser.wiki/T20), we chose Cloud VPS to host our observability infrastructure, including the OTel Collector (opentelemetry-collector-contrib), Prometheus, and Grafana OSS.
We previously planned to run a simple uptime monitor such as Gatus or Uptime Kuma, but soon realized that we would miss many important metrics that could expose the underlying problems.
How soon you are hoping this can be fulfilled: This month, before June.