Blackbox monitoring of hcaptcha reachability from our infra could be used for quickly diagnosing "outages that aren't our problem" and silencing other alerts.
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| In Progress | | None | T410626 WE6.2.6: ☂️ hcaptcha-proxy Production Readiness Review |
| Open | | Raine | T411255 hcaptcha-proxy monitoring: Monitor hcaptcha.com (i.e. upstream) status to distinguish "outages that aren't our problem" |
Event Timeline
The monitoring created in T404204 is sufficient -- see https://grafana-rw.wikimedia.org/d/441b2def-52e9-49d6-acad-91f5bb748989/hcaptcha-reverse-proxy-proxoid?viewPanel=panel-36 . Further work is not needed.
The extension does not distinguish between hCaptcha being unreachable due to the upstream being gone vs our proxy being gone. Therefore, I think that having blackbox monitoring of hcaptcha.com that does not go through our proxy would still have additional value. @jijiki let me know if I'm wrong :-)
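To illustrate the distinction above, here is a minimal sketch of probing the upstream directly and through the proxy, then comparing the two results. It is an assumption-laden illustration, not the existing monitoring: `PROXY_URL` is a hypothetical placeholder, the function names and labels are invented for this example, and "reachable" is assumed to mean any HTTP response below 500.

```python
import urllib.error
import urllib.request

UPSTREAM_URL = "https://hcaptcha.com"               # external dependency, probed directly
PROXY_URL = "https://hcaptcha-proxy.example.org"    # hypothetical placeholder for our proxy


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # 4xx still proves the endpoint is reachable; 5xx counts as down.
        return err.code < 500
    except OSError:
        # Covers URLError, DNS failures, refused connections, and timeouts.
        return False


def classify(upstream_ok: bool, proxy_ok: bool) -> str:
    """Map the two probe results to a coarse diagnosis (labels are illustrative)."""
    if upstream_ok and proxy_ok:
        return "healthy"
    if upstream_ok:
        return "proxy"          # upstream is fine, so the problem is on our side
    if proxy_ok:
        return "inconclusive"   # e.g. an egress problem on the probing host
    return "upstream"           # likely an outage that is not our problem


if __name__ == "__main__":
    print(classify(probe(UPSTREAM_URL), probe(PROXY_URL)))
```

A result of "upstream" is the case where other hCaptcha-related alerts could reasonably be silenced; "proxy" is the case that needs our attention.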
@Raine I think the title could better match the description. Could you please rename it?
We already have a degree of blackbox monitoring for all services by default. This task is about the upstream, i.e. hcaptcha.com status, because it is an external dependency. Our other services don't really have this kind of external dependency.
I'm not sure what exactly you mean, so I just expanded it a little.