Summary
On page load, we need to know whether hCaptcha is available. We do that by checking whether the secure-api.js file ($wgHCaptchaApiUrl) is available. Over the last week there were 167 instances where errors exceeded the threshold we tolerate for a given window of time, which resulted in us switching back to FancyCaptcha.
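As a rough illustration of the failover rule described above (not the extension's actual code), the "too many errors within a window" check could look like the Python sketch below. The class name `ErrorWindow`, its methods, and the threshold and window values are all hypothetical.

```python
from collections import deque
from typing import Deque


class ErrorWindow:
    """Tracks error timestamps and reports when too many fall in one window.

    Hypothetical helper for illustration; threshold and window size are
    placeholders, not the values used in production.
    """

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record_error(self, now: float) -> None:
        """Remember that an apiUrl error happened at time `now` (seconds)."""
        self._timestamps.append(now)

    def should_fail_over(self, now: float) -> bool:
        """Return True if more than `threshold` errors are inside the window."""
        # Drop errors that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


# Usage: three errors within 60 seconds exceed a threshold of two.
window = ErrorWindow(threshold=2, window_seconds=60)
for t in (0.0, 10.0, 20.0):
    window.record_error(t)
fail_over_now = window.should_fail_over(now=20.0)
```

With a rule like this, a short burst of transient errors is enough to trigger a failover, which is why the retry behaviour discussed below matters.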
Observations
- We retry the API URL download once, without any delay between the first attempt and the retry. Should we add a delay? Should we retry more than once?
- "hCaptcha unavailable due to apiUrl errors:" appears within the same couple of seconds on enwiki (three times), zhwiki (twice) and fawiki (once). In theory this should be a global check, and should just need to be calculated on one wiki (and have the result shared across all wikis). So something may be off with our cache implementation.
- If we could self-host the secure-api.js code (T403829), we would be in much better shape to handle transient network errors, because our secureEnclave.js code already supports retries due to network issues. (There might be a bit more work to do, though.) But if the network link between the proxy and hCaptcha is having issues, then we're put in the risky position of saying that hCaptcha is available when in fact it isn't, which means edits/account creations don't go through.
- Self-hosting was problematic from a proprietary code point of view, but perhaps we could cache the contents of secure-api.js in memcache for a long period of time?
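
The first two observations could be sketched as follows. This is a minimal Python illustration under stated assumptions, not the extension's implementation: `fetch_with_retries`, `is_api_available`, `GLOBAL_CACHE_KEY`, the dict standing in for the shared cache, and the attempt count, delays and TTL are all hypothetical. In MediaWiki terms, the cache-key point corresponds to the difference between `makeKey()` (per-wiki) and `makeGlobalKey()` (shared across wikis); we have not verified which one the current code uses.

```python
import time
from typing import Callable, MutableMapping, Tuple

# Hypothetical key: it deliberately contains no wiki ID, so every wiki reads
# and writes the same entry instead of repeating the check per wiki.
GLOBAL_CACHE_KEY = "global:hcaptcha:api-url-available"


def fetch_with_retries(
    fetch: Callable[[], bool],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call fetch() until it succeeds or max_attempts is used up.

    Waits base_delay, then 2 * base_delay, and so on between attempts
    (exponential backoff). The defaults are placeholders.
    """
    for attempt in range(max_attempts):
        try:
            if fetch():
                return True
        except OSError:
            pass  # Treat network-level errors like a failed attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return False


def is_api_available(
    cache: MutableMapping[str, Tuple[bool, float]],
    fetch: Callable[[], bool],
    ttl: float = 60.0,
    now: Callable[[], float] = time.time,
    **retry_kwargs,
) -> bool:
    """Return the cached availability result, recomputing it after `ttl` seconds."""
    current = now()
    entry = cache.get(GLOBAL_CACHE_KEY)
    if entry is not None and entry[1] > current:
        return entry[0]
    result = fetch_with_retries(fetch, **retry_kwargs)
    cache[GLOBAL_CACHE_KEY] = (result, current + ttl)
    return result
```

Because the key is global, the first wiki to run the check populates the result and the others reuse it until the TTL expires, so a single transient error should produce at most one "hCaptcha unavailable" log line rather than one per wiki.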
Acceptance criteria
- We do not have more than one failover incident per month