In T193473, we migrated the wdqs-internal-scholarly and wdqs-internal-main services from plaintext to TLS. Unfortunately, this caused a service interruption for some clients (see this Etherpad for more details). If we had been monitoring the upstream metrics, we would have caught this issue sooner.
Creating this ticket to:
- Create upstream error rate monitors for the services†
- Verify operation
†Probably something similar to this alert
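
For discussion, here is a rough sketch of the check such a monitor would perform. It is not the existing alert: the function names, the use of 5xx responses as the error signal, and the 5% threshold are all assumptions chosen for illustration, and the real monitor would most likely be an alerting rule over the upstream metrics rather than Python.

```python
def upstream_error_rate(errors_5xx: float, total: float) -> float:
    """Fraction of upstream requests that failed in a time window.

    Both arguments are request counts over the same window (hypothetical
    inputs; the real values would come from the upstream metrics).
    Returns 0.0 when there was no traffic, to avoid dividing by zero.
    """
    if total <= 0:
        return 0.0
    return errors_5xx / total


def should_alert(errors_5xx: float, total: float, threshold: float = 0.05) -> bool:
    """True when the error rate is strictly above the (assumed) threshold."""
    return upstream_error_rate(errors_5xx, total) > threshold
```

One open design question is how to treat a window with no traffic. The sketch treats it as healthy, but a complete loss of upstream traffic may deserve its own alert.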