After discovering a hole in k8s apiserver metrics, @fgiunchedi and I investigated and found that new pki certs had been deployed to prometheus but never picked up. Prometheus kept using the expired certificates, so kube-apiserver answered its metrics queries with 401.
Smoking gun from kube-apiserver:
Aug 04 12:34:46 kubemaster1001 kube-apiserver: E0804 12:34:46.650786 152161 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2023-08-04T12:34:46Z is after 2023-08-02T08:44:00Z, verifying certificate SN=701251950718436174693962379298597088894617122879, SKID=5F:4D:28:59:E7:F3:A7:B3:9B:9F:F7:65:A0:44:C4:39:BE:A1:82:85, AKID=06:94:D5:26:9E:07:DF:85:0D:DF:92:AC:80:03:53:CC:88:A3:EC:49 failed: x509: certificate has expired or is not yet valid: current time 2023-08-04T12:34:46Z is after 2023-08-02T08:44:00Z]"
A simple reload didn't fix it, so both prometheus@k8s instances in eqiad were restarted.
12:32:26 godog │ !log bounce prometheus@k8s on prometheus100 to test failure to reload certs
Prometheus should restart on a new certificate deployment, or at least alert on unhealthy jobs caused by 401s.
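
One possible way to decide when a restart is needed is to compare the certificate file's modification time with the start time of the running process. This is only a rough sketch: the function names and the demo file are hypothetical, nothing here is taken from the actual puppet or prometheus setup, and how the process start time is obtained is left to the caller.

```python
import os
import tempfile
import time


def cert_newer_than_process(cert_path, process_start_epoch):
    """Return True if the cert file was modified after the process started.

    Both values are seconds since the epoch. A True result suggests the
    process may still hold an older certificate in memory.
    """
    return os.stat(cert_path).st_mtime > process_start_epoch


def demo():
    # A throwaway temp file stands in for a freshly deployed certificate.
    with tempfile.NamedTemporaryFile(delete=False) as handle:
        cert_path = handle.name
    try:
        now = time.time()
        os.utime(cert_path, (now, now))  # pretend the cert was deployed just now
        # Process started an hour before the deployment: restart needed.
        started_before = cert_newer_than_process(cert_path, now - 3600)
        # Process started an hour after the deployment: nothing to do.
        started_after = cert_newer_than_process(cert_path, now + 3600)
        return started_before, started_after
    finally:
        os.remove(cert_path)
```

A check like this only catches certs that were replaced on disk. It would not catch a cert that expires without a replacement ever being deployed, so alerting on failed scrapes is still worth having.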