From alert:
TektonDown
project: tools 1
description
summary: Tekton is down
17 minutes ago
instance: k8s.tools.eqiad1.wikimedia.cloud:6443
service: toolforge,build_service,tekton
source: prometheus
team: wmcs
@cluster: wmcloud.org
@receiver: metricsinfra_cloud-feed
runbook
at https://alerts.wikimedia.org/?q=team%3Dwmcs
Tekton seems up and running:
root@tools-k8s-control-5:~# kubectl get all -n tekton-pipelines
NAME                                               READY   STATUS    RESTARTS   AGE
pod/tekton-pipelines-controller-5c78ddd49b-z6pm2   1/1     Running   0          15d
pod/tekton-pipelines-webhook-5d899cc8c-kk9hf       1/1     Running   0          17d

NAME                                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                              AGE
service/tekton-pipelines-controller   ClusterIP   10.110.221.64   <none>        9090/TCP,8008/TCP,8080/TCP           64d
service/tekton-pipelines-webhook      ClusterIP   10.105.112.2    <none>        9090/TCP,8008/TCP,443/TCP,8080/TCP   64d

NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/tekton-pipelines-controller   1/1     1            1           64d
deployment.apps/tekton-pipelines-webhook      1/1     1            1           64d

NAME                                                     DESIRED   CURRENT   READY   AGE
replicaset.apps/tekton-pipelines-controller-5c78ddd49b   1         1         1       64d
replicaset.apps/tekton-pipelines-webhook-5d899cc8c       1         1         1       64d

NAME                                                           REFERENCE                             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/tekton-pipelines-webhook   Deployment/tekton-pipelines-webhook   4%/100%   1         5         1          64d
However, the cert on the Prometheus machine has expired:
root@tools-prometheus-6:/srv/prometheus/tools# openssl x509 -in /etc/ssl/localcerts/toolforge-k8s-prometheus.crt -text
Certificate:
...
    Validity
        Not Before: Jun  2 11:55:07 2022 GMT
        Not After : Jun  2 11:55:07 2023 GMT
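The manual expiry check above could be scripted so it is easier to repeat. The following is only a rough sketch, not something that exists in the Toolforge tooling: the function names `parse_not_after` and `is_expired` are hypothetical, and it assumes the input is the text printed by `openssl x509 -text`, with month names in English.

```python
import re
from datetime import datetime, timezone

# Matches the "Not After" line of `openssl x509 -text` output,
# e.g. "Not After : Jun  2 11:55:07 2023 GMT".
NOT_AFTER_RE = re.compile(r"Not After\s*:\s*(.+?)\s+GMT")


def parse_not_after(openssl_text):
    """Return the certificate's expiry time as an aware UTC datetime."""
    match = NOT_AFTER_RE.search(openssl_text)
    if match is None:
        raise ValueError("no 'Not After' field found in the input")
    # openssl pads single-digit days with an extra space; collapse it.
    stamp = " ".join(match.group(1).split())
    # Assumption: %b parsing relies on an English (or C) locale.
    parsed = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(openssl_text, now=None):
    """Return True if the expiry time is at or before `now` (default: current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return parse_not_after(openssl_text) <= now
```

The text output could be captured with the same `openssl x509 -in ... -text` command shown above and passed to `is_expired` as a string. Passing `now` explicitly makes it possible to ask whether the cert will still be valid at some future date.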