A few Kubernetes workers persistently show a number of processes in D state (uninterruptible sleep):
Description
| Status | Assigned | Task |
|---|---|---|
| Open | None | T403043 [ceph] 2025-08-27 ceph outage when bringing in a big osd host all at once (cloudcephosd1048) |
| Resolved | dcaro | T373632 CephSlowOps Ceph cluster in eqiad has slow ops, which might be blocking some writes |
| Resolved | aborrero | T374692 toolforge: workers with many D procs (2024-09-13 edition) |
Event Timeline
Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-13T09:12:10Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 (T374692)
Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-13T09:12:14Z] <aborrero@cloudcumin1001> END (FAIL) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=99) for tools-k8s-worker-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 (T374692)
Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-13T09:20:55Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 (T374692)
Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-13T09:42:46Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-55, tools-k8s-worker-nfs-5, tools-k8s-worker-nfs-19, tools-k8s-worker-nfs-14 (T374692)
Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-13T11:13:18Z] <aborrero@cloudcumin1001> START - Cookbook wmcs.toolforge.k8s.reboot for tools-k8s-worker-nfs-54 (T374692)
Mentioned in SAL (#wikimedia-cloud-feed) [2024-09-13T11:18:51Z] <aborrero@cloudcumin1001> END (PASS) - Cookbook wmcs.toolforge.k8s.reboot (exit_code=0) for tools-k8s-worker-nfs-54 (T374692)
