
Some httpbb checks are flapping
Closed, Duplicate · Public

Description

Following a discussion on IRC, I'm opening this task so we can investigate next week: some httpbb checks have flapped over the past few days, and in the last two days the Kubernetes ones in particular have flapped more than usual:

2023-05-03 20:29:57 jinxer-wm| (SystemdUnitFailed) firing: (3) httpbb_hourly_appserver.service Failed on cumin2002:9100
2023-05-03 20:48:35 jinxer-wm| (SystemdUnitFailed) firing: (3) httpbb_hourly_appserver.service Failed on cumin2002:9100
2023-05-03 20:53:31 jinxer-wm| (SystemdUnitFailed) firing: (3) httpbb_hourly_appserver.service Failed on cumin2002:9100
2023-05-04 20:37:42 jinxer-wm| (SystemdUnitFailed) firing: httpbb_kubernetes_mw-web_hourly.service Failed on cumin1001:9100
2023-05-04 21:37:42 jinxer-wm| (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-web_hourly.service Failed on cumin1001:9100
2023-05-09 03:31:42 jinxer-wm| (SystemdUnitFailed) firing: (3) httpbb_hourly_appserver.service Failed on cumin1001:9100
2023-05-09 04:31:42 jinxer-wm| (SystemdUnitFailed) firing: (3) httpbb_hourly_appserver.service Failed on cumin1001:9100
2023-05-11 08:06:42 jinxer-wm| (SystemdUnitFailed) firing: httpbb_kubernetes_mw-web_hourly.service Failed on cumin2002:9100
2023-05-11 09:06:42 jinxer-wm| (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-web_hourly.service Failed on cumin2002:9100
2023-05-12 09:06:42 jinxer-wm| (SystemdUnitFailed) firing: httpbb_kubernetes_mw-web_hourly.service Failed on cumin2002:9100
2023-05-12 10:06:42 jinxer-wm| (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-web_hourly.service Failed on cumin2002:9100
2023-05-12 11:06:42 jinxer-wm| (SystemdUnitFailed) firing: httpbb_kubernetes_mw-web_hourly.service Failed on cumin2002:9100
2023-05-12 12:06:42 jinxer-wm| (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-web_hourly.service Failed on cumin2002:9100
2023-05-12 13:06:42 jinxer-wm| (SystemdUnitFailed) firing: httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin2002:9100
2023-05-12 14:06:42 jinxer-wm| (SystemdUnitFailed) resolved: httpbb_kubernetes_mw-api-ext_hourly.service Failed on cumin2002:9100
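
To make the "more than usual" comparison easier to check, the alert lines above can be tallied per unit and host. The following is a minimal sketch, not an existing tool: the regular expression is an assumption derived only from the lines pasted in this task, and `count_firing` is a hypothetical helper name. It counts `firing` notifications, which may include repeats of a single ongoing alert rather than distinct failures.

```python
import re
from collections import Counter

# Assumed line shape, inferred only from the log excerpt in this task:
# "<date> <time> <nick>| (SystemdUnitFailed) <state>: [(N) ]<unit> Failed on <host>:<port>"
ALERT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \S+ "
    r"\(SystemdUnitFailed\) (?P<state>firing|resolved): (?:\(\d+\) )?"
    r"(?P<unit>\S+\.service) Failed on (?P<host>[^:\s]+):\d+$"
)


def count_firing(lines):
    """Count 'firing' notifications per (unit, host); skip lines that do not match."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match and match.group("state") == "firing":
            counts[(match.group("unit"), match.group("host"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Read pasted log lines from stdin and print the most frequent offenders first.
    for (unit, host), n in count_firing(sys.stdin).most_common():
        print(f"{n:3d}  {unit}  {host}")
```

Saved under any file name, the script can be fed the pasted log on standard input. Grouping by date as well would only need the `date` group added to the counter key.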

Event Timeline

Volans triaged this task as Medium priority. May 12 2023, 4:52 PM
Volans created this task.
Restricted Application added a subscriber: Aklapper.

Thanks for the task and sorry for the slow reply -- per @Clement_Goubert's merge, these failures are concurrent with deployments. We could paper over them with retries on the httpbb side, but we think it's not an httpbb problem; rather, the failures represent real unavailability windows associated with the deploy, and that isn't supposed to happen. We'll continue digging over on the other task.