
ms-fe.svc.codfw.wmnet paged during Swift rebalance
Closed, Resolved, Public

Description

Follow-up task for https://wikitech.wikimedia.org/wiki/Incident_documentation/20210201-swift-codfw

Actionables:

  • Change /monitoring/backend to /monitoring/frontend (i.e. check the frontend itself) for the Icinga service check and PyBal's ProxyFetch
  • Consider depooling Swift's discovery records during rebalances
  • Consider lowering DNS TTL for Swift's discovery record
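
For illustration, the first actionable amounts to pointing the health probe at a different path on the same service name. Below is a minimal Python sketch of such a probe. The helper names (`check_url`, `probe`), the `https` scheme, and the "HTTP 200 means healthy" criterion are assumptions made for this example. Only the host name and the two `/monitoring/...` paths come from this task, and the real checks are configured in Icinga and PyBal rather than in a script like this.

```python
import urllib.error
import urllib.request

# Path taken from this task. Everything else here is hypothetical.
FRONTEND_PATH = "/monitoring/frontend"


def check_url(host: str, path: str = FRONTEND_PATH, scheme: str = "https") -> str:
    """Build the URL a health check would request (scheme is an assumption)."""
    return f"{scheme}://{host}{path}"


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 (assumed success criterion)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts and non-2xx responses all count as unhealthy.
        return False


if __name__ == "__main__":
    url = check_url("ms-fe.svc.codfw.wmnet")
    print(url, "healthy" if probe(url) else "unhealthy")
```

With the default path the probe exercises only the frontend, which is what the first actionable asks for. Passing `path="/monitoring/backend"` would reproduce the previous check target.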

Event Timeline

Change 661369 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] hieradata: use /monitoring/frontend for Swift's internal svc health checks

https://gerrit.wikimedia.org/r/661369

Change 661369 merged by Filippo Giunchedi:
[operations/puppet@production] hieradata: use /monitoring/frontend for Swift's internal svc health checks

https://gerrit.wikimedia.org/r/661369

fgiunchedi claimed this task.

Resolving; the other two items should either be done already or be tackled elsewhere.

Specifically: depooling Swift during rebalances is no longer needed now that T221904 (swift backend decomms / rebalances are noisy) has been tackled. As for the TTL, all discovery records currently use 300s, and I think they should be handled/changed consistently together.