db1080-95 batch possibly suffering BBU issues
Closed, Declined | Public

Description

Recently we had 2 BBU-related crashes for hosts in this batch:

Those crashes resulted in the hosts rebooting themselves. Unfortunately this is nothing new for HP hosts: the previous batch, now decommissioned, suffered the same problem as it approached its decommissioning date (examples: T160731, T159266).

The affected hosts were bought as part of the db1080-95 batch (T131368) and are due to be refreshed in Q2 (T258361).

The current roles for the servers from that batch are:

db1080 - m1 master
db1081 - s4 master
db1082 - s5 slave (sanitarium master)
db1083 - s1 master
db1084 - m2 slave (was soon going to become master)
db1085 - s6 slave (sanitarium master)
db1086 - s7 master
db1087 - s8 slave (sanitarium master)
db1089 - s1 slave
db1090 - s2 slave
db1091 - s1 slave
db1092 - s8 slave
db1093 - s6 slave
db1094 - s7 slave
db1095 - backup source
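
To get a quick view of how exposed we are, the list above can be tallied by role type. This is only a rough sketch: the role strings are copied from the list above, but the `classify` helper and its buckets are hypothetical and not part of any existing tooling.

```python
from collections import Counter

# Host roles copied verbatim from the list above.
ROLES = {
    "db1080": "m1 master",
    "db1081": "s4 master",
    "db1082": "s5 slave (sanitarium master)",
    "db1083": "s1 master",
    "db1084": "m2 slave (was soon going to become master)",
    "db1085": "s6 slave (sanitarium master)",
    "db1086": "s7 master",
    "db1087": "s8 slave (sanitarium master)",
    "db1089": "s1 slave",
    "db1090": "s2 slave",
    "db1091": "s1 slave",
    "db1092": "s8 slave",
    "db1093": "s6 slave",
    "db1094": "s7 slave",
    "db1095": "backup source",
}


def classify(role: str) -> str:
    """Bucket a role string (hypothetical buckets, for illustration only)."""
    if role.endswith("master") and "slave" not in role:
        return "master"
    if "sanitarium master" in role:
        return "sanitarium master"
    if "slave" in role:
        return "replica"
    return "other"


def summarize(roles: dict) -> Counter:
    """Count hosts per bucket."""
    return Counter(classify(role) for role in roles.values())


def hosts_in(roles: dict, bucket: str) -> list:
    """Return the sorted host names that fall into the given bucket."""
    return sorted(host for host, role in roles.items() if classify(role) == bucket)


if __name__ == "__main__":
    for bucket, count in summarize(ROLES).most_common():
        print(f"{bucket}: {count} ({', '.join(hosts_in(ROLES, bucket))})")
```

The section masters are the hosts where an unplanned reboot hurts the most, so `hosts_in(ROLES, "master")` is probably the first thing to look at.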

Event Timeline

Marostegui moved this task from Triage to Meta/Epic on the DBA board.

All of these hosts will go away once T258361 is completed, so I am going to close this task: we are not really going to do anything with this list other than decommission the hosts.