We lost an Elastic Psi master to hardware failure in T311939. The only notice we received was an email on Sunday at 08:53 UTC. During a subsequent reimage operation, we lost another master, and the entire psi cluster in CODFW went offline for a few minutes. (Note that there was no discernible user impact.)
Creating this ticket so the Search team can decide on the proper urgency for failed masters and add the appropriate amount/type of alerting.