db2097 had a memory issue (T225378) that caused s6 to crash (s1 did not).
Even though a data checksum on s6 revealed no drifts, we should probably rebuild it.
db2097 is a backup source host, so it is probably best not to experiment with 10.4 there until we are sure it works fine. That 10.4 test will be a goal for the next quarter, so for now we can either rebuild s1 and s6 as they are or decline this task.
I will defer that decision to @jcrespo; I am fine either way (rebuild s1 and s6, or just close this task). As mentioned, compare.py revealed no drifts.
I agree we should keep the backup sources on the same version as either the master or more than 50% of the replicas (aka "upgrade it at the same time as the master"). However, if we decide to move a whole section (e.g. s1 and s6 on codfw only) to a higher version, we could also upgrade the backup source of that DC to test the backup workflow on the new version.
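As a rough illustration of the version rule above, here is a minimal Python sketch. It is not an existing tool: the function name `allowed_backup_versions` and the version string "10.1" are hypothetical, and it assumes versions can be compared as plain strings.

```python
from collections import Counter


def allowed_backup_versions(master_version, replica_versions):
    """Return the versions a backup source may run under the rule above.

    Hypothetical helper: a version is allowed if it is the master's
    version, or if strictly more than 50% of the replicas run it.
    """
    allowed = {master_version}
    counts = Counter(replica_versions)
    for version, count in counts.items():
        # Strict majority: exactly half of the replicas is not enough.
        if count * 2 > len(replica_versions):
            allowed.add(version)
    return allowed


# Example with made-up versions: the master is still on "10.1",
# but two of the three replicas already run "10.4".
print(allowed_backup_versions("10.1", ["10.1", "10.4", "10.4"]))
```

With those inputs both versions are acceptable for the backup source; if only half of the replicas ran "10.4", only the master's version would be.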
This was rebuilt after the second crash (T252492), which is why it took me so long to send it to DC Ops. CC @Marostegui