
Batch db1074-db1079 hosts having BBU issues
Closed, Declined | Public

Description

The following hosts from the same batch have had BBU issues:

db1074 T231638 s2 slave (sanitarium master)
db1075 T233534 s3 master
db1077 T225391 was s3 host, now test host

The following hosts are also part of the batch. They are working fine at the moment, but I assume they could suffer the same issue sooner or later, so we should make sure they are not masters:

db1076 s2 slave
db1078 s3 slave
db1079 s7 slave (sanitarium master)

I am not fully sure we can trust these hosts long term (they have also been out of warranty since May 2019). Maybe we should buy 6 new hosts to replace them?
@mark

These hosts were initially scheduled to be refreshed in March 2021.

Event Timeline

Marostegui triaged this task as Medium priority. Sep 23 2019, 5:16 AM
Marostegui moved this task from Triage to Meta/Epic on the DBA board.

Change 538522 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Promote db1123 to s3 master

https://gerrit.wikimedia.org/r/538522

Change 538522 merged by Marostegui:
[operations/puppet@production] mariadb: Promote db1123 to s3 master

https://gerrit.wikimedia.org/r/538522

Change 539113 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1075: Change binlog format to STATEMENT

https://gerrit.wikimedia.org/r/539113

Change 539113 merged by Marostegui:
[operations/puppet@production] db1075: Change binlog format to STATEMENT

https://gerrit.wikimedia.org/r/539113

Change 539268 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] db1078: Change binlog format to ROW

https://gerrit.wikimedia.org/r/539268

Change 539268 merged by Marostegui:
[operations/puppet@production] db1078: Change binlog format to ROW

https://gerrit.wikimedia.org/r/539268

Declining, as these hosts will be refreshed next fiscal year.