Preliminary info:
- As part of regular database maintenance, db1107 (eventlogging master database) needed to be rebooted to allow kernel and mariadb updates.
- We run a proxy in front of the eventlogging database, called m4-master, that is responsible for sending traffic to db1107. If that host goes down, the proxy fails over to db1108 (analytics-slave).
Maintenance steps performed:
1) Stopped replication on db1108 (eventlogging_sync) and disabled puppet.
2) Stopped the EventLogging MySQL consumers on eventlog1001, so MySQL traffic to db1107 stopped.
3) Performed maintenance on db1107.
4) Re-enabled replication on db1108 and restarted the MySQL consumers on eventlog1001.
The main issue is that before step 4) we did not point m4-master back at db1107, so db1108 remained the target. Once MySQL traffic restarted, new events were inserted into db1108.
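The failure mode above can be illustrated with a toy model. This is only a sketch: the `FailoverProxy` class and its methods are hypothetical and do not reflect the real m4-master configuration. It assumes the proxy fails over when the primary goes down and does not fail back on its own.

```python
class FailoverProxy:
    """Toy model of a proxy with one primary and one backup (hypothetical)."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup
        self.active = primary  # host currently receiving traffic

    def health_check(self, up_hosts):
        # Fail over only when the active host is down and the backup is up.
        # Assumption: there is no automatic fail-back to the primary.
        if self.active not in up_hosts and self.backup in up_hosts:
            self.active = self.backup

    def reset_to_primary(self):
        # The manual step that was skipped before step 4).
        self.active = self.primary


proxy = FailoverProxy("db1107", "db1108")
proxy.health_check({"db1108"})             # db1107 rebooting: fail over
proxy.health_check({"db1107", "db1108"})   # db1107 back: target unchanged
target_after_reboot = proxy.active         # still db1108
proxy.reset_to_primary()
target_after_reset = proxy.active          # db1107 again
```

Under this assumption, restarting the consumers before calling the reset step sends all writes to the backup host, which matches what happened here.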
So now we are in this state:
- eventlogging mysql traffic stopped.
- after 1), some new rows were inserted into db1107 that have not been replicated to db1108.
- after 4), 2h of eventlogging data has been inserted into db1108.
Ideally we could drop these two hours of new data, move the consumer groups' offsets back to an earlier point, and then restart everything to replay the data, but we are not sure how feasible that is.
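As a rough sketch of the "move the offsets back" part, the following picks, for each partition, the first offset at or after a cutoff time. Everything here is illustrative: the function names, the `(offset, timestamp)` layout, and the sample values are assumptions, and it presumes timestamps never decrease within a partition.

```python
from bisect import bisect_left


def rewind_offset(records, cutoff_ts):
    """Return the first offset whose timestamp is >= cutoff_ts.

    records: list of (offset, timestamp) tuples sorted by offset, with
    non-decreasing timestamps (an assumption, not a guarantee).
    Returns None if every record is older than cutoff_ts.
    """
    timestamps = [ts for _, ts in records]
    i = bisect_left(timestamps, cutoff_ts)
    return records[i][0] if i < len(records) else None


def plan_rewind(partitions, cutoff_ts):
    """Map each partition id to the offset the consumer group should resume from."""
    return {p: rewind_offset(recs, cutoff_ts) for p, recs in partitions.items()}


# Hypothetical data: partition id -> [(offset, unix timestamp), ...]
sample = {
    0: [(10, 100), (11, 105), (12, 110)],
    1: [(5, 90), (6, 95)],
}
plan = plan_rewind(sample, cutoff_ts=105)
```

The cutoff would be the moment traffic was last known to be consistent on both hosts, for example just before step 1). Replaying from there is only safe if the rows written after that point are removed first, or if the inserts are idempotent.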