2 hour outage to update mysql on EL slaves {oryx} [3 pts]
Closed, Resolved | Public

Description

2 hour outage to update mysql on El slaves

The EventLogging mysql consumer needs to be stopped, though we might want to debate whether to stop the whole system. cc @Ottomata

What do you think?

Event Timeline

Nuria renamed this task from 2 hour outage to update mysql on El slaves to 2 hour outage to update mysql on EL slaves.
Nuria assigned this task to mforns.
Nuria raised the priority of this task from to High.
Nuria updated the task description.
Nuria updated the task description.
Nuria set Security to None.
Nuria added subscribers: gerritbot, Ottomata, Nuria and 3 others.

It is very important to notify all users well in advance. Since the affected services are not yet clear, please remember to send the notice as soon as that is decided.

BTW, this will only affect the MASTER (where the data is saved), not the slaves, which will remain available for querying but will not be updated.

Nuria raised the priority of this task from High to Unbreak Now!. Dec 10 2015, 6:08 PM

@jcrespo: @mforns will coordinate with you, as he is in the EU time zone.

The affected service is EL, but only the DB consumer (data will flow into the cluster and log files without issues), so I think the impact is clear. Let me know if I am missing anything.

To deal with the mysql upgrade window, we (Analytics) are planning to stop the 4 EL mysql consumers during that time span.
Kafka will buffer the events that come in during the window, and they will be backfilled automatically when the mysql consumers are restarted.
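
The buffering and backfill described above can be illustrated with a toy model. This is only a sketch in plain Python, not the EventLogging code and not the Kafka client API. The names `BufferedLog` and `DbConsumer` are hypothetical, and the sketch assumes the buffered events are retained for the whole maintenance window and that the consumer resumes from its last stored offset.

```python
from dataclasses import dataclass, field


@dataclass
class BufferedLog:
    """Toy stand-in for a Kafka topic partition: an append-only list of events."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)


@dataclass
class DbConsumer:
    """Toy stand-in for one mysql consumer that remembers its read offset."""
    log: BufferedLog
    offset: int = 0            # index of the next event to read
    running: bool = True
    inserted: list = field(default_factory=list)  # stands in for rows written to mysql

    def poll(self):
        """Insert every event not yet consumed; do nothing while stopped."""
        if not self.running:
            return 0
        pending = self.log.events[self.offset:]
        self.inserted.extend(pending)
        self.offset += len(pending)
        return len(pending)


log = BufferedLog()
consumer = DbConsumer(log)

log.append("event-1")
consumer.poll()                # normal operation: event-1 is inserted

consumer.running = False       # maintenance window starts
log.append("event-2")
log.append("event-3")
consumer.poll()                # stopped: nothing is inserted, events stay in the log

consumer.running = True        # maintenance window ends
backfilled = consumer.poll()   # resumes from the stored offset and catches up

print(backfilled, consumer.inserted)
```

Because the log keeps accepting events while the consumer is stopped, nothing is lost during the window; the only visible effect is that the database lags until the consumer has caught up.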

Before we do that, it's necessary to fix this bug: T120209

@jcrespo
Hi Jaime :]
When is the best time for you to do the mysql upgrade on Tuesday?

@jcrespo
The bug mentioned above has been fixed in production, so we can proceed with the upgrade.

You should mark dependencies on this ticket, and/or mark them as resolved.

I proposed (T120187#1869800) 2 hours starting on 2015-12-15 at 10:00 UTC.

@jcrespo
I added the task as 'blocked by', and will close it as resolved shortly.
Also, I'm ok with 2 hours starting on 2015-12-15 at 10:00 UTC. I will be logged in to IRC and Gmail at that time to help you with whatever I can.
Thanks!

Maintenance seems to have been done correctly. I will keep this ticket open for a few hours to follow up on some minor issues and confirm that things are working. For now, inserts seem to be going through, although slowly due to the recent restart.

mforns renamed this task from 2 hour outage to update mysql on EL slaves to 2 hour outage to update mysql on EL slaves {oryx} [3 pts]. Dec 15 2015, 5:11 PM
mforns moved this task from Ready to Deploy to Done on the Analytics-Kanban board.