Create and announce timeline for shutting down labsdb100[13]
Closed, Resolved · Public

Description

Draft a timeline for the final shutdown of the ancient labsdb servers and announce it on labs-announce and the blog. There should be a page on wikitech, similar to the one we created for the Precise deprecation project, that the announcements can link to.

Event Timeline

bd808 created this task. Sep 5 2017, 10:25 PM
bd808 moved this task from Backlog to Wiki replicas on the Data-Services board. Sep 5 2017, 11:04 PM

@jcrespo has suggested Wednesday 2017-12-13 as the target shutdown date for the servers. We also need to choose a date sometime in October to perform the outstanding kernel reboots on these servers.

bd808 claimed this task. Sep 25 2017, 11:47 PM
Restricted Application added a project: User-bd808. Sep 25 2017, 11:47 PM

> We also need to choose a date sometime in October to perform the outstanding kernel reboots on these servers.

My question to that would be...if one of them doesn't come back, are we ready to handle that potential issue?
If we are...what would stop us from decommissioning earlier? :-)

bd808 added a comment. Sep 26 2017, 6:01 AM

> My question to that would be...if one of them doesn't come back, are we ready to handle that potential issue?
> If we are...what would stop us from decommissioning earlier? :-)

On wikitech we say that the user created databases could disappear at any time, but we try not to cause that unnecessarily. The blocker to decomm is giving reasonable warning of the decision made in T156869: Design a method for keeping user-created tables in sync across labsDBs for tools that are relying on co-located databases. I was initially in favor of never rebooting labsdb100[13], but in T168584#3569772 @jcrespo proposed a reasonable compromise of waiting until the basic announce of the new servers had been made and people had started to migrate. I do think that we should have puppet changes staged in gerrit to fail the *.labsdb hostnames over to the new servers before we try any reboot.

>> My question to that would be...if one of them doesn't come back, are we ready to handle that potential issue?
>> If we are...what would stop us from decommissioning earlier? :-)
>
> On wikitech we say that the user created databases could disappear at any time, but we try not to cause that unnecessarily. The blocker to decomm is giving reasonable warning of the decision made in T156869: Design a method for keeping user-created tables in sync across labsDBs for tools that are relying on co-located databases. I was initially in favor of never rebooting labsdb100[13], but in T168584#3569772 @jcrespo proposed a reasonable compromise of waiting until the basic announce of the new servers had been made and people had started to migrate. I do think that we should have puppet changes staged in gerrit to fail the *.labsdb hostnames over to the new servers before we try any reboot.

Sounds good to me! Thanks for clearing that up :-)

bd808 added a comment. Oct 16 2017, 8:22 PM

Drafted on wikitech -- https://wikitech.wikimedia.org/wiki/Wiki_Replica_c1_and_c3_shutdown

We need to pick dates/times for the reboots that work for roots from the cloud-services-team and for the DBA team so that we can handle routine issues that may come up from the reboots. @jcrespo, I'll have my people call your people. ;)