Migrate labsdb1005/1006/1007 to jessie
Closed, Resolved · Public

Description

These hosts are still on precise and need to be migrated to jessie. This would also involve a postgresql update from 9.1 to 9.4.

Event Timeline

MoritzMuehlenhoff raised the priority of this task from to Needs Triage.
MoritzMuehlenhoff updated the task description.
MoritzMuehlenhoff added a project: Operations.
Restricted Application added subscribers: StudiesWorld, Aklapper. · Jan 15 2016, 12:48 PM
Dzahn renamed this task from Migrate labsdb1006/1007 to jessie to Migrate labsdb1005/1006/1007 to jessie. · Jan 28 2016, 12:49 AM
Dzahn set Security to None.
Dzahn added a subscriber: Dzahn.

1005 as well:

labsdb1005.eqiad.wmnet: True
labsdb1006.eqiad.wmnet: True
labsdb1007.eqiad.wmnet: True

Restricted Application added a project: Cloud-Services. · Apr 11 2016, 5:56 PM
Dzahn added a project: DBA. · Apr 11 2016, 5:57 PM
chasemp triaged this task as Normal priority. · Jun 21 2016, 1:49 PM
chasemp added a subscriber: chasemp.

One thought: I believe we have an influx of new labsdb things coming. This may sort itself out w/o a lot of in-place shuffling.

There is indeed a replacement for labsdb100[123] about to arrive. However, there are no short-term plans for these, as they have lower impact.

labsdb1005.eqiad.wmnet already has a jessie slave, so it should be possible to schedule downtime for it soon.
Not much is happening for the postgres slaves.

fgiunchedi changed the task status from Open to Stalled. · Sep 30 2016, 2:40 PM
fgiunchedi added subscribers: akosiaris, fgiunchedi.

Setting as stalled, though next steps look like this:

  • Flip tools master from labsdb1005 to labsdb1004
  • Decommission labsdb1005

Not sure about postgresql/osm steps and if the osm roles are jessie-ready yet, @akosiaris perhaps?

Change 318520 had a related patch set uploaded (by Jcrespo):
labsdb-toolsdb: Cleaning up tls certificates

https://gerrit.wikimedia.org/r/318520

Change 318520 merged by Jcrespo:
labsdb-toolsdb: Cleaning up tls certificates

https://gerrit.wikimedia.org/r/318520

We need to schedule a downtime to do this move from labsdb1005 to labsdb1004. This should be a very short window of actual outage.

Need to sort out impact of this maint.

If we settle on a date and announce on labs-announce...

@yuvipanda I think the asks here, if you could think about them, are:

  1. Who do we let know toolsdb is going down
  2. What are the impacts of toolsdb going down (just generally)

-1- assume labs-announce is fine

-2- I'm not sure. Will things using toolsdb be ok? Assuming clients reconnect on failure (probably a bad assumption), the window of trouble is small, but I'm looking for input. There may be little we can do to shore that up other than a verbose announcement.
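To illustrate what "reconnect on failure" could look like on the client side, here is a minimal sketch, not taken from any actual tool on toolsdb. It assumes a hypothetical `connect` factory and `operation` callable supplied by the tool; the function name, the retry count, and the delays are all illustrative assumptions.

```python
import time


def call_with_reconnect(connect, operation, retries=5, base_delay=0.5,
                        retry_on=(ConnectionError,), sleep=time.sleep):
    """Run operation(conn), reconnecting with exponential backoff on failure.

    `connect` is a hypothetical zero-argument factory returning a connection;
    `operation` is a callable that takes that connection. Both are assumptions
    made for this sketch, not part of any real toolsdb client.
    """
    last_error = None
    for attempt in range(retries):
        try:
            conn = connect()          # open a fresh connection on every attempt
            return operation(conn)
        except retry_on as exc:
            last_error = exc
            if attempt < retries - 1:
                # Back off before retrying: base_delay, 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
    # All attempts failed: surface the last error to the caller.
    raise last_error
```

A tool wrapped this way would ride out a short master switch as long as the outage is shorter than the total backoff time; tools that hold one long-lived connection and never retry would still need a restart.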

I wonder if it'll be better to do this next quarter. We've already done a few bits of pretty disruptive maintenance, and have one coming up next week.

If not next quarter, how about 2nd week of December?

I wonder if it'll be better to do this next quarter.

I am ok with next quarter; let's set a time. I have worked around the 5.5 support in puppet, so this is no longer a blocker.

Just one comment: note that in theory this maintenance is not disruptive.

Ok. Early January?

January ok, but after the 15th.

jcrespo moved this task from Triage to Meta/Epic on the DBA board. · Nov 10 2016, 12:10 PM
faidon added a subscriber: faidon. · Jan 4 2017, 5:53 PM

Early January is here and the 15th is coming up fast. @yuvipanda rightfully mentioned above that this will need a (presumably advance) notice to labs-announce, so… friendly ping :)

@jcrespo How does Jan 25 / 26 work for you?

+1. Let's meet at some point to organize the details of how to do it (there are several possibilities) and send an announcement.

Ping! Jan 25 is a week away from now, not a lot of time left for an announcement :)

mark raised the priority of this task from Normal to High. · Jan 18 2017, 1:37 PM

I didn't manage to send out the announcement due to unforeseen personal issues. I'll send it out now after checking with jynus.

Update: Since I'll be travelling on the 25th, I'm going to push this out to early February instead. I'll ping @jcrespo when he's back from vacation next week to put a solid date on it and make an announcement.

+1, let's meet beforehand to clarify the impact.

Ping! Early February is now a week away.

jcrespo changed the task status from Stalled to Open. · Feb 6 2017, 5:32 PM

We announced a while ago we're gonna do this on the 15th.

Let's use T157358 for this. Postgres is a different beast.

Change 337775 had a related patch set uploaded (by Yuvipanda):
tools: Make DNS point to labsdb1004 and not 1005

https://gerrit.wikimedia.org/r/337775

Change 337775 merged by Yuvipanda:
tools: Make DNS point to labsdb1004 and not 1005

https://gerrit.wikimedia.org/r/337775

Dzahn added a comment. · Mar 20 2017, 3:52 PM

I see that labsdb1005/1006/1007 are all either re-installed or down. They don't show up as precise anymore when checking with salt.

Is this resolved (besides a decom subtask maybe?)?

Dzahn: the "reinstall as jessie" part is done, but the setup of the passive replica is not 100% complete. It will take one commit to fix it and some extra time for the reimport, but there is no way to revert it anymore. I just got distracted with more important ongoing issues.

Dzahn added a comment. · Mar 20 2017, 3:57 PM

Got it, and thank you very much.

Change 343670 had a related patch set uploaded (by Jcrespo):
[operations/puppet] Change osm master to be labsdb1007 on configuration

https://gerrit.wikimedia.org/r/343670

Change 343670 merged by Jcrespo:
[operations/puppet] Change osm master to be labsdb1007 on configuration

https://gerrit.wikimedia.org/r/343670

jcrespo closed this task as Resolved. · Mar 31 2017, 3:26 PM
jcrespo claimed this task.

This is done. Some small follow-ups (not related to jessie) are at: T157359

Dzahn awarded a token. · Mar 31 2017, 3:31 PM