Migrate labstore1006/1007 to Stretch/Buster
Closed, Resolved (Public)

Description

These are currently running jessie:

  • labstore1006.wikimedia.org
  • labstore1007.wikimedia.org

Event Timeline

ArielGlenn triaged this task as Medium priority. Jun 11 2019, 7:53 AM
ArielGlenn subscribed.

Change 579095 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] dumps-distribution: move all traffic to labstore1006

https://gerrit.wikimedia.org/r/579095

@elukey Are there any special considerations for Kerberos/HDFS/Hadoop stuff if I aim to upgrade this pair in place to Stretch and then possibly Buster?

@Bstorm the only thing to keep in mind is that Buster ships with Java 11 only, and Hadoop for the moment works only with Java 8, so we'd need a special config like profile::java::analytics in that case :)

Edit: never mind, that is already deployed on the labstores, all good then!

profile::java::analytics uses the Java 8 forward port on Buster, so that part should be fine. Wrt Kerberos there should also be no real issues I can think of.

In that case it might be worth saving some future work and moving along to buster on these.

@elukey: I don't even think we need additional changes? E.g. an-launcher1001 is on Buster and uses profile::hadoop::common with OpenJDK 8, so this is already all implemented.

Yes, I edited my comment above: everything is already implemented for the labstores, so no real precautions are needed for either Stretch or Buster.

I'm going through the process on virtual machines before I proceed on labstore1007 so at least I'm not surprised by some of the ways it won't go well.

Change 579095 merged by Bstorm:
[operations/puppet@production] dumps-distribution: move all traffic to labstore1006

https://gerrit.wikimedia.org/r/579095

Mentioned in SAL (#wikimedia-operations) [2020-03-12T22:07:54Z] <bstorm_> moving all nfs traffic off labstore1007 and to labstore1006 for upgrades T224583

Mentioned in SAL (#wikimedia-operations) [2020-03-13T16:04:35Z] <bstorm_> rebooting labstore1007 for first cycle of upgrades T224583

Mentioned in SAL (#wikimedia-operations) [2020-03-13T16:51:32Z] <bstorm_> rebooting labstore1007 for stretch upgrade T224583

labstore1007 is serving NFS and web fine on stretch now.

Mind you, clients are still all pointed at 1006. Now to try buster.

Mentioned in SAL (#wikimedia-operations) [2020-03-13T19:30:51Z] <bstorm_> rebooting labstore1007 for upgrade to buster T224583

Change 579616 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] Revert "dumps-distribution: move all traffic to labstore1006"

https://gerrit.wikimedia.org/r/579616

Change 579616 merged by Bstorm:
[operations/puppet@production] Revert "dumps-distribution: move all traffic to labstore1006"

https://gerrit.wikimedia.org/r/579616

Change 579621 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] dumps-distribution: Fix smartmontools so it doesn't trigger alerts

https://gerrit.wikimedia.org/r/579621

Change 579621 merged by Bstorm:
[operations/puppet@production] dumps-distribution: Fix smartmontools so it doesn't trigger alerts

https://gerrit.wikimedia.org/r/579621

Change 579626 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] dumps-distribution: move all NFS traffic to labstore1007

https://gerrit.wikimedia.org/r/579626

Change 579627 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] dumps-distribution: switch which host does acme

https://gerrit.wikimedia.org/r/579627

Change 579628 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/dns@master] dumps-distribution: set the TTL to 5M for dumps.wikimedia.org

https://gerrit.wikimedia.org/r/579628

Change 579628 merged by Bstorm:
[operations/dns@master] dumps-distribution: set the TTL to 5M for dumps.wikimedia.org

https://gerrit.wikimedia.org/r/579628
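
Lowering the TTL only helps once resolvers are actually handing out the shorter value, so it can be worth checking before the DNS failover is merged. A minimal sketch, assuming "5M" means 300 seconds and that the answer line has the usual dig layout (name, TTL, class, type, data); ttl_from_answer and ttl_is_low are hypothetical helper names, not part of any existing tooling:

```shell
#!/bin/sh
# Hypothetical helper: print the TTL (in seconds) from one line of
# `dig +noall +answer` output, e.g. "dumps.wikimedia.org. 300 IN A 192.0.2.10".
ttl_from_answer() {
    printf '%s\n' "$1" | awk '{ print $2 }'
}

# Hypothetical helper: succeed when the TTL in the answer line is at or
# below the given maximum (default 300 seconds, i.e. 5 minutes).
ttl_is_low() {
    ttl=$(ttl_from_answer "$1")
    [ "$ttl" -le "${2:-300}" ]
}
```

Usage would look something like `ttl_is_low "$(dig +noall +answer dumps.wikimedia.org A | head -n 1)" && echo "TTL is low enough"`. A caching resolver reports the remaining TTL, so a value under 300 is expected there; a value above 300 suggests the old record is still cached.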

Change 579642 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/dns@master] dumps-distribution: fail over to labstore1007 for dumps.wikimedia.org

https://gerrit.wikimedia.org/r/579642

Change 579626 merged by Bstorm:
[operations/puppet@production] dumps-distribution: move all NFS traffic to labstore1007

https://gerrit.wikimedia.org/r/579626

Change 579642 merged by Bstorm:
[operations/dns@master] dumps-distribution: fail over to labstore1007 for dumps.wikimedia.org

https://gerrit.wikimedia.org/r/579642

Change 579627 merged by Bstorm:
[operations/puppet@production] dumps-distribution: switch which host does acme

https://gerrit.wikimedia.org/r/579627

Mentioned in SAL (#wikimedia-operations) [2020-03-13T22:21:14Z] <bstorm_> downtimed labstore1006 for upgrades T224583

Noting here as well: when doing these in-place upgrades, facter 3 causes a funny thing to happen where it caches the last OS version and screws with your puppet runs.

You have to run sudo rm /opt/puppetlabs/facter/cache/cached_facts/operating\ system to clear it out of the cache if that happens.
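
For anyone repeating this on several hosts, the cleanup above can be wrapped so it is safe to run whether or not the stale file exists. This is only a sketch: clear_os_fact_cache is a hypothetical helper name, and the default directory is simply the path from the command above.

```shell
#!/bin/sh
# Hypothetical helper: remove facter 3's cached "operating system" fact
# group so the next puppet run re-detects the OS after an in-place upgrade.
# The optional first argument overrides the cache directory (useful for testing).
clear_os_fact_cache() {
    cache_dir="${1:-/opt/puppetlabs/facter/cache/cached_facts}"
    cache_file="$cache_dir/operating system"   # the file name really contains a space
    if [ -e "$cache_file" ]; then
        # Only report success if the removal itself succeeded.
        rm -f -- "$cache_file" && echo "removed: $cache_file"
    else
        echo "nothing cached at: $cache_file"
    fi
}
```

Run it as root (the default path is not writable otherwise), then rerun the puppet agent so the OS facts are collected fresh.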

Mentioned in SAL (#wikimedia-operations) [2020-03-13T23:12:44Z] <bstorm_> rebooting labstore1006 for upgrade to stretch T224583

Ok, labstore1006 is now buster. Failing things back to their steady state.

Change 579670 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] Revert "dumps-distribution: move all NFS traffic to labstore1007"

https://gerrit.wikimedia.org/r/579670

Change 579671 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/dns@master] Revert "dumps-distribution: fail over to labstore1007 for dumps.wikimedia.org"

https://gerrit.wikimedia.org/r/579671

Change 579672 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] Revert "dumps-distribution: switch which host does acme"

https://gerrit.wikimedia.org/r/579672

Change 579670 merged by Bstorm:
[operations/puppet@production] Revert "dumps-distribution: move all NFS traffic to labstore1007"

https://gerrit.wikimedia.org/r/579670

Change 579671 merged by Bstorm:
[operations/dns@master] Revert "dumps-distribution: fail over to labstore1007 for dumps.wikimedia.org"

https://gerrit.wikimedia.org/r/579671

Change 579672 merged by Bstorm:
[operations/puppet@production] Revert "dumps-distribution: switch which host does acme"

https://gerrit.wikimedia.org/r/579672

Change 579674 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/dns@master] Revert "dumps-distribution: set the TTL to 5M for dumps.wikimedia.org"

https://gerrit.wikimedia.org/r/579674

Change 579674 merged by Bstorm:
[operations/dns@master] Revert "dumps-distribution: set the TTL to 5M for dumps.wikimedia.org"

https://gerrit.wikimedia.org/r/579674

Everything is failed back to how it normally is except it's now buster.