
Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster
Open, Needs Triage, Public

Description

Ubuntu Trusty is gone (at least, from our project) and Debian Jessie instance creation just got disabled (see T218119: Disable jessie VM creation in VPS).
Therefore the following instances are not reproducible in their current state. If they get lost to a hardware failure and are not able to be set up on stretch, the service they ran may be SOL.
So it's time to begin migrating our 34 Jessie instances towards Stretch.
You'll notice a Buster image is available to deployment-prep alongside Stretch. While it was still a prerelease, the guidance was not to use it unless production was running the same service on buster, or you were setting up a fresh service that would be on buster when deployed to production, or you were working on migrating the production service to buster. Buster is now released and publicly available, so go nuts.
The following deployment-prep instances are running Jessie:

Name | Status/task
deployment-sessionstore01.deployment-prep.eqiad.wmflabs | New, T218609: Figure out future for newly created deployment-prep jessie instances, deployment-sessionstore02 created to replace but needs testing
deployment-cpjobqueue.deployment-prep.eqiad.wmflabs | Appears at T198901, depending on timing it may disappear before jessie becomes EOL
deployment-memc06.deployment-prep.eqiad.wmflabs | Buster plans for memc in T213089: Upgrade memcached for Debian Stretch/Buster, should we go to stretch or buster? A mix of different hosts on different distros?
deployment-memc07.deployment-prep.eqiad.wmflabs | Buster plans for memc in T213089: Upgrade memcached for Debian Stretch/Buster, should we go to stretch or buster? A mix of different hosts on different distros?
deployment-etcd-01.deployment-prep.eqiad.wmflabs | prod etcd* hosts are jessie - T224549: Track remaining jessie systems in production - might be T224574: Migrate Kubernetes etcd clusters to Stretch/Buster though our one is probably not for k8s
deployment-zookeeper02.deployment-prep.eqiad.wmflabs | See subtask
deployment-mcs01.deployment-prep.eqiad.wmflabs | Appears at T198901, depending on timing it may disappear before jessie becomes EOL
deployment-pdfrender02.deployment-prep.eqiad.wmflabs | Appears at T198901, depending on timing it may disappear before jessie becomes EOL
deployment-sca04.deployment-prep.eqiad.wmflabs | Appears to just run recommendation-api which appears at T198901, depending on timing it may disappear before jessie becomes EOL
deployment-fluorine02.deployment-prep.eqiad.wmflabs | prod mwlog* hosts are jessie - T224565: Migrate mwlog/udp2log servers to Buster
deployment-parsoid09.deployment-prep.eqiad.wmflabs | Appears at T198901, depending on timing it may disappear before jessie becomes EOL
deployment-changeprop.deployment-prep.eqiad.wmflabs | Appears at T198901, depending on timing it may disappear before jessie becomes EOL
deployment-imagescaler01.deployment-prep.eqiad.wmflabs | 02 and 03 are stretch, plus T216815: Upgrade Thumbor to Buster - is 01 needed? if not can it go, otherwise is it a candidate to become a buster testing instance?
deployment-ircd.deployment-prep.eqiad.wmflabs | prod host kraz is jessie - T224579: Migrate irc.wikimedia.org/kraz to Stretch/Buster
deployment-memc04.deployment-prep.eqiad.wmflabs | Buster plans for memc in T213089: Upgrade memcached for Debian Stretch/Buster, should we go to stretch or buster? A mix of different hosts on different distros?
deployment-memc05.deployment-prep.eqiad.wmflabs | Buster plans for memc in T213089: Upgrade memcached for Debian Stretch/Buster, should we go to stretch or buster? A mix of different hosts on different distros?
deployment-restbase01.deployment-prep.eqiad.wmflabs | Appears at T198901, depending on timing it may disappear before jessie becomes EOL
deployment-sca01.deployment-prep.eqiad.wmflabs | Runs eventstreams, graphoid, and recommendation-api, which all appear at T198901, depending on timing they may disappear before jessie becomes EOL. Also runs apertium which is not listed there.
deployment-sca02.deployment-prep.eqiad.wmflabs | Runs eventstreams, graphoid, and recommendation-api, which all appear at T198901, depending on timing they may disappear before jessie becomes EOL. Also runs apertium which is not listed there.
deployment-sentry01.deployment-prep.eqiad.wmflabs | spoke to tgr, SRE might be making Sentry inside k8s later this year, in the mean time this is currently unused but people may want to use it to test frontend logic - probably no point migrating it to stretch though
deployment-logstash2.deployment-prep.eqiad.wmflabs | deployment-logstash03 created to replace, WIP - see comments below
deployment-restbase02.deployment-prep.eqiad.wmflabs | Appears at T198901, depending on timing it may disappear before jessie becomes EOL
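
When auditing which instances still need migrating, one rough way to confirm the release a host runs is to read its /etc/os-release. The sketch below is only an illustration and is not tooling used on this task: the function names and the sample text are hypothetical, and it assumes the file carries the usual ID and VERSION_ID fields (8 for jessie, 9 for stretch, 10 for buster).

```python
# Hypothetical helper: work out the Debian codename from os-release text.
# VERSION_ID is used rather than VERSION_CODENAME because older releases
# may not include the latter field.
DEBIAN_CODENAMES = {"8": "jessie", "9": "stretch", "10": "buster"}


def parse_os_release(text):
    """Parse KEY=value lines into a dict, stripping surrounding quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def debian_codename(text):
    """Return the codename for a known Debian release, else None."""
    fields = parse_os_release(text)
    if fields.get("ID") != "debian":
        return None
    return DEBIAN_CODENAMES.get(fields.get("VERSION_ID"))


def needs_migration(text):
    """True when the host is still on jessie."""
    return debian_codename(text) == "jessie"


# Illustrative sample, not copied from a real instance.
SAMPLE = 'PRETTY_NAME="Debian GNU/Linux 8 (jessie)"\nID=debian\nVERSION_ID="8"\n'

if __name__ == "__main__":
    print(debian_codename(SAMPLE), needs_migration(SAMPLE))
```

On a real host the text would come from reading /etc/os-release instead of the SAMPLE string.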

Event Timeline


Mentioned in SAL (#wikimedia-releng) [2019-04-23T14:41:29Z] <Krenair> Shut down deployment-ms-be03 and deployment-ms-be04 T218729


Mentioned in SAL (#wikimedia-releng) [2019-04-25T16:35:35Z] <Krenair> shutting down deployment-ms-fe02 and deployment-poolcounter04 T218729


Mentioned in SAL (#wikimedia-releng) [2019-04-26T13:33:37Z] <Krenair> shut off deployment-conf03 after discussion with otto.mata and elu.key - it seems ancient, broken, unused. T218729


Re: logstash: prod hosts are stretch, so starting up a stretch instance with the same roles/hiera is expected to work. There will be a couple of migrations involved, namely moving kafka and elasticsearch over to the stretch instance. cc @herron @colewhite

On Thursday when I get rid of deployment-ms-fe02 and deployment-poolcounter04 we should have enough room in the quota again to create an xlarge, at which point I can make a new logstash instance with the same roles etc.
I don't know about doing the migrations though, I'm not familiar with kafka or elasticsearch.

Krenair added a comment. Edited May 2 2019, 5:26 PM

I've set up the instance as deployment-logstash03 - looks like the elastalert package is missing though. On the existing instance it appears to just be installed locally:

krenair@deployment-logstash2:~$ apt-cache policy elastalert
elastalert:
  Installed: 0.1.39-1~bpo9+1
  Candidate: 0.1.39-1~bpo9+1
  Version table:
 *** 0.1.39-1~bpo9+1 0
        100 /var/lib/dpkg/status
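
The output above lists /var/lib/dpkg/status as the only source for the installed version, which is what suggests the package was installed locally rather than from a repository. As a rough sketch of checking that programmatically, the hypothetical helpers below parse `apt-cache policy` text. The function names are made up for illustration, and the parsing assumes source lines are indented by at least eight spaces and begin with a numeric priority, as in the output quoted here.

```python
# Hypothetical helpers for reading `apt-cache policy <package>` output.
DPKG_STATUS = "/var/lib/dpkg/status"


def installed_version_sources(policy_output):
    """Return the source locations listed under the installed ('***') version."""
    sources = []
    in_installed = False
    for line in policy_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("***"):
            in_installed = True
            continue
        if in_installed:
            indent = len(line) - len(line.lstrip())
            parts = stripped.split()
            # Source lines look like "        100 /var/lib/dpkg/status".
            if indent >= 8 and len(parts) >= 2 and parts[0].lstrip("-").isdigit():
                sources.append(parts[1])
            else:
                break  # reached the next version entry or the end of the table
    return sources


def is_locally_installed_only(policy_output):
    """True when the installed version is known only to the local dpkg database."""
    return installed_version_sources(policy_output) == [DPKG_STATUS]


# The output quoted in the comment above, reproduced as a string.
SAMPLE = "\n".join([
    "elastalert:",
    "  Installed: 0.1.39-1~bpo9+1",
    "  Candidate: 0.1.39-1~bpo9+1",
    "  Version table:",
    " *** 0.1.39-1~bpo9+1 0",
    "        100 /var/lib/dpkg/status",
])

if __name__ == "__main__":
    print(is_locally_installed_only(SAMPLE))
```

In practice the text would come from running `apt-cache policy elastalert` on the instance; the sketch only inspects the string it is given.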
Indeed, that's my bad! I've fixed https://gerrit.wikimedia.org/r/c/operations/puppet/+/502773 to include component/elastalert, which is where the package (for stretch) lives. However, ATM I can't rebase cleanly on deployment-puppetmaster03 due to conflicts in unrelated patches. Anyway, once 502773 and 505762 are cherry-picked again on the puppetmaster, puppet should work as intended.

No worries, I'm happy to take care of that sort of problem. I've got the puppet repo sorted out (though it's a bit of a mess, and the commit involved doesn't make quite as much sense as it used to; I should return to sort that out later). I've updated those cherry-picks and encountered a couple of issues. The first is a simple missing group, which I've left a comment for on the gerrit changeset and amended our cherry-pick to fix. The other is around elastalert.util.EAException: Invalid Rule file: /etc/elastalert/security/rules/base.inc20190503-1970-1rnja7p

Thanks! Yes, the validation should be disabled for base.inc. I did so in the last PS of https://gerrit.wikimedia.org/r/c/operations/puppet/+/505762 but won't have time today to rebase/cherry-pick/test. Let me know if you still run into problems!

No rebase problems today. I just had to fix Bool -> Boolean and set validate_cmd to undef instead of '' in the commit, then re-cherry-pick it. Puppet is happy on the new logstash03 instance now.

Thank you! Re: migration of kafka and elasticsearch: I and/or @colewhite and/or @herron would be able to assist.

Krenair renamed this task from Migrate away from Debian Jessie to Debian Stretch to Migrate away from Debian Jessie to Debian Stretch/Buster. Jul 9 2019, 10:57 PM
Krenair renamed this task from Migrate away from Debian Jessie to Debian Stretch/Buster to Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster. Sep 12 2019, 11:28 PM
hashar removed a subscriber: hashar. Oct 28 2019, 10:38 AM

@fgiunchedi Do we need to do anything else to get rid of deployment-logstash2 and use deployment-logstash03 instead? logstash2 now has a puppet error due to https://gerrit.wikimedia.org/r/c/operations/puppet/+/522406 (T198092)

If deployment-logstash03 has the same classes applied as deployment-logstash2 and no puppet errors, I'd say the next step would be to switch producers to use deployment-logstash03 and point the logstash-beta.wmflabs.org proxy at it. It might help with T233134: logstash-beta.wmflabs.org does not receive any mediawiki events too.