Migrate from deployment-logstash2 (jessie) to deployment-logstash03 (stretch)
Closed, Resolved · Public

Assigned To

None

Authored By

	Krenair
	Nov 20 2019, 12:00 AM

Description

Already partially discussed on T218729 but I noticed the sheer number of subscribers to that task and opted to make a subtask for this individual instance.

Details

	Subject	Repo	Branch	Lines +/-
	Remove deployment-logstash2	operations/puppet	production	+1 -31
	deployment-prep: Migrate to new logstash host	operations/puppet	production	+8 -10

Related Objects

Status	Subtype	Assigned	Task
Invalid		None	T197804 Puppet: forbid new Python2 code
Open		None	T218426 Upgrade various Cloud VPS Python 2 scripts to Python 3
Resolved	BUG REPORT	• Bstorm	T218423 Add python 3 packages to openstack::clientpackages::common
Resolved		MoritzMuehlenhoff	T232677 Remove support for Debian Jessie in Cloud Services
Duplicate		None	T236575 "deployment-prep" Cloud VPS project jessie deprecation
Resolved		None	T218729 Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster
Resolved		None	T238707 Migrate from deployment-logstash2 (jessie) to deployment-logstash03 (stretch)
Declined		None	T241481 deployment-logstash03: UDP listener died EADDRINUSE, logstash port conflict with rsyslogd
Declined		None	T276521 deployment-logstash03 puppet errors

Event Timeline

Krenair created this task. · Nov 20 2019, 12:00 AM

Restricted Application added a subscriber: Aklapper. · Nov 20 2019, 12:00 AM

In T218729#5670450, @fgiunchedi wrote:

If deployment-logstash03 has the same classes applied as deployment-logstash2

✅

In T218729#5670450, @fgiunchedi wrote:

and no puppet errors

✅

In T218729#5670450, @fgiunchedi wrote:

I'd say the next step would be to switch producers to use deployment-logstash03

hmmm, looks like references to this host are a little scattered:

alex@alex-laptop:~/Development/Wikimedia/Operations-Puppet (production)$ git grep deployment-logstash2
hieradata/labs.yaml:role::logging::mediawiki::udp2log::logstash_host: 'deployment-logstash2.deployment-prep.eqiad.wmflabs'
hieradata/labs/deployment-prep/common.yaml:  - "deployment-logstash2.deployment-prep.eqiad.wmflabs:10514"
hieradata/labs/deployment-prep/common.yaml:service::configuration::logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/deployment-prep/common.yaml:  logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/deployment-prep/common.yaml:logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/deployment-prep/common.yaml:  - 'deployment-logstash2.deployment-prep.eqiad.wmflabs:9093'
hieradata/labs/deployment-prep/host/deployment-logstash2.yaml:      - deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/deployment-prep/host/deployment-logstash2.yaml:      - deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/deployment-prep/host/deployment-logstash2.yaml:role::kibana::serveradmin: root@deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/wikidata-query/common.yaml:profile::query_service::logstash_host: 'deployment-logstash2.deployment-prep.eqiad.wmflabs'
modules/base/manifests/remote_syslog.pp:#   (e.g. ["centrallog1001.eqiad.wmnet"] or ["deployment-logstash2.deployment-prep.eqiad.wmflabs:10514"])
modules/role/manifests/beta/puppetmaster.pp:        logstash_host => 'deployment-logstash2.deployment-prep.eqiad.wmflabs',
modules/scap/templates/scap.cfg.erb:logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs:9200

Guess I'll upload a puppet commit to update deployment-prep/common.yaml and the non-hieradata stuff.

alex@alex-laptop:~/Development/Wikimedia/instance-puppet (master)$ git grep deployment-logstash2
deployment-prep/_.yaml:      deployment-logstash2.deployment-prep.eqiad.wmflabs:
deployment-prep/_.yaml:      deployment-logstash2.deployment-prep.eqiad.wmflabs:
deployment-prep/_.yaml:- deployment-logstash2.deployment-prep.eqiad.wmflabs:9093
deployment-prep/_.yaml:service::configuration::logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-aqs.yaml:profile::aqs::logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-aqs.yaml:  logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-docker-cxserver01.deployment-prep.eqiad.wmflabs.yaml:        - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.roles:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.roles:              metadata.broker.list: deployment-logstash2.deployment-prep.eqiad.wmflabs:9092
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.roles:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.roles:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.yaml:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.yaml:              metadata.broker.list: deployment-logstash2.deployment-prep.eqiad.wmflabs:9092
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.yaml:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.yaml:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-2.deployment-prep.eqiad.wmflabs.yaml:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-2.deployment-prep.eqiad.wmflabs.yaml:              metadata.broker.list: deployment-logstash2.deployment-prep.eqiad.wmflabs:9092
deployment-prep/deployment-eventgate-2.deployment-prep.eqiad.wmflabs.yaml:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-2.deployment-prep.eqiad.wmflabs.yaml:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-3.deployment-prep.eqiad.wmflabs.yaml:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-3.deployment-prep.eqiad.wmflabs.yaml:              metadata.broker.list: deployment-logstash2.deployment-prep.eqiad.wmflabs:9092
deployment-prep/deployment-eventgate-3.deployment-prep.eqiad.wmflabs.yaml:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-3.deployment-prep.eqiad.wmflabs.yaml:      - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-logstash2.deployment-prep.eqiad.wmflabs.yaml:  - deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-logstash2.deployment-prep.eqiad.wmflabs.yaml:  - deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-mediawiki-.yaml:- deployment-logstash2.deployment-prep.eqiad.wmflabs:9093
deployment-prep/deployment-sessionstore.yaml:  logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
ores/_.yaml:logstash_host: deployment-logstash2.eqiad.wmflabs
phabricator/_.yaml:mediawiki::forward_syslog: deployment-logstash2.deployment-prep.eqiad.wmflabs:10514
striker/striker-uwsgi.yaml:    LOGSTASH_HOST: deployment-logstash2.eqiad.wmflabs
wikidata-query/_.yaml:wdqs::logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs

(that .roles file should actually just be a list of roles, think this was a bug with the import script that set up the repo, have told Andrew)
the only deployment-eventgate host existing now is -3 so I think some of this is just a lack of old hieradata getting deleted on instance deletion (edit: T238708)

I'll update deployment-prep stuff, I'm not sure anything outside the project should be communicating with this.
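A minimal sketch of how the scattered references could be rewritten in a local checkout, assuming a plain find-and-replace of the full hostname is what is wanted. The function name `replace_host` is hypothetical and not part of any Wikimedia tooling; review the resulting diff before committing anything.

```shell
# Hypothetical helper: rewrite every file under a directory that mentions
# the old hostname. Assumes the new name contains no "/" or "&" characters
# (true for ordinary hostnames), since it is used as a sed replacement.
replace_host() {
    dir=$1 old=$2 new=$3
    # Escape regex metacharacters so the dots in the hostname match literally.
    old_re=$(printf '%s' "$old" | sed 's/[][\.*^$/]/\\&/g')
    # List files containing the old name as a fixed string, skipping .git.
    grep -rlF --exclude-dir=.git -- "$old" "$dir" | while IFS= read -r file; do
        # -i.bak works with both GNU and BSD sed; drop the backup afterwards.
        sed -i.bak "s/$old_re/$new/g" "$file" && rm -f "$file.bak"
    done
}

# Example call (run from the root of a checkout):
# replace_host . deployment-logstash2.deployment-prep.eqiad.wmflabs \
#     deployment-logstash03.deployment-prep.eqiad.wmflabs
```

Matching on the full hostname rather than just `deployment-logstash2` leaves entries such as `deployment-logstash2.eqiad.wmflabs` (seen in the ores and striker hieradata) untouched, so those would still need a separate decision.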

In T218729#5670450, @fgiunchedi wrote:

and the proxy to logstash-beta.wmflabs.org. It might help with T233134: logstash-beta.wmflabs.org does not receive any mediawiki events too

(also apparently kibana4.wmflabs.org)

horizon-based hieradata changes:
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/5f26dcdb608d31f477ec2f74de31f55c81fa4665%5E%21/#F0
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/0660c88726226d038e7da0546f9e1f6192f565c5%5E%21/#F0
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/9f14f09b6cfa2631ae11283118d12a297517bbee%5E%21/#F0
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/f0dd923a8cfdff1ee269a11614e774b032ef338e%5E%21/#F0
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/937bb910a49cb18418b6372ad56c4a3fc7d5b8b4%5E%21/#F0
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/943782a36eef45a4f4f45c0a6637b4c435adb03d%5E%21/#F0

also https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/31279f370ab1b8f4cdab33eeeaf030b2ecced6a2%5E%21/#F0 to replace that hieradata/labs.yaml entry

Krenair claimed this task. · Nov 20 2019, 12:30 AM

Change 551946 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] deployment-prep: Migrate to new logstash host

https://gerrit.wikimedia.org/r/551946

gerritbot added a project: Patch-For-Review. · Nov 20 2019, 12:34 AM

Mentioned in SAL (#wikimedia-releng) [2019-11-20T00:47:34Z] <Krenair> T238707 moved kibana4/logstash-beta proxies to deployment-logstash03, copied /etc/logstash/htpasswd file

Mentioned in SAL (#wikimedia-releng) [2019-11-20T00:51:19Z] <Krenair> T238707 created old-logstash-beta proxy to point at old instance, created default Index Pattern on new logstash-beta

Mentioned in SAL (#wikimedia-releng) [2019-11-20T00:53:47Z] <Krenair> T238707 changed dateFormat (under management -> advanced settings) from 'MMMM Do YYYY, HH:mm:ss.SSS' to 'YYYY-MM-DDTHH:mm:ss', and dateFormat:tz from Browser to UTC to match old instance

Wondering what we need to do next. Do we need to copy dashboards over somehow?

Change 551946 merged by Andrew Bogott:
[operations/puppet@production] deployment-prep: Migrate to new logstash host

https://gerrit.wikimedia.org/r/551946

Maintenance_bot removed a project: Patch-For-Review. · Nov 20 2019, 2:10 AM

In T238707#5677116, @Krenair wrote:

Wondering what we need to do next. Do we need to copy dashboards over somehow?

Thanks for working on this! Good point re: dashboards, they live in the .kibana index. If the new elasticsearch cluster has access to the old one the easiest option is probably to use the reindex api, e.g.

curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d"
  {
    \"source\": {
      \"remote\": {
        \"host\": \"http://${source}\"
      },
      \"index\": \".kibana\"
    },
    \"dest\": {
      \"index\": \".kibana\"
    }
  }
"

In T238707#5677730, @fgiunchedi wrote:

If the new elasticsearch cluster has access to the old one

How do I tell? And how would I fix it if not?

It looks like each of the logstash hosts runs its own elasticsearch cluster locally. Would our source be something like deployment-logstash2.deployment-prep.eqiad.wmflabs:9200 or deployment-logstash2.deployment-prep.eqiad.wmflabs:9300? It seems we'd need to configure reindex.remote.whitelist somewhere too, though I have no idea where.

In T238707#5680006, @Krenair wrote:

It looks like each of the logstash hosts runs its own elasticsearch cluster locally. Would our source be something like deployment-logstash2.deployment-prep.eqiad.wmflabs:9200 or deployment-logstash2.deployment-prep.eqiad.wmflabs:9300? It seems we'd need to configure reindex.remote.whitelist somewhere too, though I have no idea where.

You'd be launching the reindex call on deployment-logstash03, setting deployment-logstash2:9200 as the remote source. You are correct that reindex.remote.whitelist needs to be set in elasticsearch.yml! Alternatively a dump/reload scheme would also work, e.g. with https://github.com/taskrabbit/elasticsearch-dump (never tried it though).
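As a rough illustration of the whitelist step, the setting could be appended to the config file with a small guard against adding it twice. This is only a sketch: `ensure_reindex_whitelist` is a hypothetical helper, it assumes the file ends with a newline, and Elasticsearch most likely needs a restart before the setting takes effect.

```shell
# Hypothetical helper: append reindex.remote.whitelist to an elasticsearch.yml
# unless the key is already present (in which case it refuses and returns 1).
ensure_reindex_whitelist() {
    config=$1 remote=$2
    if grep -q '^reindex\.remote\.whitelist:' "$config"; then
        echo "reindex.remote.whitelist already set in $config; edit it by hand" >&2
        return 1
    fi
    # Quote the value so the host:port colon cannot confuse a YAML parser.
    printf 'reindex.remote.whitelist: "%s"\n' "$remote" >> "$config"
}

# Example call, using the path and remote mentioned in this task:
# ensure_reindex_whitelist /etc/elasticsearch/labs-logstash-eqiad/elasticsearch.yml \
#     deployment-logstash2.deployment-prep.eqiad.wmflabs:9200
```

On a puppet-managed host a hand edit like this will be reverted on the next puppet run, so it only suits a one-off migration with puppet temporarily disabled.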

Alright, I:

disabled puppet on deployment-logstash03
edited 03's /etc/elasticsearch/labs-logstash-eqiad/elasticsearch.yml to add reindex.remote.whitelist: deployment-logstash2.deployment-prep.eqiad.wmflabs:9200
disabled puppet on deployment-logstash2
created a /etc/ferm/conf.d/20_T238707 file on 2 containing &R_SERVICE(tcp, 9200, @resolve(deployment-logstash03.deployment-prep.eqiad.wmflabs));
live-hacked /etc/ferm/conf.d/10_mtail to not have an AAAA rule, due to T153468: Ferm's upstream Net::DNS Perl library questionable handling of NOERROR responses without records causing puppet errors when we try to @resolve AAAA in labs preventing ferm from starting
restarted ferm on 2
added security group rule to the logstash security group allowing port 9200 from other instances in the group
ran this:

root@deployment-logstash03:~# curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d"
  {
    \"source\": {
      \"remote\": {
        \"host\": \"http://deployment-logstash2.deployment-prep.eqiad.wmflabs:9200\"
      },
      \"index\": \".kibana\"
    },
    \"dest\": {
      \"index\": \".kibana\"
    }
  }
"
{"took":841,"timed_out":false,"total":151,"updated":0,"created":151,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}

re-enabled and ran puppet on those two hosts
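The reindex call in the steps above can also be written without the backslash-escaped quotes by generating the request body separately and piping it to curl. A sketch under the same assumptions as the original command (Elasticsearch on localhost:9200 on the new host); `reindex_body` and `copy_index_from` are hypothetical names.

```shell
# Build the _reindex request body for copying one index from a remote cluster
# into an index of the same name on the local cluster.
reindex_body() {
    remote_url=$1 index=$2
    printf '{"source":{"remote":{"host":"%s"},"index":"%s"},"dest":{"index":"%s"}}\n' \
        "$remote_url" "$index" "$index"
}

# Hypothetical wrapper: POST the body to the local cluster, reading it from stdin.
copy_index_from() {
    reindex_body "$1" "$2" |
        curl -sS -X POST 'localhost:9200/_reindex' \
            -H 'Content-Type: application/json' --data-binary @-
}

# Example call on the new host:
# copy_index_from http://deployment-logstash2.deployment-prep.eqiad.wmflabs:9200 .kibana
```

Keeping the body in its own function makes it easy to print and inspect the JSON before anything is sent.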

and some dashboards have appeared at https://logstash-beta.wmflabs.org/app/kibana#/dashboards?_g=() just like on old-logstash-beta
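For a scripted run, the reindex response shown above could be checked mechanically rather than by eye. A minimal sketch, assuming the compact JSON formatting seen in that response; `reindex_ok` is a hypothetical name, and a real JSON parser would be more robust than string matching.

```shell
# Succeeds only if a _reindex response read from stdin reports no timeout
# and an empty failures list. Relies on the compact formatting shown above.
reindex_ok() {
    response=$(cat)
    case $response in
        *'"timed_out":false'*) ;;
        *) return 1 ;;
    esac
    case $response in
        *'"failures":[]'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: curl ... | reindex_ok && echo "reindex looks clean"
```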

Do we need to do anything else?

AFAIK if dashboards have been migrated then deployment-logstash2 should be ready to be turned off

Mentioned in SAL (#wikimedia-releng) [2019-11-26T01:02:15Z] <Krenair> Shut down deployment-logstash2 T238707

Krenair mentioned this in T218729: Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster. · Dec 14 2019, 9:02 PM

Something happened with this in T233134#5713956 though I don't really understand what is required.

Krenair added a subscriber: herron. · Dec 14 2019, 9:40 PM

Krenair mentioned this in T233134: logstash-beta.wmflabs.org does not receive any mediawiki events. · Dec 14 2019, 9:43 PM

Reedy added a subtask: T241481: deployment-logstash03: UDP listener died EADDRINUSE, logstash port conflict with rsyslogd. · Jan 17 2020, 1:47 AM

Reedy added a parent task: T243049: Decommission deployment-logstash2. · Jan 17 2020, 1:52 AM

Krenair mentioned this in T243049: Decommission deployment-logstash2. · Mar 8 2020, 9:51 PM

bd808 moved this task from Backlog to Subtasks on the Cloud-VPS (Debian Jessie Deprecation) board. · Mar 17 2020, 6:16 AM

hashar mentioned this in T260667: scap on beta fails canary check: KeyError: 'aggregations'. · Sep 3 2020, 11:23 AM

hashar merged a task: T243049: Decommission deployment-logstash2.

hashar added a subscriber: Reedy.

Peachey88 mentioned this in T257118: Beta cluster has reached its quota. · Oct 7 2020, 2:02 AM

*bump* I would love to see this VM deleted since it confuses cumin (T222480)

taavi added a subtask: T276521: deployment-logstash03 puppet errors. · Mar 5 2021, 8:24 AM

Jdforrester-WMF removed a parent task: T243049: Decommission deployment-logstash2. · Mar 5 2021, 8:17 PM

Note: other Cloud VPS projects (wikidata-query, striker, ores, phabricator) appear to also be using deployment-logstash2. Not sure if they are actually using it but those at least have hiera keys pointing to logstash2.
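One way to enumerate the affected projects from an instance-puppet checkout is to reduce the grep output to its top-level directories, assuming (as the grep output earlier in this task suggests) that each top-level directory corresponds to one Cloud VPS project. `projects_using_host` is a hypothetical name.

```shell
# Reads "path:match" lines (as printed by git grep) on stdin and prints the
# unique top-level directory of each path, one per line.
projects_using_host() {
    cut -d/ -f1 | sort -u
}

# Example call from the root of the checkout:
# git grep -F deployment-logstash2 | projects_using_host
```

This only shows which projects carry a hiera key mentioning the host; it says nothing about whether the key is actually in use.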

In T238707#6928591, @Majavah wrote:

Note: other Cloud VPS projects (wikidata-query, striker, ores, phabricator) appear to also be using deployment-logstash2. Not sure if they are actually using it but those at least have hiera keys pointing to logstash2.

Shall we just wholesale point these to deployment-logstash03? Even if some turn out to be unused or broken, that's still better than sending them to a server which will soon need to be removed :-)

In T238707#6937533, @MoritzMuehlenhoff wrote:

In T238707#6928591, @Majavah wrote:

Note: other Cloud VPS projects (wikidata-query, striker, ores, phabricator) appear to also be using deployment-logstash2. Not sure if they are actually using it but those at least have hiera keys pointing to logstash2.

Shall we just wholesale point these to deployment-logstash03? Even if some turn out to be unused or broken, that's still better than sending them to a server which will soon need to be removed :-)

Likely yes, but I'm not a project admin on those projects and have not found the time or motivation to go through all of them and contact their maintainers. Ideally that would be turned into a service record instead of pointing to individual hosts, maybe something like logstash.svc.deployment-prep.eqiad1.wikimedia.cloud (which is now possible, T276624).

Change 674392 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Remove deployment-logstash2

https://gerrit.wikimedia.org/r/674392

gerritbot added a project: Patch-For-Review. · Mar 23 2021, 3:51 PM

In T238707#6937543, @Majavah wrote:

Shall we just wholesale point these to deployment-logstash03? Even if some turn out to be unused or broken, that's still better than sending them to a server which will soon need to be removed :-)

Likely yes, but I'm not a project admin on those projects and have not found the time or motivation to go through all of them and contact their maintainers.

I have created https://gerrit.wikimedia.org/r/674392 and will simply CC on the patch a few people who should be able to fix it.

Change 674392 merged by Muehlenhoff:
[operations/puppet@production] Remove deployment-logstash2

https://gerrit.wikimedia.org/r/674392

I've merged https://gerrit.wikimedia.org/r/674392 and shut down deployment-logstash2; it can be removed for good in a few days. Puppet has been broken on this instance since September 2020, so if anything really still used it, it would probably be broken anyway...

Mentioned in SAL (#wikimedia-releng) [2021-03-24T07:42:30Z] <Majavah> remove deployment-logstash2 hiera from horizon, instance was shut off earlier by moritzm T238707

Maintenance_bot removed a project: Patch-For-Review. · Mar 24 2021, 8:10 AM

taavi closed this task as Resolved. · Mar 26 2021, 7:08 AM

taavi assigned this task to 30000lightyears.

taavi removed 30000lightyears as the assignee of this task.

taavi added a subscriber: 30000lightyears.

taavi removed a subscriber: 30000lightyears.

taavi closed subtask T276521: deployment-logstash03 puppet errors as Declined. · Jun 19 2021, 11:59 AM

taavi closed subtask T241481: deployment-logstash03: UDP listener died EADDRINUSE, logstash port conflict with rsyslogd as Declined.
