Already partially discussed on T218729 but I noticed the sheer number of subscribers to that task and opted to make a subtask for this individual instance.
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
Remove deployment-logstash2 | operations/puppet | production | +1 -31
deployment-prep: Migrate to new logstash host | operations/puppet | production | +8 -10

Status | Subtype | Assigned | Task
---|---|---|---
Invalid | | None | T197804 Puppet: forbid new Python2 code
Open | | None | T218426 Upgrade various Cloud VPS Python 2 scripts to Python 3
Resolved | BUG REPORT | Bstorm | T218423 Add python 3 packages to openstack::clientpackages::common
Resolved | | MoritzMuehlenhoff | T232677 Remove support for Debian Jessie in Cloud Services
Duplicate | | None | T236575 "deployment-prep" Cloud VPS project jessie deprecation
Resolved | | None | T218729 Migrate deployment-prep away from Debian Jessie to Debian Stretch/Buster
Resolved | | None | T238707 Migrate from deployment-logstash2 (jessie) to deployment-logstash03 (stretch)
Declined | | None | T241481 deployment-logstash03: UDP listener died EADDRINUSE, logstash port conflict with rsyslogd
Declined | | None | T276521 deployment-logstash03 puppet errors
Event Timeline
hmmm, looks like references to this host are a little scattered:
alex@alex-laptop:~/Development/Wikimedia/Operations-Puppet (production)$ git grep deployment-logstash2
hieradata/labs.yaml:role::logging::mediawiki::udp2log::logstash_host: 'deployment-logstash2.deployment-prep.eqiad.wmflabs'
hieradata/labs/deployment-prep/common.yaml: - "deployment-logstash2.deployment-prep.eqiad.wmflabs:10514"
hieradata/labs/deployment-prep/common.yaml:service::configuration::logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/deployment-prep/common.yaml: logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/deployment-prep/common.yaml:logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/deployment-prep/common.yaml: - 'deployment-logstash2.deployment-prep.eqiad.wmflabs:9093'
hieradata/labs/deployment-prep/host/deployment-logstash2.yaml: - deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/deployment-prep/host/deployment-logstash2.yaml: - deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/deployment-prep/host/deployment-logstash2.yaml:role::kibana::serveradmin: root@deployment-logstash2.deployment-prep.eqiad.wmflabs
hieradata/labs/wikidata-query/common.yaml:profile::query_service::logstash_host: 'deployment-logstash2.deployment-prep.eqiad.wmflabs'
modules/base/manifests/remote_syslog.pp:# (e.g. ["centrallog1001.eqiad.wmnet"] or ["deployment-logstash2.deployment-prep.eqiad.wmflabs:10514"])
modules/role/manifests/beta/puppetmaster.pp: logstash_host => 'deployment-logstash2.deployment-prep.eqiad.wmflabs',
modules/scap/templates/scap.cfg.erb:logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs:9200
Guess I'll upload a puppet commit to update deployment-prep/common.yaml and the non-hieradata stuff.
alex@alex-laptop:~/Development/Wikimedia/instance-puppet (master)$ git grep deployment-logstash2
deployment-prep/_.yaml: deployment-logstash2.deployment-prep.eqiad.wmflabs:
deployment-prep/_.yaml: deployment-logstash2.deployment-prep.eqiad.wmflabs:
deployment-prep/_.yaml:- deployment-logstash2.deployment-prep.eqiad.wmflabs:9093
deployment-prep/_.yaml:service::configuration::logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-aqs.yaml:profile::aqs::logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-aqs.yaml: logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-docker-cxserver01.deployment-prep.eqiad.wmflabs.yaml: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.roles: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.roles: metadata.broker.list: deployment-logstash2.deployment-prep.eqiad.wmflabs:9092
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.roles: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.roles: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.yaml: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.yaml: metadata.broker.list: deployment-logstash2.deployment-prep.eqiad.wmflabs:9092
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.yaml: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-1.deployment-prep.eqiad.wmflabs.yaml: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-2.deployment-prep.eqiad.wmflabs.yaml: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-2.deployment-prep.eqiad.wmflabs.yaml: metadata.broker.list: deployment-logstash2.deployment-prep.eqiad.wmflabs:9092
deployment-prep/deployment-eventgate-2.deployment-prep.eqiad.wmflabs.yaml: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-2.deployment-prep.eqiad.wmflabs.yaml: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-3.deployment-prep.eqiad.wmflabs.yaml: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-3.deployment-prep.eqiad.wmflabs.yaml: metadata.broker.list: deployment-logstash2.deployment-prep.eqiad.wmflabs:9092
deployment-prep/deployment-eventgate-3.deployment-prep.eqiad.wmflabs.yaml: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-eventgate-3.deployment-prep.eqiad.wmflabs.yaml: - host: deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-logstash2.deployment-prep.eqiad.wmflabs.yaml: - deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-logstash2.deployment-prep.eqiad.wmflabs.yaml: - deployment-logstash2.deployment-prep.eqiad.wmflabs
deployment-prep/deployment-mediawiki-.yaml:- deployment-logstash2.deployment-prep.eqiad.wmflabs:9093
deployment-prep/deployment-sessionstore.yaml: logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
ores/_.yaml:logstash_host: deployment-logstash2.eqiad.wmflabs
phabricator/_.yaml:mediawiki::forward_syslog: deployment-logstash2.deployment-prep.eqiad.wmflabs:10514
striker/striker-uwsgi.yaml: LOGSTASH_HOST: deployment-logstash2.eqiad.wmflabs
wikidata-query/_.yaml:wdqs::logstash_host: deployment-logstash2.deployment-prep.eqiad.wmflabs
(that .roles file should actually just be a list of roles, think this was a bug with the import script that set up the repo, have told Andrew)
the only deployment-eventgate host existing now is -3 so I think some of this is just a lack of old hieradata getting deleted on instance deletion (edit: T238708)
I'll update deployment-prep stuff, I'm not sure anything outside the project should be communicating with this.
(also apparently kibana4.wmflabs.org)
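Most of the references found above need the same mechanical change: swap the old hostname for the new one. A minimal sketch of that substitution, assuming GNU grep and GNU sed and a local checkout; `replace_host` is a hypothetical helper, not something that exists in either repo, and the result would still need reviewing with `git diff` before upload:

```shell
# Hypothetical helper: rewrite every literal occurrence of an old hostname
# under a directory. Assumes GNU grep (--exclude-dir) and GNU sed (-i).
replace_host() {
    old=$1
    new=$2
    dir=$3
    # Escape dots so sed matches the hostname literally, not as a regex.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    # -r: recurse, -l: list matching files only, -F: fixed-string match.
    # Skip .git so repository internals are never rewritten.
    grep -rlF --exclude-dir=.git -- "$old" "$dir" | while IFS= read -r file; do
        sed -i "s/${old_re}/${new}/g" "$file"
    done
}
```

For example, `replace_host deployment-logstash2.deployment-prep.eqiad.wmflabs deployment-logstash03.deployment-prep.eqiad.wmflabs hieradata/labs/deployment-prep` would touch only the deployment-prep hieradata. It does not rename files such as the per-host yaml, and it leaves the shorter `deployment-logstash2.eqiad.wmflabs` form alone.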
horizon-based hieradata changes:
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/5f26dcdb608d31f477ec2f74de31f55c81fa4665%5E%21/#F0
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/0660c88726226d038e7da0546f9e1f6192f565c5%5E%21/#F0
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/9f14f09b6cfa2631ae11283118d12a297517bbee%5E%21/#F0
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/f0dd923a8cfdff1ee269a11614e774b032ef338e%5E%21/#F0
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/937bb910a49cb18418b6372ad56c4a3fc7d5b8b4%5E%21/#F0
https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/943782a36eef45a4f4f45c0a6637b4c435adb03d%5E%21/#F0
also https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/31279f370ab1b8f4cdab33eeeaf030b2ecced6a2%5E%21/#F0 to replace that hieradata/labs.yaml entry
Change 551946 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/puppet@production] deployment-prep: Migrate to new logstash host
Mentioned in SAL (#wikimedia-releng) [2019-11-20T00:47:34Z] <Krenair> T238707 moved kibana4/logstash-beta proxies to deployment-logstash03, copied /etc/logstash/htpasswd file
Mentioned in SAL (#wikimedia-releng) [2019-11-20T00:51:19Z] <Krenair> T238707 created old-logstash-beta proxy to point at old instance, created default Index Pattern on new logstash-beta
Mentioned in SAL (#wikimedia-releng) [2019-11-20T00:53:47Z] <Krenair> T238707 changed dateFormat (under management -> advanced settings) from 'MMMM Do YYYY, HH:mm:ss.SSS' to 'YYYY-MM-DDTHH:mm:ss', and dateFormat:tz from Browser to UTC to match old instance
Change 551946 merged by Andrew Bogott:
[operations/puppet@production] deployment-prep: Migrate to new logstash host
Thanks for working on this! Good point re: dashboards, they live in the .kibana index. If the new elasticsearch cluster has access to the old one the easiest option is probably to use the reindex api, e.g.
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d"
{
  \"source\": {
    \"remote\": { \"host\": \"http://${source}\" },
    \"index\": \".kibana\"
  },
  \"dest\": { \"index\": \".kibana\" }
}
"
looks like each of the logstash hosts runs its own elasticsearch cluster locally, would our source be something like deployment-logstash2.deployment-prep.eqiad.wmflabs:9200 or deployment-logstash2.deployment-prep.eqiad.wmflabs:9300 ? it seems we'd need to configure reindex.remote.whitelist somewhere too though I have no idea where
You'd be launching the reindex call on deployment-logstash03, setting deployment-logstash2:9200 as the remote source. You are correct that reindex.remote.whitelist needs to be set in elasticsearch.yml! Alternatively, a dump/reload scheme would also work, e.g. with https://github.com/taskrabbit/elasticsearch-dump (never tried it though)
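The escaped quotes in the reindex request are easy to get wrong by hand. A small sketch that builds the same request body from a heredoc instead; `reindex_payload` is a hypothetical helper name, and the body mirrors the source/dest structure of the curl example in this thread rather than adding any other reindex options:

```shell
# Hypothetical helper: print a _reindex request body that copies one index
# from a remote Elasticsearch host into an index of the same name locally.
reindex_payload() {
    source_host=$1   # host:port of the remote (old) cluster
    index=$2         # index name, e.g. .kibana
    cat <<EOF
{
  "source": {
    "remote": { "host": "http://${source_host}" },
    "index": "${index}"
  },
  "dest": { "index": "${index}" }
}
EOF
}

# Usage on the destination host (not executed here); curl reads the body
# from stdin via "-d @-":
# reindex_payload deployment-logstash2.deployment-prep.eqiad.wmflabs:9200 .kibana |
#     curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d @-
```

The remote host still has to be listed in reindex.remote.whitelist on the destination, as noted above.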
Alright, I:
- disabled puppet on deployment-logstash03
- edited 03's /etc/elasticsearch/labs-logstash-eqiad/elasticsearch.yml to add reindex.remote.whitelist: deployment-logstash2.deployment-prep.eqiad.wmflabs:9200
- disabled puppet on deployment-logstash2
- created a /etc/ferm/conf.d/20_T238707 file on 2 containing &R_SERVICE(tcp, 9200, @resolve(deployment-logstash03.deployment-prep.eqiad.wmflabs));
- live-hacked /etc/ferm/conf.d/10_mtail to not have an AAAA rule, because of T153468 (Ferm's upstream Net::DNS Perl library questionable handling of NOERROR responses without records causing puppet errors when we try to @resolve AAAA in labs), which was preventing ferm from starting
- restarted ferm on 2
- added security group rule to the logstash security group allowing port 9200 from other instances in the group
- ran this:
root@deployment-logstash03:~# curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d"
{
  \"source\": {
    \"remote\": { \"host\": \"http://deployment-logstash2.deployment-prep.eqiad.wmflabs:9200\" },
    \"index\": \".kibana\"
  },
  \"dest\": { \"index\": \".kibana\" }
}
"
{"took":841,"timed_out":false,"total":151,"updated":0,"created":151,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}
- re-enabled and ran puppet on those two hosts
and some dashboards have appeared at https://logstash-beta.wmflabs.org/app/kibana#/dashboards?_g=() just like on old-logstash-beta
Do we need to do anything else?
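One quick sanity check on the reindex step is to look at the response itself rather than only at the Kibana UI. A rough sketch, assuming the response arrives as compact JSON like the output pasted above; `reindex_ok` is a hypothetical helper and uses plain string matching, not a JSON parser:

```shell
# Hypothetical helper: succeed only if a _reindex response reports no
# timeout and an empty failures list. Assumes compact JSON (no spaces
# after colons), as in the response shown in this task.
reindex_ok() {
    response=$1
    case $response in
        *'"timed_out":false'*) ;;
        *) return 1 ;;
    esac
    case $response in
        *'"failures":[]'*) ;;
        *) return 1 ;;
    esac
}
```

With the response above, where "created" equals "total" (151), the check passes; comparing those two numbers by eye is still worthwhile, since this sketch does not do it.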
AFAIK, if the dashboards have been migrated, then deployment-logstash2 should be ready to be turned off
Mentioned in SAL (#wikimedia-releng) [2019-11-26T01:02:15Z] <Krenair> Shut down deployment-logstash2 T238707
Something happened with this in T233134#5713956 though I don't really understand what is required.
Note: other Cloud VPS projects (wikidata-query, striker, ores, phabricator) appear to also be using deployment-logstash2. Not sure if they are actually using it but those at least have hiera keys pointing to logstash2.
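To see which projects still carry such keys, the earlier `git grep` output from the instance-puppet checkout can be reduced to its top-level directories. A minimal sketch, assuming it is run from the root of that checkout, where each top-level directory is one Cloud VPS project; `projects_referencing` is a hypothetical helper name:

```shell
# Hypothetical helper: reduce `git grep` output to the set of top-level
# directories (one per Cloud VPS project in instance-puppet) with a match.
projects_referencing() {
    # Each input line looks like "project/file.yaml:matched text";
    # keep only the part before the first slash and de-duplicate.
    cut -d/ -f1 | sort -u
}

# Usage (not executed here):
# git grep deployment-logstash2 | projects_referencing
```

This only shows where hiera keys mention the host; it says nothing about whether those projects actually send anything to it.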
Shall we just wholesale point these to deployment-logstash03? Even if some turn out to be unused or broken, that's still better than sending them to a server which will soon need to be removed :-)
Likely yes, but I'm not a project admin on those projects and have not found the time or motivation to go through all of them and contact their maintainers. Ideally that would be turned into a service record instead of pointing to individual hosts, maybe something like logstash.svc.deployment-prep.eqiad1.wikimedia.cloud (which is now possible, T276624).
Change 674392 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Remove deployment-logstash2
I have created https://gerrit.wikimedia.org/r/674392 and will simply CC a few people who should be able to fix it to the patch.
Change 674392 merged by Muehlenhoff:
[operations/puppet@production] Remove deployment-logstash2
I've merged https://gerrit.wikimedia.org/r/674392 and shut down deployment-logstash2, it can be removed for good in a few days. Puppet was broken on this instance since September 2020, so if anything really still used it, it would probably be broken anyway...
Mentioned in SAL (#wikimedia-releng) [2021-03-24T07:42:30Z] <Majavah> remove deployment-logstash2 hiera from horizon, instance was shut off earlier by moritzm T238707