Since we are replacing logstash100[1-3] with logstash100[7-9], all log producers that currently access one of the logstash nodes directly need to be reconfigured. Using the LVS endpoint (logstash.svc.eqiad.wmnet) is the obvious solution.
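As a rough illustration of the reconfiguration described above, here is a minimal Python sketch that finds direct references to the old hosts in a piece of configuration text and rewrites them to the LVS endpoint. The function names and the sample configuration keys are hypothetical; the actual changes were made through puppet and mediawiki-config patches, not with a script like this.

```python
import re

# Matches the old hosts logstash1001..logstash1003, with or without a domain.
OLD_HOST = re.compile(r"\blogstash100[1-3]\b")
# LVS endpoint named in the task description.
LVS_ENDPOINT = "logstash.svc.eqiad.wmnet"


def find_direct_references(config_text):
    """Return (line_number, line) pairs that mention an old logstash host."""
    return [
        (number, line.strip())
        for number, line in enumerate(config_text.splitlines(), start=1)
        if OLD_HOST.search(line)
    ]


def point_to_lvs(config_text):
    """Replace old host names (optionally with .eqiad.wmnet) by the LVS endpoint."""
    pattern = r"\blogstash100[1-3](?:\.eqiad\.wmnet)?\b"
    return re.sub(pattern, LVS_ENDPOINT, config_text)


# Hypothetical configuration snippet, for illustration only.
sample = (
    "logstash_host: logstash1001.eqiad.wmnet\n"
    "other: foo\n"
    "syslog: logstash1003:10514\n"
    "new: logstash1007.eqiad.wmnet\n"
)
```

Calling `find_direct_references(sample)` lists the lines that still need attention, and `point_to_lvs(sample)` leaves the new logstash100[7-9] hosts untouched.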
Description
Details
Status | Assigned | Task
---|---|---
Resolved | RobH | T173565 Provision VMs on Ganeti for logstash100[123]
Resolved | Gehel | T175045 setup/install logstash100[7-9].eqiad.wmnet
Resolved | Cmjohnson | T175830 decommission logstash100[1-3]
Resolved | Gehel | T175242 all log producers need to use the logstash LVS endpoint
Event Timeline
Change 376500 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/puppet@production] service: Use LVS endpoint for logstash
Mentioned in SAL (#wikimedia-operations) [2017-09-27T12:47:34Z] <akosiaris> T175242 disable puppet across aqs kafka maps maps-test ores restbase restbase-dev sca scb wtp clusters for merging https://gerrit.wikimedia.org/r/#/c/376500/
Change 376500 merged by Alexandros Kosiaris:
[operations/puppet@production] service: Use LVS endpoint for logstash
Mentioned in SAL (#wikimedia-operations) [2017-09-27T12:54:53Z] <akosiaris> T175242 enabled puppet in aqs kafka maps maps-test selected hosts and ran puppet manually.
Mentioned in SAL (#wikimedia-operations) [2017-09-27T13:01:10Z] <akosiaris> T175242 tilerator and tileratorui need manually restart
Mentioned in SAL (#wikimedia-operations) [2017-09-27T13:15:52Z] <akosiaris> T175242 restbase requires manual restart
Mentioned in SAL (#wikimedia-operations) [2017-09-27T13:25:22Z] <akosiaris> T175242 parsoid requires manual restart
Mentioned in SAL (#wikimedia-operations) [2017-09-27T13:31:33Z] <akosiaris> T175242 eventstreams requires manual restart
Mentioned in SAL (#wikimedia-operations) [2017-09-27T13:43:55Z] <akosiaris> T175242 re-enable puppet across aqs kafka maps maps-test ores restbase restbase-dev sca scb wtp clusters for merging https://gerrit.wikimedia.org/r/#/c/376500/. Run puppet as well in a batched execution
Mentioned in SAL (#wikimedia-operations) [2017-09-27T14:03:48Z] <akosiaris> T175242 restart tilerator, tileratorui, restbase across the fleet to pick up the change in a rolling restart manner with a batch size of 2
Mentioned in SAL (#wikimedia-operations) [2017-09-27T14:16:30Z] <akosiaris> T175242 restart parsoid across the fleet to pick up the change in a rolling restart manner with a batch size of 5
Mentioned in SAL (#wikimedia-operations) [2017-09-27T14:17:37Z] <akosiaris> T175242 restart eventstreams across the fleet to pick up the change in a rolling restart manner with a batch size of 2
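The rolling restarts logged above proceed in fixed-size batches (2 or 5 hosts at a time). A minimal Python sketch of that batching idea follows; the `restart` callback and the host names are hypothetical, and this is not the tooling that was actually used for these restarts.

```python
def batches(hosts, size):
    """Yield consecutive slices of `hosts` containing at most `size` entries."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def rolling_restart(hosts, size, restart):
    """Call restart(host) batch by batch and return the batches processed.

    A real rollout would also wait for each batch to become healthy before
    moving on; that check is left out of this sketch.
    """
    processed = []
    for batch in batches(hosts, size):
        for host in batch:
            restart(host)
        processed.append(batch)
    return processed


# Hypothetical host names, for illustration only.
example_hosts = ["host-a", "host-b", "host-c", "host-d", "host-e"]
restarted = []
result = rolling_restart(example_hosts, 2, restarted.append)
```

With a batch size of 2, the five example hosts are handled as two full batches followed by a final batch of one.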
Change 380991 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch: use the logstash LVS endpoint
Change 380992 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] aqs: switch to LVS endpoint for logstash
Change 380993 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] striker: switch to LVS endpoint for logstash
Change 380994 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] mediawiki: switch to LVS endpoint for logstash
Change 380995 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] ocg: switch to LVS endpoint for logstash
Change 380991 merged by Gehel:
[operations/puppet@production] elasticsearch: use the logstash LVS endpoint
Mentioned in SAL (#wikimedia-operations) [2017-09-28T10:30:45Z] <gehel> restart elasticsearch on relforge to validate new logging config - T175242
Change 380992 merged by Elukey:
[operations/puppet@production] aqs: switch to LVS endpoint for logstash
Change 380994 merged by Gehel:
[operations/puppet@production] mediawiki: switch to LVS endpoint for logstash
Mentioned in SAL (#wikimedia-operations) [2017-10-04T12:00:51Z] <gehel> mediawiki now uses the LVS endpoint for logstash - T175242
Correction: https://gerrit.wikimedia.org/r/380994 is actually a no-op; it only cleans up a default that is overridden more globally in hieradata/common.yaml.
Change 380993 merged by Gehel:
[operations/puppet@production] striker: switch to LVS endpoint for logstash
Change 383097 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] logstash: all log producers need to use the logstash LVS endpoint
Change 383098 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: all log producers need to use the logstash LVS endpoint
Change 383098 merged by Gehel:
[operations/puppet@production] maps: all log producers need to use the logstash LVS endpoint
Change 383146 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] logstash: update logstash_syslog common hiera parameter to point to LVS.
Change 383147 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] [test] mediawiki: use LVS endpoint for logstash
Change 383147 merged by Gehel:
[operations/puppet@production] [test] mediawiki: use LVS endpoint for logstash
Change 383355 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/mediawiki-config@master] use the logstash LVS endpoint
Change 383097 merged by Gehel:
[operations/puppet@production] logstash: all log producers need to use the logstash LVS endpoint
Change 380995 abandoned by Gehel:
ocg: switch to LVS endpoint for logstash
Reason:
OCG is being decommed
Change 383355 merged by jenkins-bot:
[operations/mediawiki-config@master] use the logstash LVS endpoint
Mentioned in SAL (#wikimedia-operations) [2017-11-02T13:11:24Z] <zfilipin@tin> Synchronized wmf-config/ProductionServices.php: SWAT: [[gerrit:383355|use the logstash LVS endpoint (T175242)]] (duration: 00m 51s)
Change 388052 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] cassandra: use LVS endpoint for logstash
Change 383146 merged by Gehel:
[operations/puppet@production] logstash: update logstash_syslog common hiera parameter to point to LVS.
Change 388426 had a related patch set uploaded (by Gehel; owner: Guillaume Lederrey):
[operations/puppet@production] udp2log: use LVS endpoint for logstash
Change 388052 merged by Gehel:
[operations/puppet@production] cassandra: use LVS endpoint for logstash
A short tcpdump session indicates that the only log producers still using logstash100[123] are udp2log and elasticsearch. The Elasticsearch restart is in progress; for udp2log, https://gerrit.wikimedia.org/r/#/c/388426/ still needs to be merged. Another check will be needed before actually decommissioning those servers.
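For reference, a check like the tcpdump one above can be summarised with a small Python sketch that counts which source addresses still send packets to the old servers. It assumes IPv4 text output in the style of `tcpdump -n` (`IP src.port > dst.port:`); the exact tcpdump invocation is not recorded in this task, and the addresses in the sample are made up.

```python
import re
from collections import Counter

# Matches "IP <src ip>.<src port> > <dst ip>.<dst port>:" (IPv4 only).
PACKET = re.compile(
    r"\bIP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def count_senders(lines, server_ips):
    """Count packets per source IP whose destination is one of server_ips."""
    counts = Counter()
    for line in lines:
        match = PACKET.search(line)
        if match and match.group(3) in server_ips:
            counts[match.group(1)] += 1
    return counts


# Made-up capture lines; 10.0.0.10 stands in for one of the old servers.
sample_lines = [
    "12:00:01.000001 IP 10.0.0.5.51234 > 10.0.0.10.10514: UDP, length 120",
    "12:00:01.000002 IP 10.0.0.5.51235 > 10.0.0.10.10514: UDP, length 80",
    "12:00:02.000001 IP 10.0.0.6.40000 > 10.0.0.99.10514: UDP, length 80",
    "12:00:03.000001 IP 10.0.0.10.10514 > 10.0.0.7.40000: UDP, length 10",
    "not a packet line",
]
senders = count_senders(sample_lines, {"10.0.0.10"})
```

An empty result over a long enough capture window would suggest that no producer is still sending to the old hosts.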
Change 388426 merged by Gehel:
[operations/puppet@production] udp2log: use LVS endpoint for logstash
Mentioned in SAL (#wikimedia-operations) [2017-12-04T14:37:15Z] <gehel@tin> Started deploy [kartotherian/deploy@e166d87]: dummy kartotherian deployment to test udp2log config change - T175242
Mentioned in SAL (#wikimedia-operations) [2017-12-04T14:37:25Z] <gehel@tin> Finished deploy [kartotherian/deploy@e166d87]: dummy kartotherian deployment to test udp2log config change - T175242 (duration: 00m 03s)
All references to logstash100[123] have been removed from puppet. I'll still check that no traffic is reaching those servers (we might have something outside of puppet) and then start decommissioning them.
Monitoring traffic for a few hours on logstash100[123] shows that nothing is coming into any of the logstash ports. Thanks to everyone who helped move this forward!