Move kartotherian/tilerator logging to new logging pipeline
Closed, Resolved, Public

Description

We need to move Maps (Kartotherian) and Maps (Tilerator) to the new logging pipeline. The config should be updated similarly to https://gerrit.wikimedia.org/r/#/c/mediawiki/services/change-propagation/deploy/+/500813, and the newest Node dependencies (in particular the newest version of service-runner) should be used.
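For illustration, here is a minimal sketch of the kind of config change described above, written in Python against a plain dict rather than the real `config.yaml`. It drops any `gelf` log stream and adds a local `syslog` stream. The stream field names (`type`, `host`, `port`, `prefix`, `name`) are assumed to follow service-runner's logging config layout, and the syslog target values (`localhost:10514`, `'@cee: '` prefix, `name: node`) are assumptions, not taken from this task; check the linked change-propagation patch for the authoritative values.

```python
"""Sketch: replace GELF log streams with a local syslog stream in a
service-runner-style config, represented here as a plain dict.

ASSUMPTION: field names and the syslog target below are illustrative,
modeled on service-runner's config layout, not copied from this task.
"""
import copy

# Hypothetical syslog stream entry for the new logging pipeline.
SYSLOG_STREAM = {
    "type": "syslog",
    "host": "localhost",
    "port": 10514,       # assumed local rsyslog UDP port
    "prefix": "@cee: ",  # assumed prefix marking the payload as JSON
    "name": "node",      # assumed program name
}


def migrate_logging(config: dict) -> dict:
    """Return a copy of config with every gelf stream removed and exactly
    one syslog stream present. The input dict is not modified."""
    new = copy.deepcopy(config)
    logging_cfg = new.setdefault("logging", {})
    # Keep every stream except the legacy GELF ones.
    streams = [s for s in logging_cfg.get("streams", [])
               if s.get("type") != "gelf"]
    # Add the syslog stream only if one is not already configured.
    if not any(s.get("type") == "syslog" for s in streams):
        streams.append(dict(SYSLOG_STREAM))
    logging_cfg["streams"] = streams
    return new


if __name__ == "__main__":
    old = {
        "logging": {
            "level": "warn",
            "streams": [
                # Hypothetical legacy GELF target, for illustration only.
                {"type": "gelf", "host": "logstash.example.org", "port": 12201},
            ],
        }
    }
    print(migrate_logging(old)["logging"]["streams"])
```

The function is idempotent, so running it against an already-migrated config leaves that config unchanged.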

Event Timeline

Pchelolo edited projects, added Services (watching); removed Services.

Reminder/ping: we (SRE Observability) would like to deprecate all non-Kafka inputs by the end of Q4 FY19/20. If the service is moving (or has moved) to k8s, then what's left to do is to disable gelf log output and keep logging to stdout/stderr. If the service isn't moving to k8s, we'll also need to make Puppet-level changes. Thanks!

Change 602460 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[maps/kartotherian/deploy@master] Use new logging pipeline

https://gerrit.wikimedia.org/r/602460

Change 602461 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[maps/tilerator/deploy@master] Use new logging pipeline

https://gerrit.wikimedia.org/r/602461

Hi @fgiunchedi, thanks for the reviews! What's the minimum version of service-runner that's required for this? We may need to do a deployment for one or both services to bump dependency versions, but I'm not sure.

I _think_ service-runner >= 2.6.18 or 2.6.19 will do, judging from https://phabricator.wikimedia.org/T211125#5206793 and related changes.

Great, thanks @fgiunchedi! It looks like we currently have service-runner@2.7.3 in production for both services.
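The version check above can be sketched in Python as follows. It uses 2.6.19 (the more conservative of the two suggested minimums) as the default and assumes plain dotted numeric versions with no pre-release tags. Versions are compared as integer tuples rather than as strings: a string comparison would wrongly treat `"2.10.0"` as older than `"2.6.19"`, because `'1' < '6'`.

```python
"""Sketch: check whether a deployed service-runner version meets a minimum.

Assumes plain dotted numeric versions such as '2.7.3' (no pre-release tags).
"""


def parse_version(v: str) -> tuple:
    """Turn '2.7.3' into (2, 7, 3) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def meets_minimum(deployed: str, minimum: str = "2.6.19") -> bool:
    """True if the deployed version is at least the minimum version."""
    return parse_version(deployed) >= parse_version(minimum)


print(meets_minimum("2.7.3"))  # True: 2.7.3 is newer than 2.6.19
```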

Change 602704 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[operations/puppet@production] Include ::profile::rsyslog::udp_localhost_compat in OSM common role

https://gerrit.wikimedia.org/r/602704

Change 602704 merged by Filippo Giunchedi:
[operations/puppet@production] maps: profile::rsyslog::udp_localhost_compat

https://gerrit.wikimedia.org/r/602704
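The `udp_localhost_compat` profile suggests that services send syslog over UDP to a local rsyslog instance. The Python sketch below shows that idea: it sends one JSON log event as a syslog datagram to localhost. The port (10514), the `user` facility, the RFC 3164-style `<PRI>tag: message` framing, and the `@cee: ` JSON prefix are all assumptions chosen for illustration. This is not service-runner's exact wire format or the profile's confirmed configuration.

```python
"""Sketch: send a JSON log event as a syslog datagram to a local UDP listener.

ASSUMPTIONS: port 10514, 'user' facility, RFC 3164-style framing and the
'@cee: ' JSON prefix are illustrative, not taken from this task.
"""
import json
import socket

FACILITY_USER = 1
SEVERITY = {"error": 3, "warning": 4, "info": 6, "debug": 7}


def format_syslog(event: dict, level: str = "info", tag: str = "node") -> bytes:
    """Build '<PRI>tag: @cee: {json}', where PRI = facility * 8 + severity."""
    pri = FACILITY_USER * 8 + SEVERITY[level]
    body = "@cee: " + json.dumps(event, separators=(",", ":"), sort_keys=True)
    return f"<{pri}>{tag}: {body}".encode("utf-8")


def send_event(event: dict, level: str = "info",
               host: str = "127.0.0.1", port: int = 10514) -> None:
    """Fire-and-forget send; UDP gives no delivery guarantee."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_syslog(event, level), (host, port))
```

Usage: `send_event({"msg": "tile rendered"}, level="info")` would emit `<14>node: @cee: {"msg":"tile rendered"}` (user facility 1 × 8 + info severity 6 = 14). Because UDP is connectionless, the sender gets no error if nothing is listening. This is one reason a local rsyslog relay that forwards to Kafka is more robust than sending directly to a remote collector.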

Change 602461 merged by jenkins-bot:
[maps/tilerator/deploy@master] Use new logging pipeline

https://gerrit.wikimedia.org/r/602461

Change 602460 merged by jenkins-bot:
[maps/kartotherian/deploy@master] Use new logging pipeline

https://gerrit.wikimedia.org/r/602460

Hi all, it looks like we've moved to syslog logging in https://gerrit.wikimedia.org/r/c/maps/kartotherian/deploy/+/602460; however, the patch isn't deployed yet AFAICS?

You're right. However, the changes are now being prepared for testing in the beta cluster and should land in production afterwards: https://gerrit.wikimedia.org/r/c/maps/kartotherian/deploy/+/627503

@fgiunchedi we've recently been working on fixing OSM replication in the eqiad cluster, so we blocked deployments for safety. It's almost finished, and if possible we will try to deploy this week or next. Thanks for your patience.

Mentioned in SAL (#wikimedia-operations) [2020-11-09T20:12:57Z] <mbsantos@deploy1001> Started deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T20:24:33Z] <mbsantos@deploy1001> Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932) (duration: 11m 36s)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T20:26:22Z] <mbsantos@deploy1001> Finished deploy [kartotherian/deploy@0a38bc5]: Add new target for beta environment and clean-up old envs (T223041 T222377 T255932) (duration: 01m 09s)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T21:11:45Z] <mbsantos@deploy1001> Started deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs (T222377)

Mentioned in SAL (#wikimedia-operations) [2020-11-09T21:14:08Z] <mbsantos@deploy1001> Finished deploy [tilerator/deploy@97575e4]: Add new target for beta environment and clean-up old envs (T222377) (duration: 02m 23s)

I'm reopening since I'm still seeing gelf traffic from tilerator and kartotherian on the codfw maps hosts (https://logstash-next.wikimedia.org/goto/df736dad5b406f27bb983ee7fd3bf0fc). I'm guessing these hosts didn't get the updated deployment?

@hnowlan do you know if this is related to the newer machines being added to the cluster? Is it possible that the new machines didn't get the update?

It appears most of the traffic is coming from the new hosts, although maps2003 is also sending some. I think a redeploy will fix it.

I've redeployed the services emitting gelf traffic and it appears these messages have stopped as of 12:43 UTC.

I can indeed confirm all gelf traffic from maps has stopped, thank you @MSantos and @hnowlan for your help!