
Migrate mwlog/udp2log servers to Buster
Closed, ResolvedPublic

Description

These are currently running jessie:

  • mwlog1001.eqiad.wmnet
  • mwlog2001.codfw.wmnet

Event Timeline

ArielGlenn triaged this task as Medium priority.Jun 11 2019, 7:57 AM
MoritzMuehlenhoff renamed this task from Migrate mwlog/udp2log servers to Stretch/Buster to Migrate mwlog/udp2log servers to Buster.Sep 19 2019, 7:28 AM

Change 667911 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] assign mwlog2002 role::logging::mediawiki::udp2log

https://gerrit.wikimedia.org/r/667911

Change 667912 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] assign mwlog1002 role::logging::mediawiki::udp2log

https://gerrit.wikimedia.org/r/667912

herron moved this task from Radar to In progress on the observability board.
herron subscribed.

Change 667911 merged by Herron:
[operations/puppet@production] assign mwlog2002 role::logging::mediawiki::udp2log

https://gerrit.wikimedia.org/r/667911

Change 667912 merged by Herron:
[operations/puppet@production] assign mwlog1002 role::logging::mediawiki::udp2log

https://gerrit.wikimedia.org/r/667912

Change 673026 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] mwlog: add primary/standby host settings and rsync

https://gerrit.wikimedia.org/r/673026

Change 673026 merged by Herron:
[operations/puppet@production] mwlog: add primary/standby host settings and rsync

https://gerrit.wikimedia.org/r/673026

Change 673594 had a related patch set uploaded (by Herron; author: Herron):
[operations/puppet@production] mwlog: "tee" udp2logs received to all mwlog hosts

https://gerrit.wikimedia.org/r/673594

Change 673594 merged by Herron:
[operations/puppet@production] mwlog: "tee" udp2logs received to all mwlog hosts

https://gerrit.wikimedia.org/r/673594

Change 676995 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] point wikimania scholarships to mwlog1002

https://gerrit.wikimedia.org/r/676995

Change 676996 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] deploy logster_alarm to mwlog1002

https://gerrit.wikimedia.org/r/676996

Change 676997 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] add mwlog[12]002 to profile::dumps::rsync_internal_clients

https://gerrit.wikimedia.org/r/676997

Change 677002 had a related patch set uploaded (by Herron; author: Herron):

[operations/mediawiki-config@master] replace mwlog1001 with new mwlog[12]002 hosts

https://gerrit.wikimedia.org/r/677002

Change 677002 merged by jenkins-bot:

[operations/mediawiki-config@master] replace mwlog1001 with new mwlog[12]002 hosts

https://gerrit.wikimedia.org/r/677002

Mentioned in SAL (#wikimedia-operations) [2021-05-05T18:24:06Z] <tgr@deploy1002> Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:677002|replace mwlog1001 with new mwlog[12]002 hosts (T224565)]] (duration: 01m 24s)

Change 685562 had a related patch set uploaded (by Herron; author: Herron):

[operations/dns@master] udplog: repoint CNAME to new hosts mwlog[12]002

https://gerrit.wikimedia.org/r/685562

Change 676997 merged by Herron:

[operations/puppet@production] add mwlog[12]002 to profile::dumps::rsync_internal_clients

https://gerrit.wikimedia.org/r/676997

Change 676995 merged by Herron:

[operations/puppet@production] point wikimania scholarships to mwlog1002

https://gerrit.wikimedia.org/r/676995

Change 685562 merged by Herron:

[operations/dns@master] udplog: repoint CNAME to new hosts mwlog[12]002

https://gerrit.wikimedia.org/r/685562

Change 676996 merged by Herron:

[operations/puppet@production] deploy logster_alarm to mwlog1002

https://gerrit.wikimedia.org/r/676996

Along with migrating these hosts to buster I've deployed an updated config to make mwlog more of a multi-datacenter service.

Logs that arrive on udp/8420 (e.g. mwlog1002:8420) are now mirrored (via an rsyslog "udp_tee" config) to both the local udp2log (e.g. mwlog1002:8421) and the udp2log in the remote datacenter (e.g. mwlog2002:8421). This happens in both directions.
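To illustrate the mirroring behaviour described above, here is a minimal Python sketch of a UDP "tee". It is not the rsyslog "udp_tee" config that was actually deployed. The function names `tee_datagram` and `run_tee` are hypothetical, and the sketch assumes each log line arrives as a single datagram.

```python
import socket
from typing import Iterable, Optional, Tuple

Address = Tuple[str, int]


def tee_datagram(payload: bytes, backends: Iterable[Address],
                 sock: socket.socket) -> int:
    """Send one datagram to every backend; return how many sends succeeded."""
    sent = 0
    for addr in backends:
        try:
            sock.sendto(payload, addr)
            sent += 1
        except OSError:
            # UDP is best effort: a failing backend must not block the others.
            continue
    return sent


def run_tee(listen: Address, backends: Iterable[Address],
            max_packets: Optional[int] = None) -> int:
    """Receive datagrams on `listen` and mirror each one to all backends.

    Runs forever unless max_packets is given. Hypothetical helper, for
    illustration only.
    """
    backends = list(backends)
    handled = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        rx.bind(listen)
        while max_packets is None or handled < max_packets:
            payload, _sender = rx.recvfrom(65535)
            tee_datagram(payload, backends, tx)
            handled += 1
    return handled
```

Under these assumptions, something like `run_tee(("0.0.0.0", 8420), [("127.0.0.1", 8421), ("mwlog2002.codfw.wmnet", 8421)])` on the eqiad host would approximate the setup: one copy goes to the local udp2log and one to the opposite datacenter. The remote FQDN here is inferred from the naming of the other hosts in this task.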

As of today the mediawiki and udplog DNS records direct logs to the mwlog[12]002 host in the local datacenter, which in turn mirrors them to the mwlog host in the opposite DC.

I'm no longer seeing traffic arriving at mwlog1001 on udp/8420. Will proceed with decom of the mwlog[12]001 hosts shortly.

ArcLamp (performance flamegraphs) stopped getting data on May 5, likely as a result of this change. Other things which might have also been missed:

$ find . -type f | xargs grep 'mwlog1001'
./hieradata/common/profile/dumps.yaml:    - mwlog1001.eqiad.wmnet
./hieradata/role/common/logging/mediawiki/udp2log.yaml:  - 'mwlog1001.eqiad.wmnet:8421'
./hieradata/role/common/webperf/profiling_tools.yaml:profile::webperf::arclamp::redis_host: 'mwlog1001.eqiad.wmnet'
./hieradata/role/eqiad/logging/mediawiki/udp2log.yaml:profile::mediawiki::mwlog::primary_host: mwlog1001.eqiad.wmnet
./modules/wikimania_scholarships/manifests/init.pp:    Stdlib::Fqdn $udp2log_host   = 'mwlog1001.eqiad.wmnet',

Change 687928 had a related patch set uploaded (by Dave Pifke; author: Dave Pifke):

[operations/puppet@production] arclamp: switch to mwlog1002

https://gerrit.wikimedia.org/r/687928

Change 687928 merged by Elukey:

[operations/puppet@production] arclamp: switch to mwlog1002

https://gerrit.wikimedia.org/r/687928

Change 688281 had a related patch set uploaded (by Herron; author: Herron):

[operations/mediawiki-config@master] arclamp/xenon: point codfw hosts to eqiad (mwlog1002)

https://gerrit.wikimedia.org/r/688281

Change 688317 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] scholarships: update default value to mwlog1002

https://gerrit.wikimedia.org/r/688317

Thanks for correcting the oversight on arclamp/xenon, TIL

I've created https://wikitech.wikimedia.org/wiki/Mwlog just now to help clear up the services that are deployed on these hosts, for future reference.

Other things which might have also been missed:

Thanks, will step through these one by one:

./hieradata/common/profile/dumps.yaml: - mwlog1001.eqiad.wmnet

This is expected, it's an ACL which includes the new mwlog hosts as well

./hieradata/role/common/logging/mediawiki/udp2log.yaml: - 'mwlog1001.eqiad.wmnet:8421'

This is expected, it defines the udp2log backends that logs arriving over udp/8420 will be mirrored to

./hieradata/role/common/webperf/profiling_tools.yaml:profile::webperf::arclamp::redis_host: 'mwlog1001.eqiad.wmnet'

Fixed above, thank you

./hieradata/role/eqiad/logging/mediawiki/udp2log.yaml:profile::mediawiki::mwlog::primary_host: mwlog1001.eqiad.wmnet

This is expected, it's used primarily to enable rsync from the old host to the new one

./modules/wikimania_scholarships/manifests/init.pp: Stdlib::Fqdn $udp2log_host = 'mwlog1001.eqiad.wmnet',

This param is already switched to mwlog1002 in the profile, but yes, might as well update the default too (https://gerrit.wikimedia.org/r/c/operations/puppet/+/688317)

Change 688317 merged by Herron:

[operations/puppet@production] scholarships: update default value to mwlog1002

https://gerrit.wikimedia.org/r/688317

ArcLamp data is arriving again, and I'm working on fixing our monitoring for it.

Thanks for taking a look at the others.

Change 688281 merged by jenkins-bot:

[operations/mediawiki-config@master] arclamp/xenon: point all hosts to eqiad (mwlog1002)

https://gerrit.wikimedia.org/r/688281

Mentioned in SAL (#wikimedia-operations) [2021-05-10T16:23:36Z] <herron@deploy1002> Synchronized wmf-config/ProductionServices.php: Config: [[gerrit:688281|arclamp/xenon: point all hosts to eqiad (mwlog1002) (T224565)]] (duration: 00m 59s)

Since deployers are expected to SSH onto this host (edit: I’m referring to mwlog1002 but I guess that also applies to the codfw one), can someone add its SSH fingerprint to Wikitech?

Here’s the fingerprint if anyone else needs it:

$ curl -sL https://config-master.wikimedia.org/known_hosts.ecdsa | grep mwlog1002 | ssh-keygen -lf-
256 SHA256:CsX5UuVYhSAiJyAXUfOl4gtfahkBTHNK8F1c0flQEzo mwlog1002.eqiad.wmnet,10.64.32.141,2620:0:861:103:10:64:32:141 (ECDSA)

SHA256:CsX5UuVYhSAiJyAXUfOl4gtfahkBTHNK8F1c0flQEzo
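For anyone who wants to check a fingerprint without `ssh-keygen`, here is a minimal Python sketch of how an OpenSSH-style SHA256 fingerprint is derived from a known_hosts line: base64-decode the key blob, hash it with SHA-256, and base64-encode the digest with the trailing `=` padding removed. The function name `known_hosts_fingerprint` is hypothetical, and the sketch assumes a plain `hosts keytype key` line with no leading marker such as `@cert-authority`.

```python
import base64
import hashlib


def known_hosts_fingerprint(line: str) -> str:
    """Return the SHA256 fingerprint for one known_hosts line.

    Assumes the layout "<hosts> <keytype> <base64 key> [comment]".
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected at least: hosts, key type, base64 key")
    key_blob = base64.b64decode(fields[2])
    digest = hashlib.sha256(key_blob).digest()
    # OpenSSH prints the base64 digest without "=" padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

Feeding it the mwlog1002 line from the known_hosts file above should give the same `SHA256:...` value that `ssh-keygen -lf-` prints, provided the line has the assumed layout.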

lmata updated the task description.
lmata updated the task description.