rsyslog on mw1180 seems to not use the logstash LVS endpoint
Closed, Resolved · Public

Description

While checking that the deployment of https://gerrit.wikimedia.org/r/#/c/383147/ is working fine, I see some unexpected behaviour.

Note: the change above is changing rsyslog configuration to send syslog traffic to logstash.svc.eqiad.wmnet instead of logstash1001.eqiad.wmnet.

Note: functionally, everything seems to work fine: logs are received by logstash and processed correctly. What I see might just be me not understanding the low-level details (but better safe than sorry).

What is unexpected: running tcpdump (see below), I see traffic going from mw1180 to logstash100[123]. I was expecting to see the LVS endpoint there (logstash.svc.eqiad.wmnet / 10.2.2.36). I see no direct reference to any of the logstash nodes in the rsyslog configuration, so even if I don't understand how LVS works, I would have expected to see all logstash ingesters, not just logstash100[123] (logstash100[789] are the new logstash ingesters, which are up and running and pooled). rsyslog has been reloaded after the config change, but even if it had not been reloaded, I would expect to see only traffic to logstash1001 (the previously configured endpoint) and not to logstash100[23].

Everything might be working just as expected, but I'd feel better if someone could explain the above before deploying the same change to more nodes.

gehel@mw1180:/etc$ sudo tcpdump -vv -s 0 -i eth0 port 10514
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
13:04:00.671325 IP (tos 0x0, ttl 64, id 33249, offset 0, flags [DF], proto UDP (17), length 1028)
    mw1180.eqiad.wmnet.60235 > logstash1002.eqiad.wmnet.10514: [bad udp cksum 0x593c -> 0xbb3d!] UDP, length 1000
13:04:03.143713 IP (tos 0x0, ttl 64, id 33313, offset 0, flags [DF], proto UDP (17), length 689)
    mw1180.eqiad.wmnet.34458 > logstash1002.eqiad.wmnet.10514: [bad udp cksum 0x57e9 -> 0xf534!] UDP, length 661
13:04:03.146099 IP (tos 0x0, ttl 64, id 33314, offset 0, flags [DF], proto UDP (17), length 773)
    mw1180.eqiad.wmnet.34458 > logstash1002.eqiad.wmnet.10514: [bad udp cksum 0x583d -> 0xdb73!] UDP, length 745
13:04:03.201983 IP (tos 0x0, ttl 64, id 33323, offset 0, flags [DF], proto UDP (17), length 689)
    mw1180.eqiad.wmnet.34458 > logstash1002.eqiad.wmnet.10514: [bad udp cksum 0x57e9 -> 0xf534!] UDP, length 661
13:04:03.203647 IP (tos 0x0, ttl 64, id 33324, offset 0, flags [DF], proto UDP (17), length 773)
    mw1180.eqiad.wmnet.34458 > logstash1002.eqiad.wmnet.10514: [bad udp cksum 0x583d -> 0xdb73!] UDP, length 745
13:04:03.978711 IP (tos 0x0, ttl 64, id 6194, offset 0, flags [DF], proto UDP (17), length 1009)
    mw1180.eqiad.wmnet.58197 > logstash1001.eqiad.wmnet.10514: [bad udp cksum 0x391a -> 0x3c83!] UDP, length 981
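To see at a glance which destinations a capture like the one above actually contains, the tcpdump text output can be summarised per destination host. This is only a minimal sketch: the function name `count_destinations` is hypothetical, and it assumes tcpdump's usual `src.port > dst.port:` line layout as shown in the capture.

```shell
# Hypothetical helper: count packets per destination host in tcpdump text output.
# Reads tcpdump output on stdin; assumes lines containing "src.port > dst.port:".
count_destinations() {
  awk '{
         for (i = 1; i < NF; i++) {
           if ($i == ">") {
             dst = $(i + 1)
             sub(/:$/, "", dst)         # drop the trailing colon
             sub(/\.[0-9]+$/, "", dst)  # drop the port suffix
             count[dst]++
             break
           }
         }
       }
       END { for (d in count) print count[d], d }' | sort -k2
}
```

It could be fed from a bounded capture, for example `sudo tcpdump -l -vv -s 0 -i eth0 -c 100 port 10514 | count_destinations`. If the LVS address (10.2.2.36) never shows up among the destinations, the traffic on that port is not going through the LVS endpoint.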

Event Timeline

Gehel created this task. Oct 10 2017, 1:09 PM
Restricted Application added a subscriber: Aklapper. Oct 10 2017, 1:09 PM

Mentioned in SAL (#wikimedia-operations) [2017-10-10T13:11:25Z] <gehel> restarting rsyslog on mw1180 - T177833

Gehel updated the task description. Oct 10 2017, 1:28 PM
Gehel closed this task as Resolved. Oct 10 2017, 1:40 PM
Gehel claimed this task.

Found it! MediaWiki seems to talk directly to logstash (ProductionServices.php). In retrospect, I should have checked there earlier...