Cassandra uses default ip address for outbound packets while bootstrapping
Closed, Declined (Public)

Description

Spotted this while bootstrapping restbase1010: outbound packets appear to be sourced from the main host IP rather than from the addresses configured in cassandra.yaml. In this case the bootstrap wasn't working because firewall rules on the destination hosts did not allow the main IP.

14:14:03.847886 IP 10.64.0.112.35406 > 10.64.0.220.7000: Flags [S], seq 2750243604, win 29200, options [mss 1460,sackOK,TS val 40440048 ecr 0,nop,wscale 9], length 0
14:14:03.847898 IP 10.64.0.112.36730 > 10.64.0.221.7000: Flags [S], seq 823540075, win 29200, options [mss 1460,sackOK,TS val 40440048 ecr 0,nop,wscale 9], length 0
14:14:03.848099 IP 10.64.0.112.47684 > 10.64.48.99.7000: Flags [S], seq 2253905698, win 29200, options [mss 1460,sackOK,TS val 40440048 ecr 0,nop,wscale 9], length 0
14:14:03.848101 IP 10.64.0.112.37021 > 10.64.48.100.7000: Flags [S], seq 3996782521, win 29200, options [mss 1460,sackOK,TS val 40440048 ecr 0,nop,wscale 9], length 0
14:14:03.848103 IP 10.64.0.112.52035 > 10.64.48.120.7000: Flags [S], seq 3150107822, win 29200, options [mss 1460,sackOK,TS val 40440048 ecr 0,nop,wscale 9], length 0
14:14:03.848105 IP 10.64.0.112.58029 > 10.64.32.187.7000: Flags [S], seq 1891138042, win 29200, options [mss 1460,sackOK,TS val 40440048 ecr 0,nop,wscale 9], length 0
14:14:03.848299 IP 10.64.0.112.57468 > 10.64.32.195.7000: Flags [S], seq 2077856937, win 29200, options [mss 1460,sackOK,TS val 40440048 ecr 0,nop,wscale 9], length 0
14:14:03.848302 IP 10.64.0.112.45408 > 10.64.0.231.7000: Flags [S], seq 3375292860, win 29200, options [mss 1460,sackOK,TS val 40440048 ecr 0,nop,wscale 9], length 0
14:14:03.848486 IP 10.64.0.112.46393 > 10.64.32.159.7000: Flags [S], seq 72980954, win 29200, options [mss 1460,sackOK,TS val 40440048 ecr 0,nop,wscale 9], length 0
14:14:03.848488 IP 10.64.0.112.45264 > 10.64.48.130.7000: Flags [S], seq 2112860170, win 29200, options [mss 1460,sackOK,TS val 40440048 ecr 0,nop,wscale 9], length 0
14:14:03.848490 IP 10.64.0.112.47592 > 10.64.0.230.7000: Flags [S], seq 1749977256, win 29200, options [mss 1460,sackOK,TS val 40440048 ecr 0,nop,wscale 9], length 0
14:14:03.912097 IP 10.64.0.112.38070 > 10.64.32.192.7000: Flags [S], seq 1521230055, win 29200, options [mss 1460,sackOK,TS val 40440064 ecr 0,nop,wscale 9], length 0
restbase1010:~$ host restbase1010.eqiad.wmnet
restbase1010.eqiad.wmnet has address 10.64.0.112
restbase1010:~$ host restbase1010-a.eqiad.wmnet
restbase1010-a.eqiad.wmnet has address 10.64.0.114
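
A capture like the one above can be checked mechanically. The following is a minimal sketch, not part of any existing tooling: it parses `tcpdump -n` text output and reports source addresses of SYNs to port 7000 that are not in an expected set. The function names are made up for illustration, and the expected set (10.64.0.114, the address of restbase1010-a) assumes that instance -a is the one being bootstrapped.

```python
import re
from collections import Counter

# Matches the tcpdump -n text format seen in this task, e.g.
# "14:14:03.847886 IP 10.64.0.112.35406 > 10.64.0.220.7000: Flags [S], ..."
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): Flags \[(?P<flags>[^\]]*)\]"
)


def syn_sources(lines, dst_port=7000):
    """Count source IPs of bare SYN packets (Flags [S]) sent to dst_port."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("flags") == "S" and int(m.group("dport")) == dst_port:
            counts[m.group("src")] += 1
    return counts


def unexpected_sources(lines, expected, dst_port=7000):
    """Return {source_ip: count} for SYN sources not in the expected set."""
    found = syn_sources(lines, dst_port)
    return {ip: n for ip, n in found.items() if ip not in expected}


if __name__ == "__main__":
    # Two lines from the capture above, abbreviated (TCP options dropped).
    sample = [
        "14:14:03.847886 IP 10.64.0.112.35406 > 10.64.0.220.7000: "
        "Flags [S], seq 2750243604, win 29200, length 0",
        "14:14:03.847898 IP 10.64.0.112.36730 > 10.64.0.221.7000: "
        "Flags [S], seq 823540075, win 29200, length 0",
    ]
    # Assumption: the bootstrapping instance should use 10.64.0.114.
    print(unexpected_sources(sample, {"10.64.0.114"}))
    # -> {'10.64.0.112': 2}
```

An empty result would mean every SYN to port 7000 in the capture came from an expected address.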

Event Timeline

Eevans renamed this task from "cassandra uses default ip address for outbound packets while bootstrapping" to "Cassandra uses default ip address for outbound packets while bootstrapping". Apr 29 2016, 8:15 PM
Eevans added a project: Cassandra.

Weird:

$ sudo tcpdump -ni eth0 src host restbase1010-a.eqiad.wmnet and proto TCP and src portrange 7000-7001
...
16:18:22.947178 IP 10.64.0.116.7000 > 10.64.0.230.38846: Flags [.], ack 19728, win 625, options [nop,nop,TS val 1876382839 ecr 1877066570], length 0
16:18:22.948080 IP 10.64.0.116.7000 > 10.64.32.207.38955: Flags [.], ack 70296, win 1893, options [nop,nop,TS val 1876382839 ecr 1875566858], length 0
16:18:22.950021 IP 10.64.0.116.7001 > 10.192.48.70.56434: Flags [.], ack 1563, win 2301, options [nop,nop,TS val 1876382839 ecr 1418950764], length 0
16:18:22.950267 IP 10.64.0.116.7000 > 10.64.32.207.38955: Flags [.], ack 70367, win 1893, options [nop,nop,TS val 1876382839 ecr 1875566858], length 0
16:18:22.952780 IP 10.64.0.116.7000 > 10.64.32.203.36409: Flags [.], ack 127333, win 1268, options [nop,nop,TS val 1876382840 ecr 1875848084], length 0
16:18:22.953011 IP 10.64.0.116.7001 > 10.192.48.69.36462: Flags [.], ack 11881, win 731, options [nop,nop,TS val 1876382840 ecr 1418950765], length 0
16:18:22.953112 IP 10.64.0.116.7000 > 10.64.0.230.38846: Flags [.], ack 19828, win 625, options [nop,nop,TS val 1876382840 ecr 1877066571], length 0
16:18:22.955494 IP 10.64.0.116.7000 > 10.64.0.230.38846: Flags [.], ack 20320, win 625, options [nop,nop,TS val 1876382841 ecr 1877066572], length 0
16:18:22.955561 IP 10.64.0.116.7000 > 10.64.32.207.38955: Flags [.], ack 70438, win 1893, options [nop,nop,TS val 1876382841 ecr 1875566860], length 0
1549 packets captured
1819 packets received by filter
0 packets dropped by kernel
$ 

restbase1010-a.eqiad.wmnet is sourcing traffic from 7000 and 7001.

$ sudo tcpdump -ni eth0 src host restbase1010.eqiad.wmnet and proto TCP and src portrange 7000-7001
...
16:20:56.009256 IP 10.64.0.112.7001 > 10.192.48.49.53258: Flags [R.], seq 0, ack 2774926345, win 0, length 0
16:20:56.146691 IP 10.64.0.112.7001 > 10.192.48.49.42450: Flags [R.], seq 0, ack 2846120781, win 0, length 0
16:20:56.285205 IP 10.64.0.112.7001 > 10.192.48.49.55610: Flags [R.], seq 0, ack 1463595064, win 0, length 0
16:20:56.422580 IP 10.64.0.112.7001 > 10.192.48.49.46609: Flags [R.], seq 0, ack 2854990770, win 0, length 0
16:20:56.563799 IP 10.64.0.112.7001 > 10.192.48.49.49357: Flags [R.], seq 0, ack 1645837451, win 0, length 0
16:20:56.700951 IP 10.64.0.112.7001 > 10.192.48.49.42508: Flags [R.], seq 0, ack 2990377545, win 0, length 0
16:20:56.838376 IP 10.64.0.112.7001 > 10.192.48.49.36284: Flags [R.], seq 0, ack 1089322175, win 0, length 0
16:20:56.975936 IP 10.64.0.112.7001 > 10.192.48.49.38562: Flags [R.], seq 0, ack 3556649634, win 0, length 0
16:20:57.113562 IP 10.64.0.112.7001 > 10.192.48.49.35763: Flags [R.], seq 0, ack 3299051790, win 0, length 0
16:20:57.251023 IP 10.64.0.112.7001 > 10.192.48.49.49958: Flags [R.], seq 0, ack 3362137409, win 0, length 0
619 packets captured
629 packets received by filter
0 packets dropped by kernel
$ 

restbase1010.eqiad.wmnet (no Cassandra instance bound) is sourcing some packets, but only from port 7001.

And port 7001 is only bound to the 3 aliases:

$ sudo netstat -anpl | grep -i listen | grep 7001
tcp        0      0 10.64.0.116:7001        0.0.0.0:*               LISTEN      22600/java      
tcp        0      0 10.64.0.115:7001        0.0.0.0:*               LISTEN      18917/java      
tcp        0      0 10.64.0.114:7001        0.0.0.0:*               LISTEN      14972/java      
unix  2      [ ACC ]     STREAM     LISTENING     277567001 22405/java          /tmp/.java_pid22405.tmp
$ 

[ ... ]

$ sudo tcpdump -ni eth0 src host restbase1010.eqiad.wmnet and proto TCP and src portrange 7000-7001
...
16:20:56.009256 IP 10.64.0.112.7001 > 10.192.48.49.53258: Flags [R.], seq 0, ack 2774926345, win 0, length 0
16:20:56.146691 IP 10.64.0.112.7001 > 10.192.48.49.42450: Flags [R.], seq 0, ack 2846120781, win 0, length 0
16:20:56.285205 IP 10.64.0.112.7001 > 10.192.48.49.55610: Flags [R.], seq 0, ack 1463595064, win 0, length 0
16:20:56.422580 IP 10.64.0.112.7001 > 10.192.48.49.46609: Flags [R.], seq 0, ack 2854990770, win 0, length 0
16:20:56.563799 IP 10.64.0.112.7001 > 10.192.48.49.49357: Flags [R.], seq 0, ack 1645837451, win 0, length 0
16:20:56.700951 IP 10.64.0.112.7001 > 10.192.48.49.42508: Flags [R.], seq 0, ack 2990377545, win 0, length 0
16:20:56.838376 IP 10.64.0.112.7001 > 10.192.48.49.36284: Flags [R.], seq 0, ack 1089322175, win 0, length 0
16:20:56.975936 IP 10.64.0.112.7001 > 10.192.48.49.38562: Flags [R.], seq 0, ack 3556649634, win 0, length 0
16:20:57.113562 IP 10.64.0.112.7001 > 10.192.48.49.35763: Flags [R.], seq 0, ack 3299051790, win 0, length 0
16:20:57.251023 IP 10.64.0.112.7001 > 10.192.48.49.49958: Flags [R.], seq 0, ack 3362137409, win 0, length 0
619 packets captured
629 packets received by filter
0 packets dropped by kernel
$ 

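On second look, these are entirely resets (and for more hosts than 10.192.48.49, as shown here), so they must be the result of having the main IP listed as a seed everywhere else: nothing listens on 7001 at that address, so the kernel answers each connection attempt with a RST.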
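
The "entirely resets" observation can be confirmed without eyeballing the capture. A minimal sketch, with hypothetical helper names, that tallies the TCP flags field of each `tcpdump` text line and reports whether every packet carries the RST flag:

```python
import re
from collections import Counter

# tcpdump prints flags as e.g. "Flags [S]", "Flags [.]", "Flags [R.]"
FLAGS_RE = re.compile(r"Flags \[([^\]]*)\]")


def flag_counts(lines):
    """Count packets per flags string in tcpdump text output."""
    counts = Counter()
    for line in lines:
        m = FLAGS_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def only_resets(lines):
    """True if at least one packet was parsed and all of them have RST set."""
    counts = flag_counts(lines)
    return bool(counts) and all("R" in flags for flags in counts)


if __name__ == "__main__":
    # Lines copied from the two captures above.
    from_main_ip = [
        "16:20:56.009256 IP 10.64.0.112.7001 > 10.192.48.49.53258: "
        "Flags [R.], seq 0, ack 2774926345, win 0, length 0",
        "16:20:56.146691 IP 10.64.0.112.7001 > 10.192.48.49.42450: "
        "Flags [R.], seq 0, ack 2846120781, win 0, length 0",
    ]
    from_alias = [
        "16:18:22.947178 IP 10.64.0.116.7000 > 10.64.0.230.38846: "
        "Flags [.], ack 19728, win 625, length 0",
    ]
    print(only_resets(from_main_ip))  # -> True
    print(only_resets(from_alias))    # -> False
```

A capture from the main IP that returns False here would suggest real Cassandra traffic is leaving from that address, rather than just refusals of inbound connections.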

Either what @fgiunchedi saw in March of last year is limited to bootstrapping, or it has since resolved itself.

@fgiunchedi, is this still a thing?

Good question. I think we'll have to try on the next bootstrap and sniff the traffic. So far it hasn't been an issue in practice because we include the main IP address in the ferm rules.
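
For that next bootstrap, one option is to capture only the suspicious traffic: SYNs to the storage port whose source is none of the instance addresses. The sketch below just builds the pcap filter string; the helper name is hypothetical, and the three addresses are the aliases shown listening in the netstat output above, assumed here to be the complete set of instance addresses on this host.

```python
def bootstrap_capture_filter(instance_ips, port=7000):
    """Build a pcap filter matching packets to `port` with the SYN bit set
    whose source address is NOT one of instance_ips."""
    if not instance_ips:
        raise ValueError("need at least one instance address")
    sources = " or ".join(f"src host {ip}" for ip in instance_ips)
    return (
        f"tcp dst port {port} and tcp[tcpflags] & tcp-syn != 0 "
        f"and not ({sources})"
    )


if __name__ == "__main__":
    # Assumption: these are all the Cassandra instance aliases on restbase1010.
    aliases = ["10.64.0.114", "10.64.0.115", "10.64.0.116"]
    print(bootstrap_capture_filter(aliases))
```

The resulting string would be passed to tcpdump as a single quoted argument, e.g. `sudo tcpdump -ni eth0 '<filter>'`. If the capture stays empty during a bootstrap, outbound connections are being sourced from the instance addresses as configured.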

mobrovac changed the task status from Open to Stalled. Jul 4 2018, 3:20 PM

Two years later: does anyone know whether this is still an issue? If so, this task should be open. If not, it should be declined or resolved. Thanks!

@fgiunchedi, @Eevans: Ping. Does anyone know whether this is still an issue? If so, this task should be open. If not, it should be declined or resolved. Thanks!

Boldly closing as declined; please reopen (at a commensurate priority) if this is something we should see to.