Find a new PIM RP IP
Closed, Declined · Public

Description

Our PIM RP is 208.80.152.194 for legacy reasons: it dates from when that address was still one of the Tampa routers.

That IP is not used, doesn't have a reverse DNS record, and needs to be replaced. The alternative I would prefer is to replace HTCP (with something Kafka-based) and deprecate multicast entirely, but I fear this may take a long time :)

Event Timeline

faidon created this task. · Jun 13 2017, 10:47 PM
Restricted Application added a project: Operations. · Jun 13 2017, 10:47 PM
Restricted Application added a subscriber: Aklapper.

I don't have great visibility into how our IP space is divided, but looking at DNS, 208.80.154.200, for example, could be a good choice: it is close to the eqiad loopback IPs.
Once we have an IP, I can work on a plan to replace it with little or no downtime.

BBlack added a subscriber: BBlack. · Edited · Jun 15 2017, 8:16 PM

Multicast has its uses in general. Even if we kill HTCP, another use may pop up. I did a quick survey to try to find active uses. Filtering for just v4, removing the standard references you'd see to 224.0.0.1 everywhere, and ignoring the expected cp machines' multicast address as documented at https://wikitech.wikimedia.org/wiki/Multicast_HTCP_purging#Multicast_Addressing , these were the surprises:

===== NODE GROUP =====
(1) contint1001.wikimedia.org
----- OUTPUT of 'netstat -g -n|gr...rep -v 224.0.0.1' -----
eth0            1      224.0.0.251
eth0            1      239.77.124.213
================
===== NODE GROUP =====
(6) wdqs[2001-2003].codfw.wmnet,wdqs[1001-1003].eqiad.wmnet
----- OUTPUT of 'netstat -g -n|gr...rep -v 224.0.0.1' -----
eth0            1      239.192.48.84
================

I don't know if any of these are known about outside of whoever deployed or configured them. If they're legit and we want to keep using them, they should at least be centrally documented to avoid collisions in the multicast address space. We could maybe record all our multicasts in DNS as well?
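
A survey like this could also be post-processed with a short script. The following is a minimal Python sketch, not the command that was actually run (it is truncated in the output above). It assumes the usual three-column `netstat -g -n` layout (interface, reference count, group); the function name `surprising_groups` and the sample text are made up for illustration.

```python
import ipaddress

# Hypothetical sample in the usual `netstat -g -n` layout (an assumption).
SAMPLE = """\
IPv6/IPv4 Group Memberships
Interface       RefCnt Group
--------------- ------ ---------------------
lo              1      224.0.0.1
eth0            1      224.0.0.1
eth0            1      224.0.0.251
eth0            1      239.77.124.213
eth0            1      ff02::1
"""


def surprising_groups(netstat_output, expected):
    """Return (interface, group) pairs for IPv4 multicast groups not in `expected`."""
    ignore = {ipaddress.ip_address(addr) for addr in expected}
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # blank or unexpected line
        iface, _refcnt, group = fields
        try:
            ip = ipaddress.ip_address(group)
        except ValueError:
            continue  # header or separator line, not an address
        # Keep only IPv4 multicast groups that are not already expected.
        if ip.version != 4 or not ip.is_multicast or ip in ignore:
            continue
        found.append((iface, str(ip)))
    return found


if __name__ == "__main__":
    for iface, group in surprising_groups(SAMPLE, {"224.0.0.1"}):
        print(iface, group)
```

Passing the documented addresses (for example the cp machines' HTCP group) in `expected` would leave only the undocumented groups, which is the list worth recording centrally.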

faidon added a comment. · Edited · Jun 16 2017, 1:31 AM

Oh, thanks for that, that audit is great! These two are indeed surprising, and I think the fact that they are surprising is a good argument for us to get rid of multicast :))

The 239.77.124.213 one is Jenkins and the 239.192.48.84 one is Jolokia. I don't think either is needed -- autodiscovering Jenkins or JMX agents on our network doesn't sound that useful (if anything, it sounds scary). The 224.0.0.0/24 range is link-local and will continue to work even without IGMP/PIM.
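
To make the scope distinction concrete, here is a small Python sketch that sorts the addresses from this audit by multicast scope. It assumes the RFC 5771 blocks 224.0.0.0/24 (local network control, not forwarded by routers) and 239.0.0.0/8 (administratively scoped); the `classify` helper is a hypothetical name, not something from our tooling.

```python
import ipaddress

# Blocks assumed from RFC 5771 (IANA IPv4 multicast address assignments).
LOCAL_NETWORK_CONTROL = ipaddress.ip_network("224.0.0.0/24")
ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")


def classify(addr):
    """Return a coarse scope label for an IPv4 address given as a string."""
    ip = ipaddress.ip_address(addr)
    if not ip.is_multicast:
        return "not multicast"
    if ip in LOCAL_NETWORK_CONTROL:
        return "link-local"  # stays on the segment, no PIM needed
    if ip in ADMIN_SCOPED:
        return "administratively scoped"  # routed only if we set it up
    return "other multicast"


if __name__ == "__main__":
    for addr in ("224.0.0.251", "239.77.124.213", "239.192.48.84"):
        print(addr, classify(addr))
```

Only the administratively scoped groups depend on routed multicast; the link-local one is unaffected by changes to the RP.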

ayounsi moved this task from Backlog to Configuration on the netops board. · Jun 27 2017, 2:38 PM
hashar added a subscriber: hashar. · Oct 19 2017, 6:51 PM

Thanks @faidon for the link! Jenkins never runs out of surprises. We do not rely on that auto-discovery feature, and I will get it disabled in the daemon.

ayounsi added a subscriber: Gehel. · Oct 19 2017, 7:06 PM

@Gehel See Faidon's comment on T167842#3353703. Is there any reason to have JMX agent autodiscovery enabled?

Gehel added a comment. · Oct 19 2017, 7:33 PM

I see no reason to have Jolokia even accessible on the network; it should be local only. I'll have a look at our config (to be honest, I don't know much about Jolokia, but this is a good opportunity to dig into it a bit more).

Change 385337 had a related patch set uploaded (by Gehel; owner: Gehel):
[wikidata/query/rdf@master] Disable discovery of jolokia agent.

https://gerrit.wikimedia.org/r/385337

Gehel added a comment. · Oct 20 2017, 9:39 AM

Jolokia is only listening on localhost, and we are not using discovery. The patch above will disable discovery.

... these were the surprises:

===== NODE GROUP =====
(1) contint1001.wikimedia.org
----- OUTPUT of 'netstat -g -n|gr...rep -v 224.0.0.1' -----
eth0            1      224.0.0.251
eth0            1      239.77.124.213
================

That was Jenkins, and it is no longer listening on multicast (T178608). It was used for auto-discovery of Jenkins instances on the network, a feature we do not need.

Change 385337 merged by jenkins-bot:
[wikidata/query/rdf@master] Disable discovery of jolokia agent.

https://gerrit.wikimedia.org/r/385337

hashar removed a subscriber: hashar. · Oct 24 2017, 12:27 PM

Mentioned in SAL (#wikimedia-operations) [2017-10-25T07:40:37Z] <gehel@tin> Started deploy [wdqs/wdqs@0bb2b5c]: wdqs-updater upgrade for jolokia - T167842

Mentioned in SAL (#wikimedia-operations) [2017-10-25T07:42:21Z] <gehel@tin> Finished deploy [wdqs/wdqs@0bb2b5c]: wdqs-updater upgrade for jolokia - T167842 (duration: 01m 44s)

Gehel added a comment. · Oct 25 2017, 7:56 AM

@ayounsi : wdqs should be clean of unwanted multicast.

Indeed, confirmed. Thanks!

ayounsi added a comment. · Edited · May 17 2018, 1:56 PM

@BBlack on https://wikitech.wikimedia.org/wiki/Multicast_IP_Addresses 239.128.0.114 is struck through, but I see it configured on some servers (e.g. cp1071).

Should the strikethrough be removed, or does something need to be fixed on the servers?

EDIT: .114 is used on:
cp[2002,2005,2008,2011,2014,2017,2020,2022,2024,2026].codfw.wmnet,cp[1048-1050,1062-1064,1071-1074,1099].eqiad.wmnet,cp[5001-5005].eqsin.wmnet,cp[3034-3039,3044-3049].esams.wmnet,cp[4021-4026].ulsfo.wmnet
I removed the strikethrough from that line.

"239.239.239.0-255 multicast tftp on installserver(s) whole range 0-255 in last octet "
I don't think that's still used.

ayounsi closed this task as Resolved. · Thu, Jul 16, 8:17 AM
ayounsi claimed this task.

No more PIM in the infra.

ayounsi changed the task status from Resolved to Declined. · Thu, Jul 16, 8:18 AM