magru network setup
Open, High, Public

Description

Tracking task

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change #1019292 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Add magru to homer-public

https://gerrit.wikimedia.org/r/1019292

Prefixes assigned in Netbox: https://netbox.wikimedia.org/ipam/prefixes/?site_id=11

Next step is to create the devices in Netbox and assign the IPs to the interfaces.

And reserve prefixes for the transport links

And fix the TODO in the homer-public patch once we know the details

ayounsi triaged this task as High priority.

Change #1019927 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/software/netbox-extras@master] Netbox validators: add magru

https://gerrit.wikimedia.org/r/1019927

Change #1020087 had a related patch set uploaded (by Volans; author: Volans):

[operations/cookbooks@master] Add configuration for the new magru DC

https://gerrit.wikimedia.org/r/1020087

Change #1019927 merged by jenkins-bot:

[operations/software/netbox-extras@master] Netbox validators: add magru

https://gerrit.wikimedia.org/r/1019927

Change #1020196 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] Reverse DNS changes for new Magru prefixes

https://gerrit.wikimedia.org/r/1020196

Mentioned in SAL (#wikimedia-operations) [2024-04-17T15:30:30Z] <topranks> making magru IPs live in netbox and generating DNS records with cookbook T362421

Mentioned in SAL (#wikimedia-operations) [2024-04-17T15:40:39Z] <topranks> merging patch and updating dns servers with new magru ranges T362421

Change #1020196 merged by Cathal Mooney:

[operations/dns@master] DNS zone changes for new Magru prefixes

https://gerrit.wikimedia.org/r/1020196

Mentioned in SAL (#wikimedia-operations) [2024-04-17T16:56:34Z] <topranks> running authdns-update to make magru dns records live T362421

Change #1020901 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] Remove comment added in error

https://gerrit.wikimedia.org/r/1020901

Change #1020901 merged by Cathal Mooney:

[operations/dns@master] Remove comment added in error

https://gerrit.wikimedia.org/r/1020901

Hi, since 73470d0dca68abee0 ntp no longer auto-restarts, but after one of the latest changes (I believe b48874a81565b7051be39659c056) a restart is pending. Can it be restarted, or should it keep running with the old config for a while, in which case the alert should be acked?

Hmm, that's a good point. @ssingh, can you comment here? I'm wondering about the original change: why would it matter if we restart the NTP service "quickly" when we have a new set of peers? The system clock is unlikely to drift by much during the restart if it was synced previously. I take it we had some issue, though.

Thanks @jcrespo! I should have silenced the alert or restarted the service; both of those are in progress now so we should see this resolve soon.

@cmooney: Previously we let Puppet restart the ntp service within the roughly 30-minute window it takes to roll out all changes. On the core sites it takes around 10 minutes for NTP sync to be established with the public pools and other hosts. Since we couldn't figure out a reasonable way to splay the restarts within Puppet, we decided to do them manually instead of letting Puppet restart the service whenever it wanted. We also wanted to fix the issue of the initial sync when new hardware is commissioned, and more generally to be in control of when we restart the NTP daemon on the various boxes. So we no longer let Puppet do it, but we have this alert to remind us that we have to. You are right that the system clock is likely to stay in sync during regular restarts anyway, but this is more about the entire cluster.
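
For illustration only, here is a minimal sketch of what "splaying" the restarts could look like: hosts are split into small batches, and each batch starts only after the previous one has had time to re-establish sync. This is not the tooling actually used here. The function names, the example hostnames, the batch size, and the 600-second settle time (taken from the "around 10 minutes" figure above) are all assumptions.

```python
from typing import List, Tuple


def restart_batches(hosts: List[str], batch_size: int) -> List[List[str]]:
    """Split hosts into consecutive batches of at most batch_size (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(hosts)  # stable, predictable order across runs
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def restart_plan(
    hosts: List[str], batch_size: int, settle_seconds: int
) -> List[Tuple[int, List[str]]]:
    """Return (start_offset_seconds, batch) pairs.

    Each batch starts settle_seconds after the previous one, so only a small
    part of the cluster is re-syncing at any given time.
    """
    batches = restart_batches(hosts, batch_size)
    return [(index * settle_seconds, batch) for index, batch in enumerate(batches)]


if __name__ == "__main__":
    # Hypothetical hostnames; 600 s approximates the ~10 minute sync time (assumption).
    example_hosts = ["ntp-c.example", "ntp-a.example", "ntp-b.example"]
    for offset, batch in restart_plan(example_hosts, batch_size=2, settle_seconds=600):
        print(f"t+{offset}s: restart ntp on {', '.join(batch)}")
```

The plan only computes a schedule. Actually restarting the daemon and checking that sync has been re-established before moving on would still be up to whoever runs it.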

Change #1020087 merged by jenkins-bot:

[operations/cookbooks@master] Add configuration for the new magru DC

https://gerrit.wikimedia.org/r/1020087

Change #1019292 merged by jenkins-bot:

[operations/homer/public@master] Add magru to homer-public

https://gerrit.wikimedia.org/r/1019292

Thanks!

Next step is to create the devices in Netbox and assign the IPs to the interfaces.

Done. All configs now generate fine with Homer \o/

INFO:homer:Homer run completed successfully on 5 devices: ['asw1-b3-magru.mgmt.magru.wmnet', 'asw1-b4-magru.mgmt.magru.wmnet', 'cr1-magru.wikimedia.org', 'cr2-magru.wikimedia.org', 'mr1-magru.wikimedia.org']

Change #1021920 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Add dummy IPs and uncomment vars for magru

https://gerrit.wikimedia.org/r/1021920

Change #1021920 merged by jenkins-bot:

[operations/homer/public@master] Add dummy IPs and uncomment vars for magru

https://gerrit.wikimedia.org/r/1021920

Change #1021967 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/homer/public@master] Set magru DHCP relay server to install1004

https://gerrit.wikimedia.org/r/1021967

Change #1021967 merged by jenkins-bot:

[operations/homer/public@master] Set magru DHCP relay server to install1004

https://gerrit.wikimedia.org/r/1021967

Change #1022098 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):

[operations/dns@master] Reverses for 3 new network connections in magru

https://gerrit.wikimedia.org/r/1022098

Icinga downtime and Alertmanager silence (ID=2c797c95-485f-45b4-85c7-e8514173ae11) set by cmooney@cumin1002 for 0:20:00 on 4 host(s) and their services with reason: disabling oob link on mr1-ulsfo to stop the SSH attempts long enough to get a homer run in

mr1-ulsfo,mr1-ulsfo IPv6,mr1-ulsfo.oob,mr1-ulsfo.oob IPv6

Change #1022098 merged by Cathal Mooney:

[operations/dns@master] Reverses for 3 new network connections in magru

https://gerrit.wikimedia.org/r/1022098

Change #1024516 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] magru: update edgeuno transit IP

https://gerrit.wikimedia.org/r/1024516

Change #1024516 merged by jenkins-bot:

[operations/homer/public@master] magru: update edgeuno transit IP

https://gerrit.wikimedia.org/r/1024516

Change #1024815 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] magru: add momentum/novacore peer IPs/AS

https://gerrit.wikimedia.org/r/1024815

Change #1024815 merged by jenkins-bot:

[operations/homer/public@master] magru: add momentum/novacore peer IPs/AS

https://gerrit.wikimedia.org/r/1024815

Change #1024848 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Add AS65007 to confederation

https://gerrit.wikimedia.org/r/1024848

Change #1024848 merged by jenkins-bot:

[operations/homer/public@master] Add AS65007 to confederation

https://gerrit.wikimedia.org/r/1024848

Change #1024894 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Add magru to Rancid

https://gerrit.wikimedia.org/r/1024894

Change #1024895 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Add magru network to monitoring

https://gerrit.wikimedia.org/r/1024895

Change #1024894 merged by Ayounsi:

[operations/puppet@production] Add magru to Rancid

https://gerrit.wikimedia.org/r/1024894

Change #1024895 merged by Ayounsi:

[operations/puppet@production] Add magru network to monitoring

https://gerrit.wikimedia.org/r/1024895

Change #1025414 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] magru: update novacore v6 IP

https://gerrit.wikimedia.org/r/1025414

Change #1025414 merged by jenkins-bot:

[operations/homer/public@master] magru: update novacore v6 IP

https://gerrit.wikimedia.org/r/1025414

Change #1025442 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] magru: update novacore IPv6 once more

https://gerrit.wikimedia.org/r/1025442

Change #1025442 merged by jenkins-bot:

[operations/homer/public@master] magru: update novacore IPv6 once more

https://gerrit.wikimedia.org/r/1025442

Mentioned in SAL (#wikimedia-operations) [2024-05-01T09:22:38Z] <topranks> withdrawing public prefix announcement to AS7195 to test backup in magru (T362421)