Tracking task
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T346722 Sao Paulo, Brazil, South America POP tracking task
Open | | ayounsi | T362421 magru network setup
Open | | Fabfur | T362902 Add probenet configuration for magru
Event Timeline
Change #1019292 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/homer/public@master] Add magru to homer-public
Prefixes assigned in Netbox: https://netbox.wikimedia.org/ipam/prefixes/?site_id=11
Next step is to create the devices in Netbox and assign the IPs to the interfaces.
And reserve prefixes for the transport links
And fix the TODO in the homer-public patch once we know the details
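As a rough illustration of reserving prefixes for transport links, the sketch below carves point-to-point /31 subnets out of a parent block with Python's standard `ipaddress` module. The parent prefix `192.0.2.0/29` is a documentation range chosen for the example, and the function name is hypothetical; the real allocations live in Netbox and are not shown in this task.

```python
import ipaddress


def carve_transport_links(parent_prefix, link_prefixlen=31):
    """Split a parent block into point-to-point subnets (default /31).

    parent_prefix is an illustrative value, not a real magru allocation.
    """
    parent = ipaddress.ip_network(parent_prefix)
    return [str(net) for net in parent.subnets(new_prefix=link_prefixlen)]


if __name__ == "__main__":
    # 192.0.2.0/29 is a documentation range (RFC 5737), used here as a stand-in.
    for link in carve_transport_links("192.0.2.0/29"):
        print(link)
```

Each resulting /31 holds exactly two addresses, one per end of the link.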
Change #1019927 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/software/netbox-extras@master] Netbox validators: add magru
Change #1020087 had a related patch set uploaded (by Volans; author: Volans):
[operations/cookbooks@master] Add configuration for the new magry DC
Change #1019927 merged by jenkins-bot:
[operations/software/netbox-extras@master] Netbox validators: add magru
Change #1020196 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):
[operations/dns@master] Reverse DNS changes for new Magru prefixes
Mentioned in SAL (#wikimedia-operations) [2024-04-17T15:30:30Z] <topranks> making magru IPs live in netbox and generating DNS records with cookbook T362421
Mentioned in SAL (#wikimedia-operations) [2024-04-17T15:40:39Z] <topranks> merging patch and updating dns servers with new magru ranges T362421
Change #1020196 merged by Cathal Mooney:
[operations/dns@master] DNS zone changes for new Magru prefixes
Mentioned in SAL (#wikimedia-operations) [2024-04-17T16:56:34Z] <topranks> running authdns-update to make magru dns records live T362421
Change #1020901 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):
[operations/dns@master] Remove comment added in error
Change #1020901 merged by Cathal Mooney:
[operations/dns@master] Remove comment added in error
Hi, after 73470d0dca68abee0 ntp no longer auto-restarts, but after one of the latest changes (I believe b48874a81565b7051be39659c056), a restart is pending. Can it be restarted, or should it keep the old config for a while and the alert be acked?
Hmm that's a good point. @ssingh can you comment here? I'm wondering about the original change, why would it matter if we restart the NTP service "quickly" if we have a new set of peers? The system clock is unlikely to drift by much during the restart if it was synced previously? I take it we had some issue though.
Thanks @jcrespo! I should have silenced the alert or restarted the service; both of those are in progress now so we should see this resolve soon.
@cmooney: Previously we let Puppet restart the ntp service within the roughly 30-minute window it takes to roll out all changes. On the core sites, it takes around 10 minutes for NTP sync to be established with the public pools and other hosts. Since we couldn't figure out a reasonable way within Puppet to splay the restarts, we decided to do them manually instead of letting Puppet restart whenever it wanted. We also wanted to control the initial sync when new hardware is commissioned, and more generally to be in control of when the NTP daemon restarts on the various boxes. So we no longer let Puppet do it, but we have this alert to remind us that a restart is needed. You are right that the system clock is likely to stay in sync during a regular restart anyway, but this is more about the entire cluster.
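To make the "splay" idea concrete, here is a minimal sketch of spreading restarts evenly over a time window so that hosts do not all resync at once. It is not the procedure the team uses (the comment above says restarts are done manually); the function name, the host names, and the 1800-second window are assumptions chosen for illustration.

```python
import hashlib


def splay_offsets(hosts, window_seconds):
    """Return a {host: offset_seconds} map spreading hosts evenly over the window.

    Hosts are ordered by a hash of their name so the assignment is
    deterministic but not tied to alphabetical order.
    """
    ordered = sorted(hosts, key=lambda h: hashlib.sha256(h.encode()).hexdigest())
    if not ordered:
        return {}
    step = window_seconds / len(ordered)
    return {host: round(i * step) for i, host in enumerate(ordered)}


if __name__ == "__main__":
    # Hypothetical host names; a 30-minute window mirrors the rollout time mentioned above.
    example_hosts = ["ntp-a.example", "ntp-b.example", "ntp-c.example", "ntp-d.example"]
    for host, offset in sorted(splay_offsets(example_hosts, 1800).items(), key=lambda kv: kv[1]):
        print(f"{host}: restart at +{offset}s")
```

With four hosts and a 1800-second window, the restarts land 450 seconds apart, so at most one host is resyncing at any given moment if sync takes under that interval.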
Change #1020087 merged by jenkins-bot:
[operations/cookbooks@master] Add configuration for the new magru DC
Change #1019292 merged by jenkins-bot:
[operations/homer/public@master] Add magru to homer-public
Thanks!
Next step is to create the devices in Netbox and assign the IPs to the interfaces.
Done. All configs now generate fine with Homer \o/
INFO:homer:Homer run completed successfully on 5 devices: ['asw1-b3-magru.mgmt.magru.wmnet', 'asw1-b4-magru.mgmt.magru.wmnet', 'cr1-magru.wikimedia.org', 'cr2-magru.wikimedia.org', 'mr1-magru.wikimedia.org']
Change #1021920 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):
[operations/homer/public@master] Add dummy IPs and uncomment vars for magru
Change #1021920 merged by jenkins-bot:
[operations/homer/public@master] Add dummy IPs and uncomment vars for magru
Change #1021967 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):
[operations/homer/public@master] Set magru DHCP relay server to install1004
Change #1021967 merged by jenkins-bot:
[operations/homer/public@master] Set magru DHCP relay server to install1004
Change #1022098 had a related patch set uploaded (by Cathal Mooney; author: Cathal Mooney):
[operations/dns@master] Reverses for 3 new network connections in magru
Icinga downtime and Alertmanager silence (ID=2c797c95-485f-45b4-85c7-e8514173ae11) set by cmooney@cumin1002 for 0:20:00 on 4 host(s) and their services with reason: disabling oob link on mr1-ulsfo to stop the SSH attempts long enough to get a homer run in
mr1-ulsfo,mr1-ulsfo IPv6,mr1-ulsfo.oob,mr1-ulsfo.oob IPv6
Change #1022098 merged by Cathal Mooney:
[operations/dns@master] Reverses for 3 new network connections in magru
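For readers unfamiliar with what "reverses" means here, the sketch below derives the reverse DNS (PTR) owner names for the addresses on a link, using Python's standard `ipaddress` module. The addresses are documentation ranges, not the real magru prefixes, and the helper names are hypothetical; the actual records are generated from Netbox and the operations/dns repository.

```python
import ipaddress


def ptr_name(address):
    """Return the reverse DNS owner name for a single IPv4 or IPv6 address."""
    return ipaddress.ip_address(address).reverse_pointer


def link_ptr_names(prefix):
    """Return PTR owner names for every address in a small link prefix."""
    network = ipaddress.ip_network(prefix)
    return [ptr_name(str(addr)) for addr in network]


if __name__ == "__main__":
    # Documentation prefixes (RFC 5737 / RFC 3849), used as stand-ins.
    print(link_ptr_names("192.0.2.0/31"))
    print(ptr_name("2001:db8::1"))
```

IPv4 names reverse the octets under `in-addr.arpa`, while IPv6 names reverse all 32 hexadecimal nibbles under `ip6.arpa`.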
Change #1024516 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/homer/public@master] magru: update edgeuno transit IP
Change #1024516 merged by jenkins-bot:
[operations/homer/public@master] magru: update edgeuno transit IP
Change #1024815 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/homer/public@master] magru: add momentum/novacore peer IPs/AS
Change #1024815 merged by jenkins-bot:
[operations/homer/public@master] magru: add momentum/novacore peer IPs/AS
Change #1024848 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/homer/public@master] Add AS65007 to confederation
Change #1024848 merged by jenkins-bot:
[operations/homer/public@master] Add AS65007 to confederation
Change #1024894 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] Add magru to Rancid
Change #1024895 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/puppet@production] Add magru network to monitoring
Change #1024894 merged by Ayounsi:
[operations/puppet@production] Add magru to Rancid
Change #1024895 merged by Ayounsi:
[operations/puppet@production] Add magru network to monitoring
Change #1025414 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/homer/public@master] magru: update novacore v6 IP
Change #1025414 merged by jenkins-bot:
[operations/homer/public@master] magru: update novacore v6 IP
Change #1025442 had a related patch set uploaded (by Ayounsi; author: Ayounsi):
[operations/homer/public@master] magru: update novacore IPv6 once more
Change #1025442 merged by jenkins-bot:
[operations/homer/public@master] magru: update novacore IPv6 once more
Mentioned in SAL (#wikimedia-operations) [2024-05-01T09:22:38Z] <topranks> withdrawing public prefix announcement to AS7195 to test backup in magru (T362421)