User Details
- User Since
- Apr 3 2017, 6:23 PM (472 w, 2 d)
- Availability
- Available
- IRC Nick
- xionox
- LDAP User
- Ayounsi
- MediaWiki User
- AYounsi (WMF)
Yesterday
Mon, Apr 20
@Jclark-ctr the way it's coded makes it quite flexible. Can you try it, or ping me with a new server's info so we can try it together? I'll update the doc once we have something fully working.
Fri, Apr 17
Yeah, Postgres is where all the data is. So +1 to not backing up anything on the frontends.
We're not doing Netbox CSV dumps anymore. So you can remove that directory from backups.
Wed, Apr 15
Regarding the count of "prefixes received by the switch but not accepted", I think as a first step it could be a global Netops alert (for our internal peers).
However if it gets triggered more often by server side issues, we should move it to the same "remote_instance" mechanism.
It's great to see progress on this! Unless there are other blockers I'm unaware of, CDN seems better than LVS as it doesn't require a new public IP.
Tue, Apr 14
Upgraded to 23.4R2-S8 and all is well.
All done.
I gave a quick review on the CR, but you should at least use https://wikitech.wikimedia.org/wiki/Network_telemetry#remote_instance:gnmi_bgp_neighbor_session_state%7B%7D
Thu, Apr 9
@jcrespo
We don't reimage backup hosts. Let us know the alternative method (we can put them out of service for an extended time if necessary). This was brought up in advance @ the IF-DP meeting at the offsite.
Which rack from each "pod" (see task description) could we use to have an additional "public" vlan? Those racks will need space for at least 6 hosts on day 1, and probably a few more as time goes on. But deployment can be staggered (e.g. A/B month X, C/D month Y, etc.).
That could help slightly for the few hosts that are in the matching row (allow to move them to a rack without re-numbering).
If we look at eqiad and its 18 hosts, that means ~4.5 per row (which we can round down to 4 as some hosts are to be decommissioned), so it could make the transition smoother for 6 hosts: 3 in A/B, 3 in C/D (assuming the rack we pick already has one public host).
To be considered, but not strictly needed.
Wed, Apr 8
Tue, Apr 7
What would be a good day to alert about those ? Or even better, not even need an alert ?
Thu, Apr 2
My initial thought was to start with E/F only, but you're right, better to plan it fully here, especially the IP allocations.
Now that we did the switchover, we could focus more on that upgrade. @Papaul let me know if you're ok to take care of it.
Wed, Apr 1
Tue, Mar 31
For the RIPE atlas we will need to decom it and provision a new one on the future sandbox vlan as IPs will change.
The good news is that the standard IP ranges we use for routed Ganeti are all free (no need to shuffle stuff around like we did in ulsfo).
Mon, Mar 30
No deadline, no rush, best effort :)
Ultimately Juniper, I'll take the task for now.
Overall that LGTM, you need to add BGP to security_zones -> production -> services: ['ssh', 'ping', 'traceroute', 'snmp', 'ospf', 'ospf3', 'bgp']
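As a rough illustration of the change requested above, here is a minimal Python sketch that adds a service to a zone's `services` list if it is missing. The nested-dict layout mirrors the `security_zones -> production -> services` path from the comment; the helper name `ensure_service` and the in-memory representation are assumptions for illustration, not the actual tooling or data format.

```python
import copy


def ensure_service(config, zone, service):
    """Return a copy of config where `service` is listed for `zone`.

    Hypothetical helper: the real data lives in a config file, not a dict.
    """
    updated = copy.deepcopy(config)  # leave the caller's data untouched
    services = (
        updated.setdefault("security_zones", {})
        .setdefault(zone, {})
        .setdefault("services", [])
    )
    if service not in services:
        services.append(service)
    return updated


# Assumed starting state: the services list from the comment, minus 'bgp'.
config = {
    "security_zones": {
        "production": {
            "services": ["ssh", "ping", "traceroute", "snmp", "ospf", "ospf3"],
        }
    }
}

updated = ensure_service(config, "production", "bgp")
print(updated["security_zones"]["production"]["services"])
```

Calling the helper a second time leaves the list unchanged, so it is safe to re-run.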
Thu, Mar 26
This looks like a monitoring bug: the interface is properly named on the switch. Other metrics are not being collected properly either, most likely because the switch is not exposing them correctly.
Wed, Mar 25
As a side note we will need to manually change the IPs of the routed ganeti nodes in rack 23 to the 10.128.1.0/24 subnet. Normal operation would have required a re-image but to not lose the VMs and make the migration faster it's best to re-IP the hosts.
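A minimal sketch of how one might list which hosts still need a new address, using Python's standard `ipaddress` module. Only the 10.128.1.0/24 subnet comes from the comment above; the host names and current addresses are made up for illustration.

```python
import ipaddress

# Target subnet from the comment above.
TARGET_SUBNET = ipaddress.ip_network("10.128.1.0/24")


def needs_reip(address, subnet=TARGET_SUBNET):
    """Return True if `address` is outside the target subnet."""
    return ipaddress.ip_address(address) not in subnet


# Hypothetical inventory: names and addresses are placeholders.
hosts = {
    "ganeti-example1": "10.128.0.15",
    "ganeti-example2": "10.128.1.20",
}

to_change = sorted(name for name, ip in hosts.items() if needs_reip(ip))
print(to_change)
```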
Tue, Mar 24
All done here. I've also opened T421044: ulsfo: balance VMs between all Ganeti nodes to balance the VMs better.