User Details
- User Since
- Apr 3 2017, 6:23 PM (470 w, 5 d)
- Availability
- Available
- IRC Nick
- xionox
- LDAP User
- Ayounsi
- MediaWiki User
- AYounsi (WMF) [ Global Accounts ]
Thu, Apr 9
@jcrespo
We don't reimage backup hosts. Let us know the alternative method (we can put them out of service for an extended time if necessary). This was brought up in advance at the IF-DP meeting at the offsite.
Which rack from each "pod" (see task description) could we use to host an additional "public" VLAN? Those racks will need space for at least 6 hosts on day 1, and probably a few more as time goes on. Deployment can be staggered, though (e.g. A/B in month X, C/D in month Y, etc.).
That could help slightly for the few hosts that are in the matching row (it would allow moving them to that rack without renumbering).
If we look at eqiad and its 18 hosts, that means ~4.5 per row (which we can round down to 4, as some hosts are to be decommissioned). So it could make the transition smoother for 6 hosts: 3 in A/B and 3 in C/D (assuming the rack we pick already has one public host).
To be considered, but not strictly needed.
Wed, Apr 8
Tue, Apr 7
What would be a good day to alert about those? Or even better, could we avoid needing an alert at all?
Thu, Apr 2
My initial thought was to start with E/F only, but you're right, it's better to plan it fully here, especially the IP allocations.
Now that we did the switchover, we could focus more on that upgrade. @Papaul let me know if you're ok to take care of it.
Wed, Apr 1
Tue, Mar 31
For the RIPE Atlas, we will need to decom it and provision a new one on the future sandbox VLAN, as the IPs will change.
The good news is that the standard IP ranges we use for routed Ganeti are all free (no need to shuffle stuff around like we did in ulsfo).
Mon, Mar 30
No deadline, no rush, best effort :)
Ultimately Juniper, I'll take the task for now.
Overall that LGTM, but you need to add BGP to security_zones -> production -> services: ['ssh', 'ping', 'traceroute', 'snmp', 'ospf', 'ospf3', 'bgp']
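As a rough illustration of the change requested above, here is a minimal Python sketch. It assumes the setting is stored as a nested mapping; only the key names (security_zones, production, services) and the service names come from the comment, while the dictionary layout and the ensure_service helper are hypothetical.

```python
# Hypothetical in-memory representation of the setting discussed above.
# Only the key names and the service names come from the comment.
config = {
    "security_zones": {
        "production": {
            "services": ["ssh", "ping", "traceroute", "snmp", "ospf", "ospf3"],
        }
    }
}


def ensure_service(cfg, zone, service):
    """Append `service` to the zone's service list if missing.

    Returns True when the list was modified, False when it was already there.
    """
    services = cfg["security_zones"][zone]["services"]
    if service in services:
        return False
    services.append(service)
    return True


added = ensure_service(config, "production", "bgp")
print(added, config["security_zones"]["production"]["services"])
```

Calling the helper a second time leaves the list unchanged, so the check is safe to repeat.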
Thu, Mar 26
This looks like a monitoring bug: the interface is properly named on the switch. Other metrics are not being collected properly either, most likely because they are not exposed properly by the switch.
Wed, Mar 25
As a side note, we will need to manually change the IPs of the routed Ganeti nodes in rack 23 to the 10.128.1.0/24 subnet. Normal operation would have required a re-image, but to avoid losing the VMs and to make the migration faster, it's best to re-IP the hosts.
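A quick way to see which nodes still need a new address is to test each one against the target subnet. The sketch below uses Python's standard ipaddress module; the 10.128.1.0/24 subnet is from the comment above, while the node names and their current addresses are made up for illustration.

```python
import ipaddress

# Target subnet taken from the comment above.
TARGET_SUBNET = ipaddress.ip_network("10.128.1.0/24")


def needs_reip(addr):
    """Return True if `addr` is outside the target subnet."""
    return ipaddress.ip_address(addr) not in TARGET_SUBNET


# Hypothetical node names and current addresses, for illustration only.
nodes = {
    "node-a": "10.128.0.17",
    "node-b": "10.128.1.5",
}

to_change = sorted(name for name, ip in nodes.items() if needs_reip(ip))
print(to_change)
```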
Tue, Mar 24
All done here. I've also opened T421044: ulsfo: balance VMs between all Ganeti nodes to balance the VMs better.
Mon, Mar 23
Thanks, updated.
Thu, Mar 19
Preferred path changed as expected:
Indeed! The errors were happening with the same levels of traffic as we have now, so it looks like it's resolved.
Wed, Mar 18
@KFrancis could you organize the NDA signature for this request? Thanks
@OKryva-WMF do you approve this request?
@thcipriani do you approve this request?
@MPostoronca-WMF could you generate an ed25519 key instead?
@KFrancis can you organize the NDA for this request? Thanks
Change is merged, you should be good to go in the next ~30min. Please re-open if any issues.
@bvibber you can read and sign the L3 at the end of https://phabricator.wikimedia.org/L3. I don't see your email in the signature list.
@ssingh @Vgutierrez I was wondering if you could prioritize this at some point or agree to drop the current ping offload servers. With the progress to the new network design we won't be able to support those servers in the medium term (6+ months).
Change merged, should be live in ~30min. Please re-open if any issue.
Tue, Mar 17
@thcipriani as approval contact for the deployment group, do you approve this request?
@Gehel as the approver for the analytics-wmde-users group, do you approve this request?
I go through the karma dashboard from time to time. I prefer to have the peering sessions on the dashboard rather than in tasks, as it allows for better grouping and filtering.