I'm boldly resolving this as I think everything has been fixed and merged. Feel free to reopen if needed.
There is a 5th way, and it's via the Redfish API ;)
We do have basic support for the Redfish API in spicerack right now and there is a plan to add support for RAID configuration, but that's not ready yet.
That said, if you have to perform this on a large number of hosts, it could still be worth looking at a one-off way using what we have in spicerack and doing the setup via Redfish.
Feel free to ping me offline if you want to explore this approach.
I've updated Netbox running the following code:
@bking FYI I'll skip elastic2035 as it's offline in Netbox and doesn't have any IP.
Mon, May 23
Yes I agree.
Fri, May 20
Sounds good to me. Thanks for the update :)
Thu, May 19
@Papaul the above patch was merged and deployed. I think it should fix the issue. Please resolve the task if that's the case or let me know what the new error is.
That's probably the change in https://gerrit.wikimedia.org/r/c/operations/software/netbox-extras/+/789089, I'll have a look.
Sorry to add another one, it should be the last one. I'm working on a refactor of the zone_validator script and cleaning up some of the warnings reported ;)
@Dwisehaupt thanks for taking care of this.
Another record that could be inconsistent (or like that by design) is the payments-eqiad record, that has a reverse for payments:
Wed, May 18
Re-opening because if there is no technical blocker for having the AAAA records on those hosts and your services are IPv6-ready, then we should add them to standardize our infrastructure and remove technical debt.
Tue, May 17
I have a local patch that I'm testing to perform the validation of the whole dataset (manual + Netbox). The preliminary results are below. I will have a look at the reported errors (that seem legit at first sight) and also at the warnings, which might not be reported correctly anymore (there seem to be a bit too many of them).
I've merged the change and @jcrespo has run it manually. The backup is working fine. Resolving.
Thu, May 12
Could you try this?
We looked at the logs with John and Papaul during our last meeting and agreed that it took a long time for mdadm+mkfs to create the software RAID partition and format it. Hence we decided to just increase the current timeout in spicerack. I'll make the patch.
Wed, May 11
Mon, May 2
Ack, let me repurpose this one.
Sat, Apr 30
@Papaul is it normal that it's so slow to just create an empty partition?
We can surely increase the number or add some tweak in spicerack to make it a bit more dynamic. I'll take a look at it next week.
@fgiunchedi yes and no, duplicates within the operations/dns repository are currently caught, but duplicates within the automatically-generated data or between the manual and the generated data are not.
What we could do is refactor the zone validator a bit to inject into the zonefiles the Netbox-generated data for each INCLUDE before parsing the file. That should allow us to catch all issues, but would also mean that some wrong data in Netbox might make CI fail on a totally valid DNS patch.
What are your thoughts?
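The INCLUDE-injection idea can be sketched roughly as follows. This is only an illustration and not the actual zone_validator code: the function names, the `generated` mapping, and the sample records are all hypothetical, and comment stripping is naive (it would mishandle a `;` inside a quoted TXT record).

```python
from collections import Counter


def expand_includes(zone_lines, generated):
    """Replace each $INCLUDE line with the generated lines for that include path.

    `generated` is a hypothetical mapping of include path -> list of record lines.
    Includes without generated data are left untouched.
    """
    expanded = []
    for line in zone_lines:
        parts = line.split()
        if len(parts) > 1 and parts[0] == "$INCLUDE" and parts[1] in generated:
            expanded.extend(generated[parts[1]])
        else:
            expanded.append(line)
    return expanded


def find_duplicates(lines):
    """Return the records that appear more than once, ignoring comments and spacing."""
    records = []
    for line in lines:
        # Naive comment stripping: everything after the first ';' is dropped.
        content = line.split(";", 1)[0].split()
        if content:
            records.append(" ".join(content))
    counts = Counter(records)
    return sorted(record for record, count in counts.items() if count > 1)


# Made-up sample data: one manual record duplicated by the generated include.
manual = [
    "foo1001 1H IN A 10.0.0.1",
    "$INCLUDE netbox/example.wmnet",
]
generated = {
    "netbox/example.wmnet": [
        "foo1001  1H IN A 10.0.0.1 ; from Netbox",
        "bar1001 1H IN A 10.0.0.2",
    ],
}
duplicates = find_duplicates(expand_includes(manual, generated))
print(duplicates)
```

With the sample data, the record defined both manually and in the generated include is the only one reported.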
Tue, Apr 26
Sure, but they could cause various unwanted issues in different contexts, like not matching the fingerprint in the known hosts file for SSH connections:
cumin1001 $ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh email@example.com.
The authenticity of host 'sretest1001.eqiad.wmnet. (2620:0:861:107:10:64:48:138)' can't be established.
[...SNIP...]
That's why I would prefer to keep the data consistent in Netbox without the trailing period, so that the behaviour is consistent everywhere. For the DNS-specific bits the period is always added automatically.
The DNS Name field in Netbox is an FQDN; the Netbox UI help message for the field itself says: Hostname or FQDN (not case-sensitive)
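As a minimal sketch of that convention (the helper names are hypothetical and this is not the code used by the Netbox scripts or the DNS generation), the stored value has no trailing period and the period is added only when producing DNS data:

```python
def netbox_dns_name(name):
    """Form to store in Netbox: the name without any trailing period."""
    return name.strip().rstrip(".")


def dns_record_name(name):
    """Form for DNS zone data: the same name with exactly one trailing period."""
    return netbox_dns_name(name) + "."


print(netbox_dns_name("sretest1001.eqiad.wmnet."))
print(dns_record_name("sretest1001.eqiad.wmnet"))
```

Both helpers are idempotent, so applying them to already-normalized data is harmless.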
For this reason I think that this task should be closed as invalid, and instead we should move towards having consistent data in Netbox.
The DNS Name field is used in multiple places in different automation, from Homer to various Netbox scripts and is always considered to be an FQDN.
Apr 22 2022
I've updated the task description according to T306654#7873125.
As for the puppet-merge on the puppetmasters, does datacenter-ops have +2 on the operations/puppet repository on Gerrit?
Apr 21 2022
Thanks for opening the task to discuss details. As first feedback, my primary question is: how do you envision reconciling this new third way to configure the network devices with the existing two?
Basically, if we make a change via this method, would homer then be out of sync? Or would anything we plan to do with this method be automatically included in homer runs, and so be a noop for homer based on the updated Netbox configuration and the new state of the network device?
@KSiebert thanks, it's all done. There was some confusion about which email the account should be associated with.
@TheresNoTime I'm resolving this, feel free to reopen it in case you encounter any issue.
Sorry, I overlooked the request. As your account is associated with an @wikimedia.org email address, I've granted you the wmf group in LDAP and revoked the nda one, as they can't coexist.
But don't worry, if/when the contract ends you will be able to request the nda one again if needed.
Apr 20 2022
I've +1ed the patch, @Ottomata feel free to merge whenever works for you.
Patch merged, resolving.
Patch merged, resolving.
Granted ldap/wmf to uid=samtar, revoked the pre-existing ldap/nda one as they can't coexist on the same account.
Don't worry, if/when the contract is over you can get the nda one back.
As clarified in the related task above, granted ldap/wmf to uid=maryyang.
LDAP wmf group granted for aassaf.
@Jdforrester-WMF I can check what's the difference in Gerrit, it depends on the repositories I guess.
Do they have an @wikimedia.org email account? As per https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty/Access_requests#WMF_group we usually grant the ldap/wmf group only to staff and contractors with an @wikimedia.org email account.
Granted ldap/nda group, confirmation of NDA on file is in T249873#7865953. Resolving.
For contractors we usually grant the ldap/nda group instead; at the practical level they are almost equivalent, so that should work too.
@TheresNoTime Would it be OK for you to convert this request into one for the ldap/nda group?
@jmads your kerberos account should still be valid, as far as I can tell. Please verify it and feel free to close this task if all is working as expected.
@jmads the access patch has been merged, it will be deployed across the fleet within the next 30 minutes.
Feel free to close this task once verified that all is working as expected.
Would it be possible to group the similar SSH errors where the only difference is the target hostname?
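One possible way to do such grouping, sketched under assumptions: the hostname pattern, the function name, and the sample messages below are illustrative only and not taken from the tool being discussed.

```python
import re
from collections import defaultdict

# Assumption: target hostnames are dot-separated labels ending in .wmnet or .org.
HOSTNAME = re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:wmnet|org)\b")


def group_errors(messages):
    """Group messages that differ only by hostname; map each template to its hosts."""
    groups = defaultdict(list)
    for message in messages:
        hosts = HOSTNAME.findall(message)
        # Replace every hostname with a placeholder to obtain the shared template.
        template = HOSTNAME.sub("<host>", message)
        groups[template].extend(hosts)
    return dict(groups)


# Made-up sample messages.
errors = [
    "ssh: connect to host foo1001.eqiad.wmnet port 22: Connection timed out",
    "ssh: connect to host bar2001.codfw.wmnet port 22: Connection timed out",
    "ssh: connect to host baz1002.eqiad.wmnet port 22: Connection refused",
]
grouped = group_errors(errors)
for template, hosts in grouped.items():
    print(f"{template} ({len(hosts)} hosts): {', '.join(hosts)}")
```

The two timeout errors collapse into a single entry listing both hosts, while the refused connection stays separate.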
Apr 19 2022
@dr0ptp4kt could you please clarify whether this access request (and the other one related to the same project) is for the NDA group rather than the WMF one? The NDA group seems more appropriate for non-staff and is the same one used, for example, for research contractors.
As for accessing tools usually WMF and NDA are equivalent, so that shouldn't affect usability.
@eigyan the access request has been merged, it will be deployed within the next 30 minutes.
Please resolve this task once confirmed that it's all working as expected.
Waiting on the related T249873 at this point, to do it all together.
Adding @BGerdemann for approval (contract side), please also provide a contract end date.
Adding @odimitrijevic for approval (analytics side).
Adding @KFrancis for confirming that there is still a valid NDA on file.
Apr 14 2022
Apr 13 2022
Perfect, thanks for clarifying.
let them run "puppet disable/enable" either directly or with a wrapper around it. (the one used by cumin?).
Will the work on this task also change the key wmfMasterDatacenter in siteinfo's ['query']['general']['wmf-config']?
If so, please ping me when that happens as I have to adjust spicerack accordingly.
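For reference, a minimal sketch of reading that key from an already-parsed siteinfo response. The function name and the sample payload are hypothetical, and this is not the spicerack implementation:

```python
def master_datacenter(siteinfo):
    """Return wmfMasterDatacenter from a parsed siteinfo response, or None if absent."""
    wmf_config = siteinfo.get("query", {}).get("general", {}).get("wmf-config", {})
    return wmf_config.get("wmfMasterDatacenter")


# Made-up payload with the same nesting as ['query']['general']['wmf-config'].
sample = {"query": {"general": {"wmf-config": {"wmfMasterDatacenter": "eqiad"}}}}
print(master_datacenter(sample))
```

Returning None when the key is missing makes a rename or removal of the key easy to detect on the caller side.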