User Details
- User Since
- Feb 10 2016, 11:25 AM (326 w, 6 d)
- Availability
- Available
- IRC Nick
- volans
- LDAP User
- Volans
- MediaWiki User
- RCoccioli (WMF)
Yesterday
I have a local patch that I'm testing to validate the whole dataset (manual + Netbox). The preliminary results are below. I will have a look at the reported errors (which seem legitimate at first sight) and also at the warnings, which might no longer be reported correctly (some of them seem too numerous).
I've merged the change and @jcrespo has run it manually. The backup is working fine. Resolving.
Thu, May 12
Could you try this?
We looked at the logs with John and Papaul during our last meeting and agreed that it took a long time for mdadm+mkfs to create the software RAID partition and format it. Hence we decided to just increase the current timeout in spicerack. I'll make the patch.
Wed, May 11
Mon, May 2
Ack, let me repurpose this one.
Sat, Apr 30
@Papaul is it normal that it's so slow to just create an empty partition?
We can surely increase the number or add some tweak in spicerack to make it a bit more dynamic. I'll take a look at it next week.
@fgiunchedi yes and no: duplicates within the operations/dns repository are currently caught, but duplicates within the automatically generated data, or between the manual and the generated data, are not.
What we could do is refactor the zone validator a bit to inject the Netbox-generated data for each INCLUDE into the zonefiles before parsing them. That should allow us to catch all issues, but it would also mean that some wrong data in Netbox might make CI fail on a totally valid DNS patch.
What are your thoughts?
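To illustrate the kind of cross-source check discussed above, here is a minimal sketch. It is not the actual zone validator code: it assumes the manual and generated records have already been parsed into `(name, type, value)` tuples, and the function name `find_duplicates` and the sample hostnames are hypothetical.

```python
from collections import Counter


def find_duplicates(manual, generated):
    """Return records that appear more than once across both sources.

    Assumption: each record is a (name, type, value) tuple. This covers
    duplicates inside either source as well as between the two.
    """
    counts = Counter(manual)
    counts.update(generated)
    return sorted(record for record, count in counts.items() if count > 1)


# Hypothetical sample data, for illustration only.
manual = [("foo1001.example.org.", "A", "192.0.2.1")]
generated = [
    ("foo1001.example.org.", "A", "192.0.2.1"),
    ("bar1001.example.org.", "A", "192.0.2.2"),
]
duplicates = find_duplicates(manual, generated)
```

Merging both sources before counting is what mirrors the idea of injecting the generated data ahead of validation; the trade-off noted above still applies, since bad generated data would then fail the check too.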
Tue, Apr 26
Sure, but they could cause various unwanted issues in different contexts, like not matching the fingerprint in the known hosts file for SSH connections:
cumin1001 $ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh root@sretest1001.eqiad.wmnet.
The authenticity of host 'sretest1001.eqiad.wmnet. (2620:0:861:107:10:64:48:138)' can't be established.
[...SNIP...]
That's why I would prefer to keep the data in Netbox without the trailing period, so that the behaviour is consistent everywhere. For the DNS-specific bits the period is always added automatically.
Thoughts?
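The convention described above (no trailing period in stored data, period added only for DNS output) could be sketched as follows. This is a minimal illustration, not code from any of the tools mentioned; the helper names `stored_name` and `dns_name` are hypothetical.

```python
def stored_name(fqdn: str) -> str:
    """Return the FQDN as it would be stored: without a trailing period."""
    return fqdn.rstrip(".")


def dns_name(fqdn: str) -> str:
    """Return the FQDN in its DNS absolute form: exactly one trailing period."""
    return stored_name(fqdn) + "."


# Both spellings of the same host normalize to the same values.
with_dot = "sretest1001.eqiad.wmnet."
without_dot = "sretest1001.eqiad.wmnet"
```

Normalizing in one place means consumers such as SSH, which compare hostnames literally, always see the same string regardless of how the name was entered.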
The DNS Name field in Netbox is an FQDN; the Netbox UI help message for the field itself reads: Hostname or FQDN (not case-sensitive)
For this reason I think that this task should be closed as invalid, and instead we should move towards having consistent data in Netbox.
The DNS Name field is used in multiple places in different automation, from Homer to various Netbox scripts and is always considered to be an FQDN.
Fri, Apr 22
I've updated the task description according to T306654#7873125.
As for the puppet-merge on the puppetmasters, does the datacenter-ops have +2 on the operations/puppet repository on Gerrit?
Thu, Apr 21
Thanks for opening the task to discuss details. As first feedback, my primary question is how you envision reconciling this new third way to configure the network devices with the existing two.
Basically, if we do a change via this method, would then homer be out of sync? Or anything we plan to do with this method will be automatically included in homer runs and so would be a noop for homer based on the updated Netbox configuration and the new state of the network device?
@KSiebert thanks, it's all done. There was some confusion about which email the account should be associated with.
@TheresNoTime I'm resolving this, feel free to reopen it in case you encounter any issue.
Sorry, I overlooked the request. As your account has an @wikimedia.org email address, I've granted you the wmf group in LDAP and revoked the nda one, as they can't coexist.
But don't worry, if/when the contract ends you will be able to request the nda one again if needed.
Wed, Apr 20
I've +1ed the patch, @Ottomata feel free to merge whenever works for you.
Patch merged, resolving.
Patch merged, resolving.
Granted ldap/wmf to uid=samtar, revoked the pre-existing ldap/nda one as they can't coexist on the same account.
Don't worry, if/when the contract is over you can request the nda one again.
As clarified in the related task above, granted ldap/wmf to uid=maryyang.
LDAP wmf group granted for aassaf.
@Jdforrester-WMF I can check what's the difference in Gerrit, it depends on the repositories I guess.
Do they have an @wikimedia.org email account? As per https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty/Access_requests#WMF_group we usually grant the ldap/wmf group only to staff and contractors with an @wikimedia.org email account.
Granted ldap/nda group, confirmation of NDA on file is in T249873#7865953. Resolving.
For contractors we usually grant the ldap/nda group instead, at the practical level they are almost equivalent, so that should work too.
@TheresNoTime Would it be OK for you to convert this request into a request for the ldap/nda group?
@jmads your kerberos account should still be valid, as far as I can tell. Please verify it and feel free to close this task if all is working as expected.
@jmads the access patch has been merged, it will be deployed across the fleet within the next 30 minutes.
Feel free to close this task once verified that all is working as expected.
Pending clarification from @dr0ptp4kt on the similar request T306437#7864599
Would it be possible to group the similar SSH errors where the only difference is the target hostname?
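One way such grouping could work is sketched below. This is only an illustration of the idea, not an existing feature: the hostname pattern (dotted names ending in `.wmnet` or `.org`), the `<host>` placeholder, and the sample messages are all assumptions.

```python
import re
from collections import defaultdict

# Assumption: target hostnames are dotted names ending in .wmnet or .org.
HOST_RE = re.compile(r"\b[\w-]+(?:\.[\w-]+)*\.(?:wmnet|org)\b")


def group_by_template(messages):
    """Group messages that differ only by hostname.

    Returns a dict mapping the message with hostnames replaced by
    "<host>" to the list of hostnames seen for that message.
    """
    groups = defaultdict(list)
    for message in messages:
        template = HOST_RE.sub("<host>", message)
        groups[template].extend(HOST_RE.findall(message))
    return dict(groups)


# Hypothetical error messages, for illustration only.
messages = [
    "ssh: connect to host a1001.eqiad.wmnet port 22: Connection timed out",
    "ssh: connect to host b1001.eqiad.wmnet port 22: Connection timed out",
    "Permission denied (publickey) on c1001.codfw.wmnet",
]
grouped = group_by_template(messages)
```

With this shape, each distinct error would be reported once, followed by the list of affected hosts.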
Tue, Apr 19
@dr0ptp4kt could you please clarify whether this access request (and the other one related to the same project) should be for the NDA group rather than the WMF one? The NDA group seems more appropriate for non-staff and is the same one used, for example, for research contractors.
As for accessing tools usually WMF and NDA are equivalent, so that shouldn't affect usability.
@eigyan the access request has been merged, it will be deployed within the next 30 minutes.
Please resolve this task once confirmed that it's all working as expected.
Pending the related T249873 at this point, to do all together.
Adding @BGerdemann for approval (contract side), please also provide a contract end date.
Adding @odimitrijevic for approval (analytics side).
Adding @KFrancis for confirming that there is still a valid NDA on file.
Apr 14 2022
Apr 13 2022
Perfect, thanks for clarifying.
let them run "puppet disable/enable" either directly or with a wrapper around it. (the one used by cumin?).
Will the work on this task also change the key wmfMasterDatacenter in siteinfo's ['query']['general']['wmf-config']?
If so please ping me when that will happen as I have to adjust spicerack accordingly.
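For context, reading that key from a siteinfo response amounts to a nested lookup like the one sketched here. This is a minimal illustration and not the spicerack implementation: the function name `master_datacenter` is hypothetical, the sample response contains only the path quoted above, and the value `eqiad` is a placeholder.

```python
def master_datacenter(siteinfo: dict) -> str:
    """Extract wmfMasterDatacenter from a parsed siteinfo response.

    Raises KeyError if the key is missing or has been renamed, so a
    change on the MediaWiki side is noticed immediately.
    """
    return siteinfo["query"]["general"]["wmf-config"]["wmfMasterDatacenter"]


# Hypothetical, trimmed-down response for illustration only.
sample = {"query": {"general": {"wmf-config": {"wmfMasterDatacenter": "eqiad"}}}}
current = master_datacenter(sample)
```

Any consumer written this way breaks as soon as the key changes, which is why advance notice of such a change matters.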
Apr 12 2022
Apr 11 2022
Apr 8 2022
Apr 7 2022
Apr 6 2022
If I may add my use case too, I would like to be able to restrict the access to the webproxies from the cumin hosts (cluster::management puppet role) and potentially other sensitive hosts. Ideally to an allow-list of URLs or something similar.
Apr 5 2022
With the above patch merged the problem should not happen anymore, if it does please re-open the task, I'm boldly resolving it for now.
Mar 31 2022
Mar 30 2022
Mar 29 2022
All done from my side, thanks a lot!
As John is out I took a stab at the implementation in https://gerrit.wikimedia.org/r/c/operations/puppet/+/773272, and decided to convert it to Python, as I think it gives us more flexibility.
I'd also like to use this host for a couple of Force PXE tests for T304434 if possible