User Details
- User Since
- Dec 15 2021, 9:19 PM (208 w, 6 d)
- Availability
- Available
- LDAP User
- Unknown
- MediaWiki User
- BKing (WMF) [ Global Accounts ]
Yesterday
Mon, Dec 15
Re: reload certificates API
Sure, feel free to create a dedicated ticket for this.
Checking the docs, it seems that reprepro doesn't support holding multiple versions of the same package in the repo:
I took another look at this today, as we (DPE SRE) are starting to build our first Docker images from the upstream package. I suddenly realized that I don't know enough about how our repo mirroring works. Let me explain:
Fri, Dec 12
Re: Blackbox autodiscovery, I had a brief chat with @tappof (Observability team SRE) about this in observability's IRC room today, and he said his team will take a look and get back to us.
For monitoring, we could enable Blackbox autodiscovery, similar to how we already do Prometheus autodiscovery for metrics scrapes.
Thu, Dec 11
After some troubleshooting, we determined that dse-k8s-codfw was missing some firewall rules and network plumbing (this change added the firewall rules, and we deployed ingress to dse-k8s via admin_ng shortly after). We now know how to deploy multi-DC with DC-specific endpoints. As such, I'm closing out this ticket.
Wed, Dec 10
Per our pairing session earlier today, we were able to get the error queries to display. We can add this filter to the existing dashboard and clean up or remove the broken panels.
Since T410635 has been declined, let's go ahead and decline this one as well, since they are basically the same thing. We can always revisit once we are on a different graph database.
Mon, Dec 8
@Jhancock.wm we've added the requested info above, plus the Puppet code so the hosts should be able to provision. If we missed anything, feel free to ping us here or on IRC.
@gmodena paired on the playbook today.
Thu, Dec 4
I've created this repo, which includes a playbook to restart wdqs.
Wed, Dec 3
Update after some more troubleshooting:
Tue, Dec 2
Thanks again for your patience. I've added a mini-essay on why I think it's safe to remove sre.puppet.sync-netbox-hiera prompts here; feel free to read through/disagree/offer feedback as time permits.
I'm in the middle of reimaging some hosts in T410681; here are some quick notes on prompts that can be eliminated (in my opinion)
@Gehel, I'm interpreting your comment as
Sorry for all the reimage spam messages, I was using the wrong task ID when reimaging some hosts.
Mon, Dec 1
Per this Puppet change, we are now allowing up to 50% of physical RAM to be used as zRAM swap. Again quoting cdanis:
Mon, Nov 24
Looking at the DNS repo, it seems like we have a certain pattern for production cirrussearch:
Hey Moritz and Cathal,
Note to selves:
Thanks @cmooney! I was thinking about this over the weekend. To me, the problem seems to be less about this specific issue and more that those of us outside I/F and DC Ops have a tough time keeping track of new cookbook features.
Thanks @MoritzMuehlenhoff! Do I need to run the provisioning cookbook or make any other changes to put the host in UEFI mode? I know Cathal had to do some manual steps to change the SuperMicro hosts from legacy BIOS to UEFI on Friday.
@Jclark-ctr good catch. I didn't know about the Nokia bugs that prevent legacy BIOS reimage in eqiad rows C/D when I created this ticket.
Fri, Nov 21
Update: I was able to successfully deploy in CODFW, although I rolled it back because it was setting off alerts.
The plugins have been deployed to production CirrusSearch. Closing...
Thu, Nov 20
Closing per @Anton.Kokh's request. Feel free to ping DPE SRE in a follow-up ticket if necessary.
@Anton.Kokh we have added the requested endpoint. Please test this out and feel free to re-open this ticket if it is not working as expected.
Wed, Nov 19
After today's Puppet change, we should no longer be getting these alerts. Closing...
Tue, Nov 18
@Jclark-ctr I added the hosts to Puppet per https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Planned_-%3E_Active and am assigning this back over to you. Feel free to ping me in IRC if I missed anything!