User Details
- User Since
- Oct 15 2019, 4:02 PM (239 w, 6 d)
- Availability
- Available
- IRC Nick
- rzl
- LDAP User
- RLazarus
- MediaWiki User
- RLazarus (WMF)
Wed, May 1
Wed, Apr 24
Thanks. At present the controller monitors all namespaces, but ignores pods other than in mw-script. So if I were estimating memory usage I'd base it on the total number of pod events in the cluster, not just in the namespace.
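To illustrate the point about memory, here is a minimal sketch of that filtering pattern in plain Python. It is not the actual controller code: the `PodEvent` type and the `handle_events` function are hypothetical names invented for this example, and only the `mw-script` namespace comes from the comment above. The sketch shows why the cost scales with cluster-wide events: every event is received and counted before the namespace check discards it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Namespace named in the comment above; everything else here is hypothetical.
WATCHED_NAMESPACE = "mw-script"


@dataclass
class PodEvent:
    """Hypothetical stand-in for a pod event delivered by a cluster-wide watch."""
    namespace: str
    pod_name: str
    event_type: str


def handle_events(events: Iterable[PodEvent]) -> Tuple[int, List[PodEvent]]:
    """Return (number of events received, events actually acted on)."""
    seen = 0
    acted_on: List[PodEvent] = []
    for event in events:
        seen += 1  # every event in the cluster is received, whatever its namespace
        if event.namespace != WATCHED_NAMESPACE:
            continue  # ignored, but it still had to be received and inspected
        acted_on.append(event)
    return seen, acted_on


if __name__ == "__main__":
    sample = [
        PodEvent("mw-script", "job-a", "ADDED"),
        PodEvent("mw-web", "web-b", "MODIFIED"),
        PodEvent("kube-system", "dns-c", "ADDED"),
    ]
    seen, acted_on = handle_events(sample)
    print(f"received {seen} events, acted on {len(acted_on)}")
```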
Apr 17 2024
That sounds reasonable! Note for the future that helm diff has a --suppress-output-line-regex which does exactly what you'd like it to do, but it's not available in the version we're currently running.
Apr 16 2024
If we're really worried about that race condition, is it plausible to do this?
Apr 4 2024
Clinic duty SRE here -- I/F, can you start investigating this at the MTA end? Triaging this to High in case it's widespread, but feel free to decrease if it turns out it's not.
rzl@mwmaint1002:~$ ldapsearch -x cn=wmf | grep ospingou
member: uid=ospingou,ou=people,dc=wikimedia,dc=org
Apr 3 2024
@AndyRussG Welcome back!
Apr 1 2024
Yep, we're following closely -- but we don't use Debian unstable, so we're not directly affected. Thanks for checking!
Clinic duty SRE here, thanks @karapayneWMDE for the ticket. I merged https://gerrit.wikimedia.org/r/1015995 (thanks @Dzahn!) and followed up with
Mar 27 2024
Mar 22 2024
Curious: As @Clement_Goubert and I discussed, both the directory (via puppet file) and the database file (via puppet exec of imagecatalog init) have the right user, imagecatalog. There's nothing in Puppet (like a recurse) to ensure ownership on the database file, but it still ought to come out correct, as far as I can tell. Claime reports the file was owned by mwbuilder, which runs the release tools, but I think imagecatalog init should still have run first as the imagecatalog user.
Mar 11 2024
FYI: I redid a dry run and live test for 01-stop-maintenance.py after https://gerrit.wikimedia.org/r/1008583 and it's good to go.
This is good to go for the March 2024 switchover, so removing it as a subtask.
Mar 7 2024
I don't see anything obviously hardware-broken in logs. I notice it was just repooled yesterday after maintenance for T352010, but nothing jumps out as an obvious cause. Over to the DBAs from here, enjoy. :)
Mar 5 2024
Thanks @bd808!
That sounds like a perfect solution, except that I'm not in Trusted-Contributors. Which is admittedly pretty funny.
Confirming @jasmine_ is an intern on my team. If she needs any vouching, please consider her vouched! :)
Mar 4 2024
Mar 2 2024
Mar 1 2024
Will do, thanks for the pointer.
Feb 29 2024
Feb 28 2024
Feb 23 2024
Feb 22 2024
Restarted mailman3 at 00:43, icinga alerts are cleared, and the graph in T358020#9565952 is trending down again. Thanks @JJMC89 for the report and thanks @Legoktm for the ping.
Feb 21 2024
Feb 14 2024
the intention was probably for this to match something a bit more restrictive (e.g., matching ^/wiki(/.*)?$)
Sorry yeah, I was using the term broadly. The goal is to edit the Apache config, but that hieradata file is how you'd do it. :)
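As a quick check of what the more restrictive pattern quoted above would accept, here is a small sketch using Python's `re` module. Apache evaluates such patterns with PCRE rather than Python, but for a pattern this simple the two should behave the same. The helper name `matches_wiki_path` and the sample paths are made up for illustration.

```python
import re

# Pattern quoted in the comment above: "/wiki" alone, or "/wiki/" plus anything.
WIKI_PATH = re.compile(r"^/wiki(/.*)?$")


def matches_wiki_path(path: str) -> bool:
    """Return True if the path is /wiki or lives under /wiki/."""
    return WIKI_PATH.search(path) is not None


if __name__ == "__main__":
    # Hypothetical sample paths, chosen to show both sides of the boundary.
    for path in ["/wiki", "/wiki/", "/wiki/Main_Page", "/wikipedia", "/w/index.php"]:
        print(path, matches_wiki_path(path))
```

The optional `(/.*)?` group is what keeps a prefix such as `/wikipedia` from matching: after `/wiki`, the path must either end or continue with a slash.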
Feb 13 2024
Hi from Service Ops SRE!
Feb 12 2024
Surfacing @JMeybohm's reasonable concern from https://gerrit.wikimedia.org/r/c/988851/comments/3827b6cd_15427748:
Jan 25 2024
This week's clinic duty SRE is @Arnoldokoth.
Jan 24 2024
Jan 23 2024
Jan 9 2024
Dec 19 2023
Okay, let me know if https://gerrit.wikimedia.org/r/983963 plus the most recent iteration of https://gitlab.wikimedia.org/repos/sre/k8s-controller-sidecars/-/merge_requests/1 is what you had in mind...
Dec 18 2023
Oh, I misunderstood what you meant by "enable the controller on a per namespace level" above! I thought deploying one instance per namespace was what you had in mind.
Dec 14 2023
Yeah, as foretold:
Dec 11 2023
Super helpful explanation, thank you! https://gerrit.wikimedia.org/r/981703 should do the above, and https://gerrit.wikimedia.org/r/981704 adds the binding for the mw-script namespace. I'll deploy those like wikitech:Kubernetes/Add_a_new_service#Deploy_changes_to_helmfile.d/admin_ng and that will let me finish experimenting with the helm charts for both the sidecar controller and the actual jobs.
Dec 7 2023
@JMeybohm Can you help with an RBAC issue?
Oct 26 2023
Yep, sounds like a similar fit.
Oct 16 2023
Doh. Okay, that's a good reason not to go that route. :) I'll give the "primary container" label a try, thanks.
Oct 14 2023
Both good points, thanks.
Oct 6 2023
Oct 5 2023
Sep 21 2023
Done:
- Added you to the wmf LDAP group.
- Added you to the WMF-NDA Phabricator project.
- Created your shell user ahoelzl and added it to the analytics-privatedata-users POSIX group.
- Created your Kerberos principal.
It seems like this fell through the cracks between last week's SRE clinic duty (mine) and this week's. Let me finish it up for you, thanks for your patience.
Sep 20 2023
This sounds right to me -- thanks @elukey for getting it rolling. Early on, we had talked about autogenerating links for different calendar quarters and adding them to the text panel on top, but my recollection is we decided to spend that energy on Pyrra instead.
Sep 18 2023
Sep 16 2023
I don't have edit access to acl*security.
@thcipriani Sorry for the back-and-forth, but just because it isn't 100% explicit from reading this task -- did you want @Mabualruz to get deployer training before being added to the group? Or do we have your approval to add him, so that he can do the training hands-on?
Sep 15 2023
We have some plans for SLO-based alerting in the pipeline, but nothing implemented yet.
Sep 14 2023
Sep 12 2023
No, I tagged it private when we asked for PII, so that it would already be private when that stuff was posted. Since it never appeared, I'm fine with opening it up.
Sep 11 2023
Hi @Ahoelzl, welcome to the Foundation! SRE here, I'll be able to set you up with production access.
Hi @joanna_borun -- does this need Infrastructure Foundations approval?
Sep 7 2023
Aug 21 2023
Aug 18 2023
Only two blockers were raised at the August 7 SRE meeting:
Aug 2 2023
Jul 25 2023
Those numbers don't immediately raise alarm bells for me -- "storage" doesn't mean anything persistent, only ephemeral data that can disappear when the script exits, right? As long as that's the case (and assuming you're using ~1 CPU), you should be fine. I'm tagging in @akosiaris to confirm the resource request is sensible.