User Details
- User Since
- Jul 26 2022, 2:11 PM (27 w, 5 d)
- Availability
- Available
- LDAP User
- Clément Goubert
- MediaWiki User
- CGoubert-WMF [ Global Accounts ]
Fri, Feb 3
Thank you very much :)
I've rebased and implemented one of @Volans's recommendations on the CR that @RLazarus had already created.
Would love to have your eyeballs on it, Data-Persistence.
Thanks @jbond! It'll update the file within 30 minutes at most, which sounds good enough to me, since I doubt many pcc runs will be done right after the switchover. If need be, we can still force a Puppet run on the compiler hosts.
I don't think this should be considered a blocker for T327920: March 2023 Datacenter Switchover.
However, we should address it for mw-on-k8s and releases.
Directly related work by @Joe that could use a couple of eyeballs: https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/886038
Is this work still ongoing? If not, would it be possible to update the doc before T327920: March 2023 Datacenter Switchover?
We are now correctly sending, ingesting, and storing slowlogs in ECS format. Next step: dashboards.
Thu, Feb 2
@RLazarus You were working on rewriting the app server warmup script in Python; I suppose this is directly related.
Debugging steps:
- Launch helmfile -e staging apply
- From another deploy1002 shell:
kube-env zotero staging
kubectl get pods  # Get the most recent pod
kubectl logs <pod_id> zotero-staging
Is this still relevant? Does it need to be finished for T327920: March 2023 Datacenter Switchover, or can it be closed?
Wed, Feb 1
2 more mw hosts affected:
cgoubert@cumin1001:~$ sudo cumin 'mw23[26,29,32].codfw.wmnet' 'ipmi-sel | tail -n2'
3 hosts will be targeted:
mw[2326,2329,2332].codfw.wmnet
OK to proceed on 3 hosts? Enter the number of affected hosts to confirm or "q" to quit: 3
===== NODE GROUP =====
(1) mw2329.codfw.wmnet
----- OUTPUT of 'ipmi-sel | tail -n2' -----
25 | Jan-30-2023 | 21:44:31 | Status | Power Supply | Power Supply input lost (AC/DC)
26 | Jan-30-2023 | 21:45:55 | PS Redundancy | Power Supply | Fully Redundant
===== NODE GROUP =====
(1) mw2332.codfw.wmnet
----- OUTPUT of 'ipmi-sel | tail -n2' -----
18 | Jan-30-2023 | 21:44:25 | Status | Power Supply | Power Supply input lost (AC/DC)
19 | Jan-30-2023 | 21:44:30 | PS Redundancy | Power Supply | Redundancy Lost
===== NODE GROUP =====
(1) mw2326.codfw.wmnet
----- OUTPUT of 'ipmi-sel | tail -n2' -----
13 | Sep-28-2022 | 15:05:49 | Status | Power Supply | Power Supply input lost (AC/DC)
14 | Jan-30-2023 | 17:15:41 | Status | Power Supply | Power Supply input lost (AC/DC)
================
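For triaging output like the above across more hosts, here is a minimal Python sketch that groups the `ipmi-sel` lines by host and lists hosts whose most recent entry is not "Fully Redundant". The function names (`parse_cumin_sel`, `hosts_without_redundancy`) are hypothetical, the field layout is inferred only from the paste above, and it assumes one host per NODE GROUP header. Treat the result as a rough heuristic, not a diagnosis.

```python
import re
from collections import defaultdict

# Matches a group header such as "(1) mw2329.codfw.wmnet" (single host assumed).
HOST_RE = re.compile(r"^\(\d+\)\s+(\S+)$")
# Matches an event line: id | date | time | sensor name | sensor type | event
EVENT_RE = re.compile(
    r"^\s*(\d+)\s*\|\s*(\S+)\s*\|\s*(\S+)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*(.+?)\s*$"
)


def parse_cumin_sel(output):
    """Return {host: [(date, time, sensor_name, event), ...]} in the order seen."""
    events = defaultdict(list)
    host = None
    for line in output.splitlines():
        header = HOST_RE.match(line.strip())
        if header:
            host = header.group(1)
            continue
        entry = EVENT_RE.match(line)
        if entry and host is not None:
            _id, date, time, name, _type, event = entry.groups()
            events[host].append((date, time, name, event))
    return dict(events)


def hosts_without_redundancy(events):
    """Hosts whose most recent SEL entry is anything other than 'Fully Redundant'."""
    return sorted(
        host for host, evs in events.items()
        if evs and evs[-1][3] != "Fully Redundant"
    )


# Sample copied from the paste above.
SAMPLE = """\
===== NODE GROUP =====
(1) mw2329.codfw.wmnet
----- OUTPUT of 'ipmi-sel | tail -n2' -----
25 | Jan-30-2023 | 21:44:31 | Status | Power Supply | Power Supply input lost (AC/DC)
26 | Jan-30-2023 | 21:45:55 | PS Redundancy | Power Supply | Fully Redundant
===== NODE GROUP =====
(1) mw2332.codfw.wmnet
----- OUTPUT of 'ipmi-sel | tail -n2' -----
18 | Jan-30-2023 | 21:44:25 | Status | Power Supply | Power Supply input lost (AC/DC)
19 | Jan-30-2023 | 21:44:30 | PS Redundancy | Power Supply | Redundancy Lost
===== NODE GROUP =====
(1) mw2326.codfw.wmnet
----- OUTPUT of 'ipmi-sel | tail -n2' -----
13 | Sep-28-2022 | 15:05:49 | Status | Power Supply | Power Supply input lost (AC/DC)
14 | Jan-30-2023 | 17:15:41 | Status | Power Supply | Power Supply input lost (AC/DC)
================
"""

if __name__ == "__main__":
    for host in hosts_without_redundancy(parse_cumin_sel(SAMPLE)):
        print(host)
```

On the sample, this flags mw2326 and mw2332, while mw2329 is skipped because its last entry reports redundancy restored.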
Tagging @akosiaris and @Eevans for awareness as they are on-call this week too.
Mon, Jan 30
Handing off to this week's Clinic Duty SRE.
@herron you should just have to merge the CR and create the kerberos principal.
Thu, Jan 26
- Approval from @Ottomata or @odimitrijevic as group approvers
- Approval from @JanWMF as manager
- Out of band key verification
Wed, Jan 25
The new key has been pushed; please allow 30 minutes from this post for it to be deployed. Feel free to reopen the task if you experience any issues.
Got out-of-band confirmation through Slack, proceeding.
FYI this created warnings in cross-validate-accounts, CR incoming.
Hi @RGaines_WMF
Sorry this did not get picked up earlier.
According to https://wikitech.wikimedia.org/wiki/SRE/Clinic_Duty/Access_requests#Google_&_Bing_search_console_access, SRE does not own access requests for GSC; Core Experiences does.
@SCherukuwada may be able to help if this is still relevant?
Tue, Jan 24
@Samwalton9 your access to the relevant groups has been granted. Please wait 30m (as of this comment) before trying it out, as the access propagates across the fleet. You should also have received an email regarding Kerberos; follow the instructions in it to set your credentials. If you didn't, please check your spam folder just in case.
Retention updated for mediawiki.httpd.accesslog in codfw
Ack. CR updated for kerberos access.
Hi @Samwalton9,
I need:
- Approval from @Ottomata or @odimitrijevic for the privilege extension to shell access, as group approvers
- Approval from @DannyH for the same, as manager
Being bold and closing this task since it's had no meaningful update in 7 years. Feel free to reopen if needed.
Mon, Jan 23
@taavi Access request merged; you should have your access around 30 minutes from now, once Puppet has run. Resolving, don't hesitate to reopen if needed.
@taavi Patch ready, assuming you don't need kerberos access. Here are the Data Access User Responsibilities
@Muhammad_Yasser_Jazirahly_WMDE your access to the relevant groups has been granted. Please wait 30m (as of this comment) before trying it out, as the access propagates across the fleet. You should also have received an email regarding Kerberos; follow the instructions in it to set your credentials. If you didn't, please check your spam folder just in case.
You should have received an email regarding Kerberos; follow the instructions in it to set your credentials. If you didn't, please check your spam folder just in case.
@MShilova_WMF your access to the wmf group has been granted. Please wait 30m (as of this comment) before trying it out, as the access propagates across the fleet. I'll resolve this task; feel free to reopen it if you run into issues.
@Ollie.Shotton_WMDE your access to the relevant groups has been granted. Please wait 30m (as of this comment) before trying it out, as the access propagates across the fleet. I'll resolve this task; feel free to reopen it if you run into issues.
The link in the task description 404s. Being bold and closing as Invalid; feel free to reopen with up-to-date information if needed.
- Merge CR
- Grant LDAP group access