
Raine (Raine Souček)
Site Reliability Engineer - ServiceOps

User Details

User Since
Mar 16 2023, 2:18 PM (143 w, 5 d)
Availability
Available
IRC Nick
Raine
LDAP User
Kamila Součková
MediaWiki User
KSoučková-WMF

Recent Activity

Mon, Dec 8

Raine added a comment to T411816: cannot add a FIDO-backed ssh key to Bitu.
raine@entropy ~ > echo -n 'AAAAInNrLWVjZHNhLXNoYTItbmlzdHAyNTZAb3BlbnNzaC5jb20AAAAIbmlzdHAyNTYAAABBBHxn2svCja9zUl9QHx2Hy2hvOqI1coNF+rAqXx8aktK6dxEshqE0Grx7x0vYNGBoCy83/pDlM7dwjEQTO37JJVAAAAAEc3NoOg==' | wc -c 
172
raine@entropy ~ > echo -n 'raine@yknano-A' | wc -c
14
Mon, Dec 8, 2:18 PM · Infrastructure-Foundations, Bitu
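
The two `wc -c` checks above measure the base64 key blob (172 characters) and the key comment (14 characters). As a complementary check, the following is a minimal sketch of how such a blob can be split into its fields, assuming the standard SSH wire encoding (each field is a 4-byte big-endian length followed by that many bytes). The helper `make_blob` and the all-zero key material are hypothetical and exist only so the demo is self-contained; nothing here reflects how Bitu itself validates keys.

```python
import base64
import struct


def parse_ssh_blob(b64_blob: str) -> list[bytes]:
    """Split a base64 SSH public-key blob into its length-prefixed fields."""
    raw = base64.b64decode(b64_blob, validate=True)
    fields = []
    offset = 0
    while offset < len(raw):
        if offset + 4 > len(raw):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", raw[offset:offset + 4])
        offset += 4
        if offset + length > len(raw):
            raise ValueError("field runs past the end of the blob")
        fields.append(raw[offset:offset + length])
        offset += length
    return fields


def make_blob(*fields: bytes) -> str:
    """Hypothetical demo helper: encode raw fields as a base64 SSH blob."""
    raw = b"".join(struct.pack(">I", len(f)) + f for f in fields)
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Dummy key with the same field layout as an sk-ecdsa key:
    # key type, curve name, 65-byte EC point, application string.
    demo = make_blob(
        b"sk-ecdsa-sha2-nistp256@openssh.com",
        b"nistp256",
        b"\x04" + bytes(64),  # placeholder point, not a real key
        b"ssh:",
    )
    print(len(demo))
    for field in parse_ssh_blob(demo):
        print(len(field), field[:40])
```

A blob with this field layout is 127 bytes before encoding, which base64 expands to 172 characters, the same length reported in the comment.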

Fri, Dec 5

Raine added a comment to T411816: cannot add a FIDO-backed ssh key to Bitu.

I just tested it again and once more got "500 Internal Server Error - Your request caused an error on the server." I am also unable to git pull from Gerrit, so I don't think the key was deployed despite the error.

Fri, Dec 5, 5:49 PM · Infrastructure-Foundations, Bitu
Raine reopened T411255: Monitor hCaptcha (upstream) status, a subtask of T410626: WE6.2.6: ☂️ hcaptcha-proxy Production Readiness Review, as Open.
Fri, Dec 5, 12:36 PM · serviceops
Raine reopened T411255: Monitor hCaptcha (upstream) status as "Open".

The extension does not distinguish between hCaptcha being unreachable due to the upstream being gone vs our proxy being gone. Therefore, I think that having blackbox monitoring of hcaptcha.com that does not go through our proxy would still have additional value. @jijiki let me know if I'm wrong :-)

Fri, Dec 5, 12:35 PM · serviceops
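
The argument in the comment above is that a probe through the proxy alone cannot tell a proxy outage from an upstream outage, while a second probe that bypasses the proxy can. A minimal sketch of that reasoning follows, assuming two independent boolean probe results; the names `Diagnosis` and `diagnose` are hypothetical and not part of any existing monitoring setup.

```python
from enum import Enum


class Diagnosis(Enum):
    HEALTHY = "both paths work"
    PROXY_PROBLEM = "upstream reachable directly, but not through the proxy"
    UPSTREAM_PROBLEM = "upstream unreachable on both paths"
    DIRECT_PROBE_PROBLEM = "proxy path works, direct probe fails"


def diagnose(direct_ok: bool, via_proxy_ok: bool) -> Diagnosis:
    """Combine a direct probe and a through-the-proxy probe into one verdict."""
    if direct_ok and via_proxy_ok:
        return Diagnosis.HEALTHY
    if direct_ok:
        # Upstream answers when the proxy is bypassed, so the proxy is suspect.
        return Diagnosis.PROXY_PROBLEM
    if via_proxy_ok:
        # Users are fine; the direct probe's own network path is suspect.
        return Diagnosis.DIRECT_PROBE_PROBLEM
    # Both fail: most likely the upstream, though shared egress could also be at fault.
    return Diagnosis.UPSTREAM_PROBLEM


if __name__ == "__main__":
    for direct in (True, False):
        for proxied in (True, False):
            print(direct, proxied, diagnose(direct, proxied).value)
```

Without the direct probe, the second and fourth cases collapse into a single "unreachable" signal, which is the ambiguity the comment describes.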

Thu, Dec 4

Raine updated the task description for T411131: hcaptcha proxy: update wikitech page.
Thu, Dec 4, 10:59 PM · WE4.2 Bot detection, Documentation, serviceops
Raine changed the status of T411780: hcaptcha-proxy: create logstash dashboard, a subtask of T410626: WE6.2.6: ☂️ hcaptcha-proxy Production Readiness Review, from Open to In Progress.
Thu, Dec 4, 10:59 PM · serviceops
Raine changed the status of T411780: hcaptcha-proxy: create logstash dashboard from Open to In Progress.
Thu, Dec 4, 10:59 PM · serviceops
Raine closed T411255: Monitor hCaptcha (upstream) status, a subtask of T410626: WE6.2.6: ☂️ hcaptcha-proxy Production Readiness Review, as Declined.
Thu, Dec 4, 10:50 PM · serviceops
Raine closed T411255: Monitor hCaptcha (upstream) status as Declined.

The monitoring created in T404204 is sufficient -- see https://grafana-rw.wikimedia.org/d/441b2def-52e9-49d6-acad-91f5bb748989/hcaptcha-reverse-proxy-proxoid?viewPanel=panel-36 . Further work is not needed.

Thu, Dec 4, 10:50 PM · serviceops
Raine added a comment to T411404: Update SSH key for kamila.

I am leaving this open as a reminder to delete the old key, but I'm currently unable to do that (blocked by T411816). If it's in the way, feel free to close it.

Thu, Dec 4, 9:08 PM · SRE-Unowned
Raine created T411816: cannot add a FIDO-backed ssh key to Bitu.
Thu, Dec 4, 7:59 PM · Infrastructure-Foundations, Bitu
Raine changed the status of T411254: Improve hcaptcha-proxy Grafana dashboard, a subtask of T410626: WE6.2.6: ☂️ hcaptcha-proxy Production Readiness Review, from Open to In Progress.
Thu, Dec 4, 6:10 PM · serviceops
Raine changed the status of T411254: Improve hcaptcha-proxy Grafana dashboard from Open to In Progress.
Thu, Dec 4, 6:10 PM · serviceops
Raine renamed T411255: Monitor hCaptcha (upstream) status from Monitor hCaptcha status to Monitor hCaptcha (upstream) status.
Thu, Dec 4, 3:00 PM · serviceops
Raine updated the task description for T411131: hcaptcha proxy: update wikitech page.
Thu, Dec 4, 2:04 PM · WE4.2 Bot detection, Documentation, serviceops
Raine created T411782: hcaptcha-proxy: consider unit tests, integration tests.
Thu, Dec 4, 1:57 PM · serviceops
Raine created T411780: hcaptcha-proxy: create logstash dashboard.
Thu, Dec 4, 1:53 PM · serviceops
Raine updated the task description for T411131: hcaptcha proxy: update wikitech page.
Thu, Dec 4, 1:14 PM · WE4.2 Bot detection, Documentation, serviceops

Mon, Dec 1

Raine closed T411148: hcaptcha-proxy: update service catalog as Declined.

Not needed with the current setup.

Mon, Dec 1, 6:56 PM · serviceops
Raine closed T411148: hcaptcha-proxy: update service catalog, a subtask of T410626: WE6.2.6: ☂️ hcaptcha-proxy Production Readiness Review, as Declined.
Mon, Dec 1, 6:56 PM · serviceops
Raine created T411404: Update SSH key for kamila.
Mon, Dec 1, 6:09 PM · SRE-Unowned
Raine updated the task description for T411256: Draft hCaptcha SLOs, document SLIs.
Mon, Dec 1, 5:01 PM · serviceops
Raine triaged T411256: Draft hCaptcha SLOs, document SLIs as Medium priority.
Mon, Dec 1, 5:00 PM · serviceops

Fri, Nov 28

Raine added a comment to T327663: Create a visual representation of where each service is active from, any given time.

Is anything still needed beyond the functionality in sudo cookbook -d sre.discovery.datacenter status all? That provides the following table:

Service                       Type           eqiad     codfw
=================================================================
apertium                      Active/Active  pooled    pooled    
api-gateway                   Active/Active  pooled    pooled    
apt                           Active/Passive pooled              
apus                          Active/Active  pooled    pooled    
citoid                        Active/Active  pooled    pooled    
config-master                 Active/Active  pooled    pooled    
cxserver                      Active/Active  pooled    pooled    
device-analytics              Active/Active  pooled    pooled    
docker-registry               Active/Passive           pooled    
echostore                     Active/Active  pooled    pooled    
eventgate-analytics           Active/Active  pooled    pooled    
[...]
Fri, Nov 28, 5:51 PM · Patch-For-Review, serviceops, observability
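
If the table above needs to feed a visual representation, it can be read back into a data structure first. The sketch below assumes the fixed-width layout shown in the comment (column positions taken from the header row, a row of `=` as separator); `parse_status_table`, `pooled_sites`, and `make_row` are hypothetical names, and the cookbook's real output may differ in details.

```python
def parse_status_table(text: str) -> dict[str, dict[str, str]]:
    """Parse a fixed-width status table into {service: {column: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    names = header.split()
    # Column boundaries are taken from where each header label starts.
    starts = [header.index(name) for name in names]
    ends = starts[1:] + [None]
    rows = {}
    for line in lines[1:]:
        stripped = line.strip()
        if set(stripped) == {"="} or stripped.startswith("["):
            continue  # skip the separator row and elision markers
        cells = [line[s:e].strip() for s, e in zip(starts, ends)]
        rows[cells[0]] = dict(zip(names[1:], cells[1:]))
    return rows


def pooled_sites(rows: dict[str, dict[str, str]], service: str) -> list[str]:
    """Return the sites in which a service is currently pooled."""
    return [site for site, state in rows[service].items() if state == "pooled"]


def make_row(*cells: str) -> str:
    """Hypothetical demo helper: lay out cells in the assumed column widths."""
    return "".join(f"{c:<{w}}" for c, w in zip(cells, (30, 15, 10, 10)))


if __name__ == "__main__":
    table = "\n".join([
        make_row("Service", "Type", "eqiad", "codfw"),
        "=" * 65,
        make_row("apt", "Active/Passive", "pooled", ""),
        make_row("docker-registry", "Active/Passive", "", "pooled"),
        make_row("citoid", "Active/Active", "pooled", "pooled"),
    ])
    rows = parse_status_table(table)
    for name in rows:
        print(name, pooled_sites(rows, name))
```

Slicing by header position rather than splitting on whitespace matters here because an empty cell (a site where the service is not pooled) would otherwise shift the remaining columns.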
Raine updated the task description for T411256: Draft hCaptcha SLOs, document SLIs.
Fri, Nov 28, 12:44 PM · serviceops
Raine created T411256: Draft hCaptcha SLOs, document SLIs.
Fri, Nov 28, 12:37 PM · serviceops
Raine created T411255: Monitor hCaptcha (upstream) status.
Fri, Nov 28, 12:29 PM · serviceops
Raine updated the task description for T411251: Improve hcaptcha-proxy alerting.
Fri, Nov 28, 12:27 PM · serviceops
Raine created T411254: Improve hcaptcha-proxy Grafana dashboard.
Fri, Nov 28, 12:26 PM · serviceops
Raine created T411251: Improve hcaptcha-proxy alerting.
Fri, Nov 28, 11:57 AM · serviceops
Raine triaged T411191: hcaptcha-proxy health checks should also depool sites if their upstream is unreachable as Low priority.

But there are more considerations than this. This setup is not like a typical LB (or Liberica), so we have no concept of depool thresholds and must be careful not to depool everything: what if the upstream service is temporarily down and we end up depooling both hosts in a site? There is a fallback mechanism to the old system, but this is still something to keep in mind.

Fri, Nov 28, 11:53 AM · WE4.2 Bot detection, Traffic, serviceops
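
The concern above is that a health check tied to upstream reachability could depool every host at once. One way to picture the missing "depool threshold" is a floor on how many hosts must stay pooled. This is only a sketch under assumed semantics: `hosts_to_depool`, the `min_pooled_fraction` parameter, and the host names are hypothetical and do not describe the current hcaptcha-proxy setup.

```python
import math


def hosts_to_depool(health: dict[str, bool], min_pooled_fraction: float = 0.5) -> list[str]:
    """Return the unhealthy hosts that may be depooled without breaching the floor.

    `health` maps host name to the latest health-check result. At least
    ceil(total * min_pooled_fraction) hosts always remain pooled, even if
    every host is failing its check.
    """
    total = len(health)
    if total == 0:
        return []
    min_pooled = math.ceil(total * min_pooled_fraction)
    budget = max(total - min_pooled, 0)  # how many hosts may be depooled at most
    unhealthy = sorted(host for host, ok in health.items() if not ok)
    return unhealthy[:budget]


if __name__ == "__main__":
    # Upstream down: both hosts in a site fail, but only one is depooled.
    print(hosts_to_depool({"host-a": False, "host-b": False}))
    # A single failing host is depooled as usual.
    print(hosts_to_depool({"host-a": True, "host-b": False}))
```

With a floor like this, an upstream outage that fails every check degrades to "some hosts stay pooled and keep failing" rather than an empty pool, leaving the fallback to the old system as the safety net.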
Raine removed a subtask for T400263: ☂️ [FY2025-26][Hypothesis] WE6.2.1 Production Readiness Checklist: T411141: hcaptcha proxy: bump connection limits + stress test.
Fri, Nov 28, 11:43 AM · serviceops
Raine added a subtask for T410626: WE6.2.6: ☂️ hcaptcha-proxy Production Readiness Review: T411141: hcaptcha proxy: bump connection limits + stress test.
Fri, Nov 28, 11:43 AM · serviceops
Raine edited parent tasks for T411141: hcaptcha proxy: bump connection limits + stress test, added: T410626: WE6.2.6: ☂️ hcaptcha-proxy Production Readiness Review; removed: T400263: ☂️ [FY2025-26][Hypothesis] WE6.2.1 Production Readiness Checklist.
Fri, Nov 28, 11:43 AM · serviceops

Thu, Nov 27

Raine created T411191: hcaptcha-proxy health checks should also depool sites if their upstream is unreachable.
Thu, Nov 27, 2:12 PM · WE4.2 Bot detection, Traffic, serviceops
Raine updated the task description for T411115: hcaptcha extension, proxy: Define the backoff and retry strategies.
Thu, Nov 27, 1:29 PM · WE4.2 Bot detection, serviceops

Wed, Nov 26

Raine created T411148: hcaptcha-proxy: update service catalog.
Wed, Nov 26, 10:18 PM · serviceops
Raine created T411141: hcaptcha proxy: bump connection limits + stress test.
Wed, Nov 26, 9:34 PM · serviceops
Raine claimed T411131: hcaptcha proxy: update wikitech page.
Wed, Nov 26, 9:11 PM · WE4.2 Bot detection, Documentation, serviceops
Raine created T411131: hcaptcha proxy: update wikitech page.
Wed, Nov 26, 8:35 PM · WE4.2 Bot detection, Documentation, serviceops
Raine updated the task description for T411097: Deprecate low-traffic proxoid service and O:hcaptcha_proxy for the older hcaptcha proxy setup.
Wed, Nov 26, 6:48 PM · SRE, Traffic
Raine updated the task description for T411097: Deprecate low-traffic proxoid service and O:hcaptcha_proxy for the older hcaptcha proxy setup.
Wed, Nov 26, 6:37 PM · SRE, Traffic
Raine created T411115: hcaptcha extension, proxy: Define the backoff and retry strategies.
Wed, Nov 26, 6:11 PM · WE4.2 Bot detection, serviceops
Raine moved T348985: Automate the "make sure email works" step of the DC switchover from 🌺🌸🧹Switchover to 🥋Good First Task on the serviceops board.
Wed, Nov 26, 5:22 PM · serviceops, Datacenter-Switchover

Tue, Nov 25

Raine added a project to T348985: Automate the "make sure email works" step of the DC switchover: good first task.

Thank you for tagging this task with good first task for Wikimedia newcomers!

Tue, Nov 25, 7:22 PM · serviceops, Datacenter-Switchover
Raine placed T348990: Simplify switchover of deployment server up for grabs.
Tue, Nov 25, 7:20 PM · serviceops, Datacenter-Switchover

Nov 13 2025

Raine added a comment to T408004: hw troubleshooting: host unresponsive for wikikube-worker2203.codfw.wmnet.

Thanks @Jhancock.wm , looks good!

Nov 13 2025, 4:58 PM · SRE, serviceops, ops-codfw, DC-Ops

Nov 11 2025

Raine closed T409845: Proxy-side latency metrics as Invalid.

Never mind, the mtail-provided metrics appear to be sufficient.

Nov 11 2025, 5:04 PM · WE4.2 Bot detection (WE4.2 hCaptcha account creation trial)
Raine added a comment to T409845: Proxy-side latency metrics.

Due to the sensitive nature of the proxy configuration, I am leaning towards the option that requires less configuration within nginx, which happens to be nginxlog-exporter. As the only thing it can see is the logs, it is easier to verify that no sensitive user information is being leaked anywhere.

Nov 11 2025, 4:28 PM · WE4.2 Bot detection (WE4.2 hCaptcha account creation trial)
Raine changed the status of T409845: Proxy-side latency metrics from Open to In Progress.
Nov 11 2025, 4:25 PM · WE4.2 Bot detection (WE4.2 hCaptcha account creation trial)
Raine moved T409845: Proxy-side latency metrics from Backlog to In progress on the WE4.2 Bot detection (WE4.2 hCaptcha account creation trial) board.
Nov 11 2025, 4:25 PM · WE4.2 Bot detection (WE4.2 hCaptcha account creation trial)
Raine created T409845: Proxy-side latency metrics.
Nov 11 2025, 4:02 PM · WE4.2 Bot detection (WE4.2 hCaptcha account creation trial)

Nov 10 2025

Raine added a comment to T407094: Requesting access to analytics-privatedata-users for SKaram-WMF.

It looks like the dot < . > at the end of the public key is missing in the patch. The dot is actually part of the key.

There is a space before the dot in the key as pasted in the task description. The key encoding does not allow spaces, so if the dot is part of the key, then the key as pasted cannot be correct. Can you please paste the correct key?

Nov 10 2025, 12:38 PM · SRE, SRE-Access-Requests
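
The point about spaces can be checked mechanically. The sketch below assumes the usual one-line public key layout of `<type> <base64> [comment]` and verifies that the second field is strict base64 whose first embedded field repeats the key type. `check_pubkey_line` is a hypothetical helper, not the script used for access requests.

```python
import base64
import binascii
import struct
from typing import Optional


def check_pubkey_line(line: str) -> Optional[str]:
    """Return None if the line looks well-formed, else a short problem description."""
    parts = line.strip().split(None, 2)
    if len(parts) < 2:
        return "expected '<type> <base64> [comment]'"
    key_type, blob = parts[0], parts[1]
    try:
        # validate=True rejects any character outside the base64 alphabet.
        raw = base64.b64decode(blob, validate=True)
    except binascii.Error:
        return "key data is not valid base64"
    if len(raw) < 4:
        return "key data is too short"
    (length,) = struct.unpack(">I", raw[:4])
    if raw[4:4 + length] != key_type.encode("utf-8"):
        return "embedded key type does not match the declared type"
    return None


if __name__ == "__main__":
    # Dummy ed25519-shaped blob with all-zero key material (not a real key).
    raw = struct.pack(">I", 11) + b"ssh-ed25519" + struct.pack(">I", 32) + bytes(32)
    blob = base64.b64encode(raw).decode("ascii")
    print(check_pubkey_line(f"ssh-ed25519 {blob} user@example"))
    print(check_pubkey_line(f"ssh-ed25519 {blob[:10]} {blob[10:]} user@example"))
```

A space inside the key data splits it into two fields, so the truncated first part fails to decode and everything after the space is treated as the comment.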

Nov 4 2025

Raine changed the status of T388969: MW deployments shouldn't need a hard-coded kubernetesVersion, a subtask of T388390: Ensure the correct helm version is used for each cluster, from Open to In Progress.
Nov 4 2025, 9:38 PM · Patch-For-Review, Data-Platform-SRE, Kubernetes, Prod-Kubernetes, serviceops
Raine renamed T388969: MW deployments shouldn't need a hard-coded kubernetesVersion from MW deployments shouldn't need a hard-coded kubernetesVersion to MW deployments shouldn't need a hard-coded kubernetesVersion.
Nov 4 2025, 9:37 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Nov 3 2025

Raine added a comment to T408004: hw troubleshooting: host unresponsive for wikikube-worker2203.codfw.wmnet.

@Jhancock.wm great, thanks for the update!

Nov 3 2025, 1:40 PM · SRE, serviceops, ops-codfw, DC-Ops

Oct 28 2025

Raine closed T407955: Requesting access to ops-limited for dpogorzelski as Resolved.

Done, ping me in case of trouble :-)

Oct 28 2025, 2:59 PM · SRE, SRE-Access-Requests
Raine updated the task description for T407955: Requesting access to ops-limited for dpogorzelski.
Oct 28 2025, 2:29 PM · SRE, SRE-Access-Requests

Oct 27 2025

Raine closed T407228: Requesting access to "analytics-admins" and "deployment" groups for a-pizzata as Resolved.

Done, let me know if something isn't working :-)

Oct 27 2025, 12:42 PM · SRE, SRE-Access-Requests

Oct 24 2025

Raine moved T407228: Requesting access to "analytics-admins" and "deployment" groups for a-pizzata from Manager/NDA Approval/Confirmation to Patch in Review on the SRE-Access-Requests board.
Oct 24 2025, 4:50 PM · SRE, SRE-Access-Requests
Raine moved T408164: Requesting access to Superset, Turnilo, Spark, Presto, Hive, Hadoop, Jupyter for Jmoore111 from Awaiting User Input to Ready To Go on the SRE-Access-Requests board.
Oct 24 2025, 4:50 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), SRE, SRE-Access-Requests
Raine updated the task description for T407228: Requesting access to "analytics-admins" and "deployment" groups for a-pizzata.
Oct 24 2025, 12:55 PM · SRE, SRE-Access-Requests
Raine added a comment to T408164: Requesting access to Superset, Turnilo, Spark, Presto, Hive, Hadoop, Jupyter for Jmoore111.

@JMoore-WMF has supplied me with an SSH key via our authenticated WMF Slack org, but said this:

I just realized that the public key i included in the ticket was the wrong one- accidentally included the one for Wikimedia cloud SSH access
will update the ticket accordingly

I had previously run the cross-validation against this key as shown in T408164#11305495 so I am confused as to why this didn't get picked up.
It could be one of the following:

  1. The cross-validate script didn't work correctly.
  2. The key isn't actually used in WMCS, although perhaps it is/was intended to be used there.
  3. I executed the check incorrectly.
Oct 24 2025, 12:46 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), SRE, SRE-Access-Requests
Raine closed T407605: Requesting access to analytics-privatedata-users for vicaplet-wmde as Resolved.

Done, @Virginie.caplet let me know in case something doesn't work :-)

Oct 24 2025, 12:41 PM · SRE-Access-Requests, SRE
Raine reassigned T407228: Requesting access to "analytics-admins" and "deployment" groups for a-pizzata from Ahoelzl to BTullis.

@BTullis can you please approve the analytics-admins access? Thanks!

Oct 24 2025, 12:21 PM · SRE, SRE-Access-Requests
Raine updated the task description for T407228: Requesting access to "analytics-admins" and "deployment" groups for a-pizzata.
Oct 24 2025, 12:21 PM · SRE, SRE-Access-Requests
Raine moved T408164: Requesting access to Superset, Turnilo, Spark, Presto, Hive, Hadoop, Jupyter for Jmoore111 from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
Oct 24 2025, 12:16 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), SRE, SRE-Access-Requests

Oct 23 2025

Raine claimed T408008: Requesting access to phabricator-admin for urbanecm.

Yes, I approve this.

Oct 23 2025, 7:13 PM · SRE, SRE-Access-Requests
Raine added a comment to T408004: hw troubleshooting: host unresponsive for wikikube-worker2203.codfw.wmnet.

Thanks @Jhancock.wm, appreciated! No worries, this is really not urgent.

Oct 23 2025, 6:40 PM · SRE, serviceops, ops-codfw, DC-Ops
Raine moved T408008: Requesting access to phabricator-admin for urbanecm from Manager/NDA Approval/Confirmation to Patch in Review on the SRE-Access-Requests board.
Oct 23 2025, 5:09 PM · SRE, SRE-Access-Requests
Raine updated the task description for T408008: Requesting access to phabricator-admin for urbanecm.
Oct 23 2025, 5:03 PM · SRE, SRE-Access-Requests
Raine moved T408008: Requesting access to phabricator-admin for urbanecm from Untriaged to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.
Oct 23 2025, 2:51 PM · SRE, SRE-Access-Requests
Raine assigned T408008: Requesting access to phabricator-admin for urbanecm to DMburugu.

@DMburugu can you please approve this? Thanks!

Oct 23 2025, 2:48 PM · SRE, SRE-Access-Requests
Raine updated the task description for T408008: Requesting access to phabricator-admin for urbanecm.
Oct 23 2025, 2:48 PM · SRE, SRE-Access-Requests
Raine moved T406590: Requesting access to 'restricted' for neslihanturan from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
Oct 23 2025, 2:41 PM · SRE-Access-Requests, SRE
Raine moved T407094: Requesting access to analytics-privatedata-users for SKaram-WMF from Ready To Go to Patch in Review on the SRE-Access-Requests board.
Oct 23 2025, 1:07 PM · SRE, SRE-Access-Requests
Raine moved T407094: Requesting access to analytics-privatedata-users for SKaram-WMF from Awaiting User Input to Ready To Go on the SRE-Access-Requests board.
Oct 23 2025, 12:59 PM · SRE, SRE-Access-Requests
Raine updated the task description for T407094: Requesting access to analytics-privatedata-users for SKaram-WMF.
Oct 23 2025, 12:58 PM · SRE, SRE-Access-Requests
Raine added a comment to T407094: Requesting access to analytics-privatedata-users for SKaram-WMF.

@SKaram-WMF just to confirm, do you need SSH access or only dashboards and such? Thanks!

Oct 23 2025, 12:57 PM · SRE, SRE-Access-Requests
Raine added a comment to T407094: Requesting access to analytics-privatedata-users for SKaram-WMF.

@SKaram-WMF just to confirm, do you need SSH access or only dashboards and such? Thanks!

Oct 23 2025, 12:55 PM · SRE, SRE-Access-Requests
Raine moved T407094: Requesting access to analytics-privatedata-users for SKaram-WMF from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
Oct 23 2025, 12:53 PM · SRE, SRE-Access-Requests
Raine moved T407605: Requesting access to analytics-privatedata-users for vicaplet-wmde from Untriaged to Patch in Review on the SRE-Access-Requests board.
Oct 23 2025, 12:53 PM · SRE-Access-Requests, SRE
Raine updated the task description for T407605: Requesting access to analytics-privatedata-users for vicaplet-wmde.
Oct 23 2025, 12:33 PM · SRE-Access-Requests, SRE
Raine added a comment to T407605: Requesting access to analytics-privatedata-users for vicaplet-wmde.

Noting that for analytics-privatedata-users, "Explicit approval is not required for WMF or WMDE Staff." (T381824, T370424).

Oct 23 2025, 12:33 PM · SRE-Access-Requests, SRE
Raine added a comment to T407605: Requesting access to analytics-privatedata-users for vicaplet-wmde.

@Virginie.caplet I assume your developer account username is vicaplet-wmde, correcting it.

Oct 23 2025, 12:22 PM · SRE-Access-Requests, SRE
Raine renamed T407605: Requesting access to analytics-privatedata-users for vicaplet-wmde from Requesting access to analytics-privatedata-users for vicaplet to Requesting access to analytics-privatedata-users for vicaplet-wmde.
Oct 23 2025, 12:22 PM · SRE-Access-Requests, SRE
Raine closed T407917: replace ssh keys with yubikey-backed key for Daniel Z as Resolved.
Oct 23 2025, 11:03 AM · SRE, SRE-Access-Requests

Oct 22 2025

Raine updated the task description for T407955: Requesting access to ops-limited for dpogorzelski.
Oct 22 2025, 5:47 PM · SRE, SRE-Access-Requests
Raine added a comment to T407955: Requesting access to ops-limited for dpogorzelski.

Confirmed the key out-of-band (OOB).

Oct 22 2025, 5:42 PM · SRE, SRE-Access-Requests
Raine moved T407228: Requesting access to "analytics-admins" and "deployment" groups for a-pizzata from Untriaged to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.
Oct 22 2025, 5:37 PM · SRE, SRE-Access-Requests
Raine moved T407917: replace ssh keys with yubikey-backed key for Daniel Z from Untriaged to Ready To Go on the SRE-Access-Requests board.
Oct 22 2025, 5:28 PM · SRE, SRE-Access-Requests
Raine moved T406243: Requesting access to deployment for VolkerE from Untriaged to Awaiting User Input on the SRE-Access-Requests board.
Oct 22 2025, 5:27 PM · SRE, SRE-Access-Requests
Raine moved T406592: Requesting access to 'deployment' for seanleong-wmde from Untriaged to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.
Oct 22 2025, 5:27 PM · SRE, SRE-Access-Requests
Raine moved T407955: Requesting access to ops-limited for dpogorzelski from Untriaged to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.
Oct 22 2025, 5:27 PM · SRE, SRE-Access-Requests
Raine assigned T406592: Requesting access to 'deployment' for seanleong-wmde to KFrancis.
Oct 22 2025, 5:26 PM · SRE, SRE-Access-Requests
Raine reassigned T407955: Requesting access to ops-limited for dpogorzelski from Raine to mark.
Oct 22 2025, 5:25 PM · SRE, SRE-Access-Requests
Raine updated subscribers of T407955: Requesting access to ops-limited for dpogorzelski.

@mark can you please approve this from the SRE side? Thanks!

Oct 22 2025, 5:16 PM · SRE, SRE-Access-Requests
Raine updated the task description for T407955: Requesting access to ops-limited for dpogorzelski.
Oct 22 2025, 5:15 PM · SRE, SRE-Access-Requests
Raine moved T408004: hw troubleshooting: host unresponsive for wikikube-worker2203.codfw.wmnet from Incoming 🐫 to 🛠 Upgrades and Hardware on the serviceops board.
Oct 22 2025, 5:01 PM · SRE, serviceops, ops-codfw, DC-Ops
Raine created T408004: hw troubleshooting: host unresponsive for wikikube-worker2203.codfw.wmnet.
Oct 22 2025, 5:01 PM · SRE, serviceops, ops-codfw, DC-Ops