LSobanski (Lukasz Sobanski)
Woo$

User Details

User Since
Aug 31 2020, 5:40 PM (241 w, 4 d)
Availability
Available
LDAP User
LSobanski
MediaWiki User
LSobanski (WMF) [ Global Accounts ]

Recent Activity

Fri, Apr 18

LSobanski moved T392128: upgrade aphlict hosts to bookworm from Incoming to Work in Progress on the collaboration-services board.
Fri, Apr 18, 1:09 PM · collaboration-services
LSobanski assigned T392128: upgrade aphlict hosts to bookworm to Arnoldokoth.
Fri, Apr 18, 1:09 PM · collaboration-services
LSobanski triaged T391578: Releases failover process as High priority.
Fri, Apr 18, 11:37 AM · collaboration-services
LSobanski triaged T392212: gerrit: motd display of the instance's status as Low priority.
Fri, Apr 18, 11:37 AM · collaboration-services
LSobanski triaged T392128: upgrade aphlict hosts to bookworm as Medium priority.
Fri, Apr 18, 11:37 AM · collaboration-services
LSobanski triaged T392130: upgrade doc hosts to bookworm as Medium priority.
Fri, Apr 18, 11:37 AM · collaboration-services
LSobanski moved T392127: upgrade releases hosts to bookworm from Incoming to Work in Progress on the collaboration-services board.
Fri, Apr 18, 11:27 AM · Patch-For-Review, collaboration-services
LSobanski triaged T392127: upgrade releases hosts to bookworm as Medium priority.
Fri, Apr 18, 11:27 AM · Patch-For-Review, collaboration-services
LSobanski assigned T392127: upgrade releases hosts to bookworm to Dzahn.
Fri, Apr 18, 11:26 AM · Patch-For-Review, collaboration-services

Thu, Apr 17

LSobanski added a comment to T392186: Grant Gerrit admin to arnaudb.

Approved.

Thu, Apr 17, 12:02 PM · SRE, collaboration-services, Gerrit, LDAP-Access-Requests
LSobanski raised the priority of T384595: Upgrade Collab hosts to Bookworm from Low to Medium.
Thu, Apr 17, 10:47 AM · Patch-For-Review, collaboration-services

Wed, Apr 16

LSobanski created T392091: Alert in need of triage: AlertLintProblem (instance localhost:9123).
Wed, Apr 16, 1:50 PM · Traffic, SRE Observability, sre-alert-triage

Tue, Apr 15

LSobanski renamed T391942: SystemdUnitFailed - lists1004 - wmf_auto_restart_exim4 from SystemdUnitFailed to SystemdUnitFailed - lists1004.
Tue, Apr 15, 11:13 AM · collaboration-services
LSobanski added a comment to T391904: archiva1002 - disk 98% full.

Archiva is on a path to deprecation so this is likely an ask to disable the alerting altogether.

Tue, Apr 15, 7:31 AM · Data-Platform-SRE (2025-04-12 - 2025-05-02), SRE
LSobanski added a project to T391904: archiva1002 - disk 98% full: Data-Platform-SRE.
Tue, Apr 15, 7:30 AM · Data-Platform-SRE (2025-04-12 - 2025-05-02), SRE
LSobanski renamed T391923: SystemdUnitFailed - lists1004 from SystemdUnitFailed to SystemdUnitFailed - lists1004.
Tue, Apr 15, 7:28 AM · collaboration-services

Mon, Apr 14

LSobanski moved T391590: PuppetFailure - releases2003 from Incoming to Work in Progress on the collaboration-services board.
Mon, Apr 14, 3:38 PM · collaboration-services
LSobanski assigned T391590: PuppetFailure - releases2003 to Dzahn.
Mon, Apr 14, 3:38 PM · collaboration-services
LSobanski updated subscribers of T391590: PuppetFailure - releases2003.

@Arnoldokoth - related to the upgrade?

Mon, Apr 14, 7:33 AM · collaboration-services
LSobanski renamed T391590: PuppetFailure - releases2003 from PuppetFailure to PuppetFailure - releases2003.
Mon, Apr 14, 7:32 AM · collaboration-services

Thu, Apr 10

LSobanski created T391578: Releases failover process.
Thu, Apr 10, 1:07 PM · collaboration-services
LSobanski added a comment to T391330: Backlog in mailing lists is increasing.

@Quiddity have you seen any improvement since the last time we checked?

Thu, Apr 10, 12:11 PM · Wikimedia-Incident, collaboration-services, SRE, Wikimedia-Mailing-lists

Wed, Apr 9

LSobanski moved T391330: Backlog in mailing lists is increasing from Incoming to Work in Progress on the collaboration-services board.
Wed, Apr 9, 12:10 PM · Wikimedia-Incident, collaboration-services, SRE, Wikimedia-Mailing-lists
LSobanski created T391465: Alert in need of triage: DiskSpace (instance ml-lab1001:9100).
Wed, Apr 9, 12:09 PM · Machine-Learning-Team, sre-alert-triage

Tue, Apr 8

LSobanski added a comment to T391330: Backlog in mailing lists is increasing.

Fixed time dashboard for reference: https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3?orgId=1&viewPanel=2&from=1744089600000&to=1744107600000

Tue, Apr 8, 10:21 AM · Wikimedia-Incident, collaboration-services, SRE, Wikimedia-Mailing-lists

Mon, Apr 7

LSobanski moved T388022: Phabricator test project requires email verification but can't send email from Incoming to Work in Progress on the collaboration-services board.
Mon, Apr 7, 3:32 PM · collaboration-services, VPS-project-Phabricator
LSobanski triaged T388022: Phabricator test project requires email verification but can't send email as Low priority.
Mon, Apr 7, 3:32 PM · collaboration-services, VPS-project-Phabricator
LSobanski moved T390034: Prepare a database test for m3 from Incoming to Consultation on the collaboration-services board.
Mon, Apr 7, 3:25 PM · collaboration-services, Release-Engineering-Team, User-brennen, DBA, Phabricator
LSobanski moved T390838: VRTS Circuit breaker from Incoming to Backlog on the collaboration-services board.
Mon, Apr 7, 3:24 PM · collaboration-services, vrts
LSobanski triaged T390838: VRTS Circuit breaker as Medium priority.
Mon, Apr 7, 3:24 PM · collaboration-services, vrts
LSobanski moved T390948: Cleanup collaboration-services WMCS hiera config from Incoming to Work in Progress on the collaboration-services board.
Mon, Apr 7, 3:24 PM · collaboration-services
LSobanski created T391241: Alert in need of triage: HelmfileAdminNGPendingChanges (instance deploy1003:9100).
Mon, Apr 7, 9:06 AM · Patch-For-Review, Infrastructure-Foundations, sre-alert-triage

Tue, Apr 1

LSobanski reopened T260666: Create a cookbook to automate gerrit's switchover as "Open".

This is separate from the activity in T387833: Gerrit failover process so let's keep it open.

Tue, Apr 1, 10:11 AM · Patch-For-Review, collaboration-services, Infrastructure-Foundations, SRE-tools, serviceops, SRE
LSobanski created T390676: Alert in need of triage: ProbeDown (instance ripe-atlas-codfw:0).
Tue, Apr 1, 10:07 AM · Infrastructure-Foundations, sre-alert-triage

Mon, Mar 31

LSobanski added a comment to T389079: VRT Logons are delayed.

Junk is now down to 18k tickets, which is more or less the usual amount.

Mon, Mar 31, 11:36 AM · collaboration-services, Wikimedia-production-error, vrts
LSobanski moved T389004: VRTS bounces filled mail queues, resulting in a weekend page from Incoming to Backlog on the collaboration-services board.
Mon, Mar 31, 11:35 AM · collaboration-services, vrts

Thu, Mar 27

LSobanski closed Restricted Task, a subtask of T389004: VRTS bounces filled mail queues, resulting in a weekend page, as Resolved.
Thu, Mar 27, 7:35 PM · collaboration-services, vrts

Fri, Mar 21

LSobanski added a comment to T389079: VRT Logons are delayed.

On second thought, and considering it's Friday evening in EMEA, maybe let's not take the risk.

Fri, Mar 21, 5:11 PM · collaboration-services, Wikimedia-production-error, vrts
LSobanski added a comment to T385930: Consider disabling personal access token forced expiration.

Same thing for gitlab-exporter, we got a 60 day expiry notification yesterday.

Fri, Mar 21, 12:53 PM · collaboration-services, User-brennen, Release-Engineering-Team (Priority Backlog 📥), GitLab (Administration, Settings & Policy)
LSobanski added a comment to T388782: Alert in need of triage: Postgres Replication Lag (instance maps-test2002).

There are three overdue alerts for maps-test, two of which are critical. Can these be disabled or downgraded?

Fri, Mar 21, 10:56 AM · serviceops, sre-alert-triage
LSobanski added a comment to T389079: VRT Logons are delayed.

@Krd Znuny suggested temporarily raising the "GenericAgentRunLimit" from 4000 to 40000 and letting it run every 10 minutes to speed up the cleanup.

Fri, Mar 21, 10:25 AM · collaboration-services, Wikimedia-production-error, vrts

Mar 20 2025

LSobanski lowered the priority of T349333: VRTS issue with incoming forwards from a Chapter address from High to Medium.
Mar 20 2025, 8:11 AM · collaboration-services, Znuny
LSobanski triaged T387833: Gerrit failover process as High priority.
Mar 20 2025, 8:10 AM · Patch-For-Review, collaboration-services
LSobanski triaged T389004: VRTS bounces filled mail queues, resulting in a weekend page as Medium priority.
Mar 20 2025, 8:10 AM · collaboration-services, vrts
LSobanski added a comment to T389004: VRTS bounces filled mail queues, resulting in a weekend page.

The root cause was addressed but it's still worth a conversation about whether we can introduce a circuit-breaker for the next time this happens.

Mar 20 2025, 8:10 AM · collaboration-services, vrts

Mar 19 2025

LSobanski added a comment to T389077: Znuny now uses 17 digit ticket numbers?.

To be more specific, we use the Ticket::Number::DateChecksum setting, which the documentation describes as:

Mar 19 2025, 8:32 AM · collaboration-services, Znuny

Mar 18 2025

LSobanski added a comment to T389079: VRT Logons are delayed.

The Junk queue is now shrinking instead of growing but is still very large (over 1.4 million tickets). Unless there is a way to purge the queue, let's wait until it is emptied by the standard process. As a side note, we may want to consider monitoring the Junk queue size going forward.

Mar 18 2025, 5:12 PM · collaboration-services, Wikimedia-production-error, vrts
LSobanski moved T389079: VRT Logons are delayed from Incoming to Work in Progress on the collaboration-services board.
Mar 18 2025, 5:01 PM · collaboration-services, Wikimedia-production-error, vrts
LSobanski moved T389080: Fix dependencies between admin_ng deployments from Incoming to K8s on the collaboration-services board.
Mar 18 2025, 10:44 AM · collaboration-services, Kubernetes, Prod-Kubernetes, serviceops
LSobanski moved T389084: Check/update grafana dashboards for k8s 1.31 from Incoming to K8s on the collaboration-services board.
Mar 18 2025, 10:43 AM · collaboration-services, Data-Platform-SRE, Kubernetes, Prod-Kubernetes, serviceops
LSobanski moved T389086: wipe-cluster cookbook should check if systemd services have started properly from Incoming to K8s on the collaboration-services board.
Mar 18 2025, 10:43 AM · collaboration-services, Kubernetes, Prod-Kubernetes, serviceops

Mar 17 2025

LSobanski added a project to T389004: VRTS bounces filled mail queues, resulting in a weekend page: collaboration-services.
Mar 17 2025, 8:19 AM · collaboration-services, vrts
LSobanski created T389038: Alert in need of triage: SystemdUnitFailed (instance cumin1002:9100).
Mar 17 2025, 8:18 AM · serviceops, sre-alert-triage
LSobanski created T389037: Alert in need of triage: WidespreadPuppetFailure.
Mar 17 2025, 8:17 AM · serviceops, sre-alert-triage

Mar 13 2025

LSobanski created T388782: Alert in need of triage: Postgres Replication Lag (instance maps-test2002).
Mar 13 2025, 12:33 PM · serviceops, sre-alert-triage

Mar 10 2025

LSobanski moved T387830: Contint failover process from Incoming to Backlog on the collaboration-services board.
Mar 10 2025, 5:29 PM · collaboration-services
LSobanski triaged T387830: Contint failover process as Medium priority.
Mar 10 2025, 5:28 PM · collaboration-services
LSobanski moved T387548: Fix alternatives entries in helm and kubernetes-client packages from Incoming to K8s on the collaboration-services board.
Mar 10 2025, 5:28 PM · Patch-For-Review, collaboration-services, Data-Platform-SRE, Kubernetes, Prod-Kubernetes, serviceops
LSobanski moved T387886: Jobs on Digital Ocean Cloud Runners are being OOM killed from Incoming to Consultation on the collaboration-services board.
Mar 10 2025, 5:27 PM · Release-Engineering-Team (Priority Backlog 📥), User-brennen, GitLab (CI & Job Runners), collaboration-services
LSobanski moved T388235: ProbeDown (gerrit1003) from Incoming to Work in Progress on the collaboration-services board.
Mar 10 2025, 5:27 PM · Release-Engineering-Team, collaboration-services
LSobanski triaged T388235: ProbeDown (gerrit1003) as High priority.
Mar 10 2025, 5:27 PM · Release-Engineering-Team, collaboration-services
LSobanski assigned T388235: ProbeDown (gerrit1003) to Jelto.
Mar 10 2025, 5:27 PM · Release-Engineering-Team, collaboration-services
LSobanski triaged T388354: Move stewards-l synchronization to the Onboarding System as Medium priority.
Mar 10 2025, 5:26 PM · collaboration-services, Stewards-Onboarding-Tool
LSobanski moved T388212: Update profile::stewards::gitlab_api_token to match new token from Incoming to Work in Progress on the collaboration-services board.
Mar 10 2025, 5:26 PM · collaboration-services, GitLab, Stewards-Onboarding-Tool
LSobanski triaged T388212: Update profile::stewards::gitlab_api_token to match new token as Medium priority.
Mar 10 2025, 5:26 PM · collaboration-services, GitLab, Stewards-Onboarding-Tool
LSobanski moved T388388: Ensure all required kubectl versions are installed on deploy hosts from Incoming to Work in Progress on the collaboration-services board.
Mar 10 2025, 5:25 PM · collaboration-services, Data-Platform-SRE, Kubernetes, Prod-Kubernetes, serviceops
LSobanski updated subscribers of T385067: Set up dual-stack ECDSA/RSA certificate support for Exim.

@jhathaway any thoughts on this?

Mar 10 2025, 5:24 PM · collaboration-services, SRE, Wikimedia-Mailing-lists
LSobanski added a project to T385067: Set up dual-stack ECDSA/RSA certificate support for Exim: collaboration-services.
Mar 10 2025, 3:33 PM · collaboration-services, SRE, Wikimedia-Mailing-lists
LSobanski added a comment to T385067: Set up dual-stack ECDSA/RSA certificate support for Exim.

What's the timeline for dropping RSA certs? Just so we know how urgent this is.

Mar 10 2025, 3:33 PM · collaboration-services, SRE, Wikimedia-Mailing-lists
LSobanski moved T388387: Update kube-state-metrics for k8s 1.31 from Incoming to K8s on the collaboration-services board.
Mar 10 2025, 12:16 PM · collaboration-services, Kubernetes, Prod-Kubernetes, serviceops
LSobanski moved T388390: Ensure the correct helm version is used for each cluster from Incoming to K8s on the collaboration-services board.
Mar 10 2025, 12:16 PM · Patch-For-Review, collaboration-services, Data-Platform-SRE, Kubernetes, Prod-Kubernetes, serviceops
LSobanski created T388398: Alert in need of triage: HelmfileAdminNGPendingChanges (instance deploy1003:9100).
Mar 10 2025, 12:15 PM · serviceops, sre-alert-triage

Mar 4 2025

LSobanski added a project to T387754: Ops-monitoring-bot creating duplicate tasks for the same RAID failure: SRE Observability.
Mar 4 2025, 9:43 AM · Observability-Alerting, SRE Observability (FY2024/2025-Q3), SRE
LSobanski moved T387833: Gerrit failover process from Incoming to Work in Progress on the collaboration-services board.
Mar 4 2025, 9:37 AM · Patch-For-Review, collaboration-services
LSobanski moved T387619: phabricator.wmcloud.org says "Access denied for user 'app_user'@'localhost'" from Incoming to Work in Progress on the collaboration-services board.
Mar 4 2025, 9:36 AM · collaboration-services, VPS-project-Phabricator
LSobanski moved T384450: Update wikikube-staging-codfw to kubernetes 1.31 from Incoming to K8s on the collaboration-services board.
Mar 4 2025, 9:36 AM · Patch-For-Review, collaboration-services, Kubernetes, Prod-Kubernetes, serviceops
LSobanski moved T383553: Set cert-manager leader election namespace to cert-manager from Incoming to K8s on the collaboration-services board.
Mar 4 2025, 9:36 AM · collaboration-services, Data-Platform-SRE, Kubernetes, Prod-Kubernetes, serviceops
LSobanski moved T386694: Replace k8s-controller-sidecars with built in Sidecar containers on k8s 1.31 from Incoming to K8s on the collaboration-services board.
Mar 4 2025, 9:36 AM · collaboration-services, Kubernetes, Prod-Kubernetes, serviceops
LSobanski moved T387760: Migrate release template inheritance in helmfiles from YAML anchors to the inherit field from Incoming to K8s on the collaboration-services board.
Mar 4 2025, 9:36 AM · collaboration-services, Data-Platform-SRE, Kubernetes, Prod-Kubernetes, serviceops
LSobanski moved T387836: top-level config key environments must be defined before releases in helmfile.yaml from Incoming to K8s on the collaboration-services board.
Mar 4 2025, 9:36 AM · Data-Platform-SRE (2025.03.22 - 2025.04.11), collaboration-services, Kubernetes, Prod-Kubernetes, serviceops
LSobanski moved T387837: Fix installed key in dependend helmfile releases from Incoming to K8s on the collaboration-services board.
Mar 4 2025, 9:36 AM · Data-Platform-SRE (2025.03.22 - 2025.04.11), Patch-For-Review, collaboration-services, Kubernetes, Prod-Kubernetes, serviceops
LSobanski assigned T387833: Gerrit failover process to ABran-WMF.
Mar 4 2025, 9:01 AM · Patch-For-Review, collaboration-services
LSobanski added a parent task for T387833: Gerrit failover process: T387831: Standardize failover procedures for Collab services.
Mar 4 2025, 9:00 AM · Patch-For-Review, collaboration-services
LSobanski added a subtask for T387831: Standardize failover procedures for Collab services: T387833: Gerrit failover process.
Mar 4 2025, 9:00 AM · collaboration-services
LSobanski updated the task description for T387831: Standardize failover procedures for Collab services.
Mar 4 2025, 9:00 AM · collaboration-services
LSobanski created T387833: Gerrit failover process.
Mar 4 2025, 8:59 AM · Patch-For-Review, collaboration-services
LSobanski renamed T387831: Standardize failover procedures for Collab services from Standardizing failover procedures for Collab services to Standardize failover procedures for Collab services.
Mar 4 2025, 8:56 AM · collaboration-services
LSobanski moved T387831: Standardize failover procedures for Collab services from Incoming to Work in Progress (Tracking tasks) on the collaboration-services board.
Mar 4 2025, 8:55 AM · collaboration-services
LSobanski added a subtask for T387831: Standardize failover procedures for Collab services: T387830: Contint failover process.
Mar 4 2025, 8:55 AM · collaboration-services
LSobanski added a parent task for T387830: Contint failover process: T387831: Standardize failover procedures for Collab services.
Mar 4 2025, 8:55 AM · collaboration-services
LSobanski created T387831: Standardize failover procedures for Collab services.
Mar 4 2025, 8:51 AM · collaboration-services
LSobanski created T387830: Contint failover process.
Mar 4 2025, 8:49 AM · collaboration-services

Mar 3 2025

LSobanski assigned T387619: phabricator.wmcloud.org says "Access denied for user 'app_user'@'localhost'" to Dzahn.
Mar 3 2025, 4:46 PM · collaboration-services, VPS-project-Phabricator
LSobanski moved T385901: mysqldump of RT database from Incoming to Work in Progress on the collaboration-services board.
Mar 3 2025, 4:46 PM · database-backups, Data-Persistence-Backup, collaboration-services, Data-Persistence
LSobanski reassigned T385901: mysqldump of RT database from jcrespo to Dzahn.
Mar 3 2025, 4:45 PM · database-backups, Data-Persistence-Backup, collaboration-services, Data-Persistence
LSobanski updated subscribers of T387619: phabricator.wmcloud.org says "Access denied for user 'app_user'@'localhost'".

@brennen for awareness.

Mar 3 2025, 4:41 PM · collaboration-services, VPS-project-Phabricator
LSobanski added a project to T387619: phabricator.wmcloud.org says "Access denied for user 'app_user'@'localhost'": collaboration-services.
Mar 3 2025, 7:36 AM · collaboration-services, VPS-project-Phabricator
LSobanski created T387699: phabricator.wmcloud.org (test instance) is broken.
Mar 3 2025, 7:28 AM · Release-Engineering-Team, collaboration-services

Feb 24 2025

LSobanski moved T274228: Phabricator should cache tasks for a few minutes for logged-out users from Incoming to Backlog on the collaboration-services board.
Feb 24 2025, 4:41 PM · Patch-For-Review, collaboration-services, SRE, Traffic, Phabricator
LSobanski moved T385042: Generic NAT Handling Solution with nftables from Incoming to Backlog on the collaboration-services board.
Feb 24 2025, 4:39 PM · User-aborrero, collaboration-services