
taavi (Taavi Väänänen)
SRE · Administrator

Projects (28)

User Details

User Since
Feb 24 2019, 3:58 PM (260 w, 6 d)
Roles
Administrator
Availability
Available
IRC Nick
taavi
LDAP User
Majavah
MediaWiki User
Taavi [ Global Accounts ]

Recent Activity

Today

taavi triaged T358421: db2118 crashed and rebooted due to HW as Unbreak Now! priority.
Sat, Feb 24, 10:18 AM · ops-eqiad, Wikimedia-Incident, DBA, SRE

Yesterday

taavi merged task T358393: User preferences no longer working: mw.user.options is not reflecting database on beta cluster into T358364: Edits not saved on beta cluster.
Fri, Feb 23, 9:51 PM · MediaWiki-Core-Preferences, Beta-Cluster-Infrastructure
taavi merged task T358390: [wikifunctions-beta] Any edit fails to be published into T358364: Edits not saved on beta cluster.
Fri, Feb 23, 9:51 PM · Abstract Wikipedia team, WikiLambda
taavi merged tasks T358390: [wikifunctions-beta] Any edit fails to be published , T358393: User preferences no longer working: mw.user.options is not reflecting database on beta cluster into T358364: Edits not saved on beta cluster.
Fri, Feb 23, 9:51 PM · Beta-Cluster-Infrastructure, Beta-Cluster-reproducible
taavi added a project to T358364: Edits not saved on beta cluster: Beta-Cluster-Infrastructure.
Fri, Feb 23, 6:23 PM · Beta-Cluster-Infrastructure, Beta-Cluster-reproducible
taavi committed rETKBe13ad3b7ce59: composer fix (authored by taavi).
composer fix
Fri, Feb 23, 6:13 PM
taavi renamed T358112 from "Special:Contributions for IP ranges fails with InvalidArgumentException , due to CentralAuth volkankaos" to "Special:Contributions for IP ranges fails with InvalidArgumentException , due to CentralAuth".
Fri, Feb 23, 5:39 PM · MW-1.42-notes (1.42.0-wmf.20; 2024-02-27), MediaWiki-Platform-Team, Wikimedia-production-error, MediaWiki-extensions-CentralAuth
taavi closed T358112: Special:Contributions for IP ranges fails with InvalidArgumentException , due to CentralAuth, a subtask of T354437: 1.42.0-wmf.19 deployment blockers, as Resolved.
Fri, Feb 23, 5:38 PM · Release-Engineering-Team (Now this 🫠), Release, Train Deployments
taavi moved T358340: disable-tool is stuck on tools-nfs-2 from In Review to Done on the Toolforge (Toolforge iteration 06) board.
Fri, Feb 23, 5:15 PM · Toolforge (Toolforge iteration 06)
taavi closed T358340: disable-tool is stuck on tools-nfs-2 as Resolved.
Fri, Feb 23, 5:15 PM · Toolforge (Toolforge iteration 06)
taavi created T358343: wmf_auto_restart_cron.service failing in Cloud VPS bookworm instances.
Fri, Feb 23, 2:33 PM · Puppet, Infrastructure-Foundations, Cloud-VPS, cloud-services-team
taavi removed a subtask for T194333: [Epic] Provide logging/metrics/monitoring SaaS for Cloud VPS tenants: T215155: Toolforge: systemd monitoring.
Fri, Feb 23, 2:29 PM · cloud-services-team, Epic, Cloud-VPS
taavi removed a parent task for T215155: Toolforge: systemd monitoring: T194333: [Epic] Provide logging/metrics/monitoring SaaS for Cloud VPS tenants.
Fri, Feb 23, 2:29 PM · cloud-services-team, Toolforge
taavi changed the status of T358340: disable-tool is stuck on tools-nfs-2 from Open to In Progress.
Fri, Feb 23, 2:14 PM · Toolforge (Toolforge iteration 06)
taavi claimed T358340: disable-tool is stuck on tools-nfs-2.
Fri, Feb 23, 1:53 PM · Toolforge (Toolforge iteration 06)
taavi created T358340: disable-tool is stuck on tools-nfs-2.
Fri, Feb 23, 1:52 PM · Toolforge (Toolforge iteration 06)
taavi updated the task description for T313030: [toolforge.infra] Replace Toolschecker alerts with Prometheus based ones.
Fri, Feb 23, 1:45 PM · cloud-services-team, Toolforge
taavi closed T358333: Remove toolschecker grid engine checks, a subtask of T313030: [toolforge.infra] Replace Toolschecker alerts with Prometheus based ones, as Resolved.
Fri, Feb 23, 1:42 PM · cloud-services-team, Toolforge
taavi closed T358333: Remove toolschecker grid engine checks, a subtask of T314664: Toolforge: Decommission the Grid Engine infrastructure, as Resolved.
Fri, Feb 23, 1:42 PM · cloud-services-team (FY2023/2024-Q3-Q4), Goal, Patch-For-Review, Toolforge
taavi closed T358333: Remove toolschecker grid engine checks as Resolved.
Fri, Feb 23, 1:42 PM · cloud-services-team, Toolforge
taavi assigned T307651: Upgrade Toolforge Kubernetes to version 1.24 to aborrero.
Fri, Feb 23, 12:20 PM · User-aborrero, cloud-services-team, Toolforge
taavi added a subtask for T314664: Toolforge: Decommission the Grid Engine infrastructure: T358333: Remove toolschecker grid engine checks.
Fri, Feb 23, 12:01 PM · cloud-services-team (FY2023/2024-Q3-Q4), Goal, Patch-For-Review, Toolforge
taavi added a parent task for T358333: Remove toolschecker grid engine checks: T314664: Toolforge: Decommission the Grid Engine infrastructure.
Fri, Feb 23, 12:01 PM · cloud-services-team, Toolforge
taavi claimed T358333: Remove toolschecker grid engine checks.
Fri, Feb 23, 12:01 PM · cloud-services-team, Toolforge
taavi created T358333: Remove toolschecker grid engine checks.
Fri, Feb 23, 11:57 AM · cloud-services-team, Toolforge
taavi committed rCCKB97306ac49c27: toolforge: k8s: Support containerd as container runtime (authored by taavi).
toolforge: k8s: Support containerd as container runtime
Fri, Feb 23, 10:55 AM
taavi added a parent task for T314664: Toolforge: Decommission the Grid Engine infrastructure: T358320: [toolforge-webservice] Remove old webservice-runner code.
Fri, Feb 23, 10:19 AM · cloud-services-team (FY2023/2024-Q3-Q4), Goal, Patch-For-Review, Toolforge
taavi added a subtask for T358320: [toolforge-webservice] Remove old webservice-runner code: T314664: Toolforge: Decommission the Grid Engine infrastructure.
Fri, Feb 23, 10:19 AM · Toolforge
taavi added a parent task for T293552: Remove Python/webservice-runner from toolforge web containers: T358320: [toolforge-webservice] Remove old webservice-runner code.
Fri, Feb 23, 10:19 AM · Patch-For-Review, cloud-services-team, Toolforge
taavi added a subtask for T358320: [toolforge-webservice] Remove old webservice-runner code: T293552: Remove Python/webservice-runner from toolforge web containers.
Fri, Feb 23, 10:19 AM · Toolforge
taavi created T358320: [toolforge-webservice] Remove old webservice-runner code.
Fri, Feb 23, 10:19 AM · Toolforge
taavi added a comment to T358175: dbreps job pending to start for 2d16h on Toolforge.

Not easily: the Pending status reported by kube-state-metrics also seems to include things like pods whose configured image does not exist, and other user errors.

Does this happen a lot? I would've thought webservice/toolforge jobs would prevent that from happening.

Fri, Feb 23, 10:12 AM · Toolforge (Toolforge iteration 06)
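The difficulty described in this comment is that a single Pending state covers both infrastructure failures (like the dbreps pod stuck on a broken node) and user errors (like a non-existent image). A minimal Python sketch of one way to separate them before alerting, assuming pods are available as plain dicts shaped like Kubernetes Pod API objects. The function name, the one-hour threshold, and the exact set of "user error" waiting reasons are assumptions for illustration, not what Toolforge actually alerts on.

```python
from datetime import datetime, timedelta, timezone

# Container waiting reasons treated as user errors (bad image reference) rather
# than infrastructure problems. Illustrative assumption, not an exhaustive list.
USER_ERROR_REASONS = {"ErrImagePull", "ImagePullBackOff", "InvalidImageName"}


def stuck_pending_pods(pods, now, threshold=timedelta(hours=1)):
    """Return names of pods Pending longer than `threshold`, skipping image errors.

    `pods` is a list of dicts loosely shaped like the Kubernetes Pod API:
    metadata.name, metadata.creationTimestamp (RFC 3339, "Z" suffix),
    status.phase and status.containerStatuses[].state.waiting.reason.
    Hypothetical helper for illustration only.
    """
    stuck = []
    for pod in pods:
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        # Collect the waiting reason of every container (None if not waiting).
        reasons = {
            cs.get("state", {}).get("waiting", {}).get("reason")
            for cs in status.get("containerStatuses", [])
        }
        if reasons & USER_ERROR_REASONS:
            continue  # most likely a user error, not worth alerting on
        created = datetime.fromisoformat(
            pod["metadata"]["creationTimestamp"].replace("Z", "+00:00")
        )
        if now - created > threshold:
            stuck.append(pod["metadata"]["name"])
    return stuck
```

Matching on container waiting reasons keeps the rule simple; a real alert would more likely be expressed as a Prometheus query over kube-state-metrics series, which this sketch does not attempt to reproduce.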

Thu, Feb 22

taavi added a comment to T358044: Migrate dev user accounts for bvibber.

Only members of https://gerrit.wikimedia.org/r/admin/groups/2021f25e7515187a81d51f8fe14dd6f25617cce0 can amend changes by someone else. I added you.

Thu, Feb 22, 7:53 PM · Patch-For-Review, Phabricator, SRE, SRE-Access-Requests, LDAP-Access-Requests
taavi committed R2155:92923aa40f6e: utils: Add explicit cache prefix (authored by taavi).
utils: Add explicit cache prefix
Thu, Feb 22, 5:14 PM
taavi committed R2155:2ef1f4af22bc: Fix LDAP uidNumber usage (authored by taavi).
Fix LDAP uidNumber usage
Thu, Feb 22, 5:14 PM
taavi committed R2155:4f343e404327: Batch LDAP queries (authored by taavi).
Batch LDAP queries
Thu, Feb 22, 5:14 PM
taavi committed R2155:127e2b5856aa: Fix name loading (authored by taavi).
Fix name loading
Thu, Feb 22, 5:14 PM
taavi committed R2155:89e23a95b72c: Remove legacy owner mapping (authored by taavi).
Remove legacy owner mapping
Thu, Feb 22, 5:14 PM
taavi committed R2155:fe30d55606e8: jobs: Use bookworm (authored by taavi).
jobs: Use bookworm
Thu, Feb 22, 5:14 PM
taavi committed R2155:80a8a0dd78e1: Migrate to build service (authored by taavi).
Migrate to build service
Thu, Feb 22, 5:14 PM
taavi committed R2155:2276519b5361: utils: Use envvars for database credentials (authored by taavi).
utils: Use envvars for database credentials
Thu, Feb 22, 5:14 PM
taavi committed R1944:cc5b01a0db43: Update copyright years (authored by taavi).
Update copyright years
Thu, Feb 22, 5:11 PM
taavi committed R1944:ffde13f662fd: Add buildservice config (authored by taavi).
Add buildservice config
Thu, Feb 22, 5:11 PM
taavi committed R1944:e7e3359997a1: Use envvars for database access (authored by taavi).
Use envvars for database access
Thu, Feb 22, 5:11 PM
taavi committed R1944:38cbd1ed3440: Call slices sections (authored by taavi).
Call slices sections
Thu, Feb 22, 5:11 PM
taavi committed R1944:a328f33fa8d3: Show max lag for section in wiki table (authored by taavi).
Show max lag for section in wiki table
Thu, Feb 22, 5:11 PM
taavi committed R1944:7a480d877c68: Update jQuery and tablesorter (authored by taavi).
Update jQuery and tablesorter
Thu, Feb 22, 5:11 PM
taavi committed R1944:72f6cfd03dbe: Update Git repository URL (authored by taavi).
Update Git repository URL
Thu, Feb 22, 5:11 PM
taavi committed R1944:e1057a69b3e1: Fix missing quote mark in HTML (authored by taavi).
Fix missing quote mark in HTML
Thu, Feb 22, 5:11 PM
taavi updated the task description for T306039: Decision request - Toolforge external infrastructure domain usage.
Thu, Feb 22, 5:02 PM · User-aborrero, Toolforge, Cloud Services Proposals
taavi closed T357963: SystemdUnitDown Unit wmf_auto_restart_virtlogd.service on node cloudvirt1032 has been down for long. as Resolved.
Thu, Feb 22, 4:14 PM · cloud-services-team
taavi closed T357886: PuppetZeroResources Zero Puppet resources on cloudvirt2004-dev:9100 as Resolved.
Thu, Feb 22, 4:14 PM · cloud-services-team
taavi closed T357887: PuppetZeroResources Zero Puppet resources on cloudnet2008-dev:9100 as Resolved.
Thu, Feb 22, 4:13 PM · cloud-services-team
taavi closed T192225: Add option to hide unwanted tool accounts from Striker UI as Declined.

Tool accounts can be deleted now.

Thu, Feb 22, 3:26 PM · Striker
taavi awarded T173748: Create a "recent changes" feed for Striker a Love token.
Thu, Feb 22, 3:24 PM · Striker
taavi edited P4372 new-es-password.sh.
Thu, Feb 22, 2:16 PM · Elasticsearch, Toolforge
taavi added a comment to T357227: Elasticsearch credential request for capacity-exchange.
tools.capacity-exchange@tools-sgebastion-11:~$ toolforge envvars show TOOL_ELASTICSEARCH_PASSWORD
name                         value
TOOL_ELASTICSEARCH_PASSWORD  $6$EhKG5NUX/[...]

That's a password hash, not a password...

Thu, Feb 22, 2:04 PM · cloud-services-team, Toolforge
taavi edited P4372 new-es-password.sh.
Thu, Feb 22, 1:59 PM · Elasticsearch, Toolforge
taavi added a comment to T355281: Set up some beta cluster wikis with different registrable domain.

@taavi - quick question - do you think we should keep the .beta part in the URL? E.g. test2.wikipedia.beta.wmcloud, or can we skip the beta part and do only test2.wikipedia.wmcloud.org?

Quick answer: yes, let's keep it.

Thu, Feb 22, 1:55 PM · MediaWiki-Platform-Team, Beta-Cluster-Infrastructure
taavi placed T358203: Add node anti-affinity topologySpreadConstraints to infrastructure components where relevant up for grabs.
Thu, Feb 22, 1:47 PM · Toolforge (Toolforge iteration 06)
taavi added a project to T358152: troubleshoot why initial pageloads of trace.wikimedia.org are so slow : Observability-Tracing.
Thu, Feb 22, 12:46 PM · Observability-Tracing
taavi added a project to T358111: oauth2-proxy config changes don't cause any change in the helm Deployment: Observability-Tracing.
Thu, Feb 22, 12:46 PM · Observability-Tracing, Patch-For-Review
taavi closed T355883: Create a pool of NFS-less Toolforge Kubernetes workers as Resolved.

So I added three non-NFS workers, tools-k8s-worker-102 to 104. So far they're being used by various infrastructure things, buildservice image-build pods, and a few tools with buildservice images. That's roughly what I'd expect, especially with only a few evictions from the NFS nodes this morning.

Thu, Feb 22, 11:36 AM · Patch-For-Review, Toolforge (Toolforge iteration 06)
taavi changed the status of T355883: Create a pool of NFS-less Toolforge Kubernetes workers from Open to In Progress.
Thu, Feb 22, 11:20 AM · Patch-For-Review, Toolforge (Toolforge iteration 06)
taavi triaged T358203: Add node anti-affinity topologySpreadConstraints to infrastructure components where relevant as Medium priority.
Thu, Feb 22, 11:19 AM · Toolforge (Toolforge iteration 06)
taavi closed T358194: [jobs-api] Getting errors when listing jobs as Resolved.

Adding the missing nodeSelector seems to have fixed it, so this was broken by T355883: Create a pool of NFS-less Toolforge Kubernetes workers; I thought I had already added it everywhere.

Thu, Feb 22, 11:03 AM · Toolforge (Toolforge iteration 06)
taavi merged T358198: Message-ID: <CACNAgmwRgLo4Uy0w9ZtAi07kB_rA9wxC5T3p+kr+ftWDUq7hWg@mail.gmail.com> delayed by 3 days into T358020: Not receiving posts or moderation messages.
Thu, Feb 22, 10:50 AM · Wikimedia-Incident, SRE, Wikimedia-Mailing-lists
taavi merged task T358198: Message-ID: <CACNAgmwRgLo4Uy0w9ZtAi07kB_rA9wxC5T3p+kr+ftWDUq7hWg@mail.gmail.com> delayed by 3 days into T358020: Not receiving posts or moderation messages.
Thu, Feb 22, 10:49 AM · SRE, Wikimedia-Mailing-lists
taavi claimed T355883: Create a pool of NFS-less Toolforge Kubernetes workers.
Thu, Feb 22, 10:48 AM · Patch-For-Review, Toolforge (Toolforge iteration 06)
taavi created P57691 (An Untitled Masterwork).
Thu, Feb 22, 10:40 AM
taavi closed T357889: PuppetZeroResources as Resolved.
Thu, Feb 22, 9:02 AM · cloud-services-team
taavi closed T358156: PuppetZeroResources Zero Puppet resources on cloudcephosd1008:9100 as Resolved.
Thu, Feb 22, 9:02 AM · cloud-services-team
taavi closed T358186: PuppetZeroResources Zero Puppet resources on cloudcephosd1021:9100 as Resolved.
Thu, Feb 22, 9:02 AM · cloud-services-team
taavi closed T358165: PuppetZeroResources Zero Puppet resources on cloudcephmon1003:9100 as Resolved.
Thu, Feb 22, 9:02 AM · cloud-services-team
taavi closed T358169: PuppetZeroResources Zero Puppet resources on cloudcephosd1034:9100 as Resolved.
Thu, Feb 22, 9:02 AM · cloud-services-team
taavi closed T358172: PuppetZeroResources Zero Puppet resources on cloudcephosd1030:9100 as Resolved.
Thu, Feb 22, 9:01 AM · cloud-services-team
taavi closed T358176: PuppetZeroResources Zero Puppet resources on cloudcephosd1022:9100 as Resolved.
Thu, Feb 22, 9:01 AM · cloud-services-team
taavi closed T358177: PuppetZeroResources Zero Puppet resources on cloudcephmon1002:9100 as Resolved.
Thu, Feb 22, 9:01 AM · cloud-services-team
taavi closed T358174: PuppetZeroResources Zero Puppet resources on cloudcephosd1017:9100 as Resolved.
Thu, Feb 22, 9:01 AM · cloud-services-team
taavi added a comment to T358175: dbreps job pending to start for 2d16h on Toolforge.

I'll see if we can alert on pods stuck in Pending for a while.

Thu, Feb 22, 8:21 AM · Toolforge (Toolforge iteration 06)
taavi changed the status of T358175: dbreps job pending to start for 2d16h on Toolforge from Open to In Progress.
Thu, Feb 22, 8:14 AM · Toolforge (Toolforge iteration 06)
taavi added a comment to T358175: dbreps job pending to start for 2d16h on Toolforge.

So the CNI path there is wrong, and our containerd config Puppetization is supposed to change that. That node was affected by T358179: [wmcs-cookbooks] wmcs.toolforge.add_k8s_node occasionally fails to setup custom Puppetmaster, so I think your pod was affected because I failed to drain and reboot that node after fixing the certificates.

Thu, Feb 22, 8:13 AM · Toolforge (Toolforge iteration 06)
taavi triaged T358175: dbreps job pending to start for 2d16h on Toolforge as High priority.
Thu, Feb 22, 8:13 AM · Toolforge (Toolforge iteration 06)
taavi created T358179: [wmcs-cookbooks] wmcs.toolforge.add_k8s_node occasionally fails to setup custom Puppetmaster.
Thu, Feb 22, 8:10 AM · Toolforge
taavi claimed T358175: dbreps job pending to start for 2d16h on Toolforge.
Feb 19 12:14:11 tools-k8s-worker-nfs-38 kubelet[3990]: E0219 12:14:11.504588    3990 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"dc967ea3-c6f4-4ca2-bf06-66b497e405a3\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for sandbox \\\"1de0a6c361bd7b7a5762ee3b22c3edf98617adf793e40961e2e38d32d9990282\\\": plugin type=\\\"loopback\\\" failed (delete): failed to find plugin \\\"loopback\\\" in path [/usr/lib/cni]\"" pod="tool-dbreps/rusty-28472400-92lw6" podUID=dc967ea3-c6f4-4ca2-bf06-66b497e405a3
Thu, Feb 22, 8:05 AM · Toolforge (Toolforge iteration 06)

Wed, Feb 21

taavi edited Description on Cloud-Services.
Wed, Feb 21, 6:42 PM
taavi edited Description on Toolforge Jobs framework.
Wed, Feb 21, 6:38 PM
taavi edited Description on Toolforge Build Service.
Wed, Feb 21, 6:38 PM
taavi closed T356567: Many LibUp workflows failing with "Warning: Cannot find module 'stylelint'" as Resolved.
Wed, Feb 21, 3:34 PM · LibUp
taavi added a project to T358101: nova-compute: error running local ceph command: Cloud-VPS.
Wed, Feb 21, 1:46 PM · Cloud-VPS, User-aborrero, cloud-services-team
taavi added a comment to T357881: [maintain-kubeusers] Allow setting the requests cpu and mem quota.

I don't know how I feel about this. For Lucas's use case, where the tool manually assigns resources based on actual measured usage, being able to customize requests and limits is the most logical option. On the other hand, the jobs framework is very much pushing toward a model where you only specify limits, and requests are assigned from those using sensible general defaults. Simply raising this tool's limits high enough to produce the request values we want means we don't need to add this "obscure" feature to maintain-kubeusers.

Wed, Feb 21, 1:09 PM · Toolforge (Toolforge iteration 06), Cloud-Services-Worktype-Project, Cloud-Services-Origin-Team, cloud-services-team (FY2023/2024-Q3-Q4), User-dcaro
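The alternative proposed in this comment relies on requests being derived from limits, so the desired request value can be reached by raising the limit. A minimal Python sketch of that arithmetic, assuming a single fixed request-to-limit ratio. The ratio of 1/2 and both function names are hypothetical and are not taken from the jobs framework or maintain-kubeusers.

```python
import math
from fractions import Fraction

# Hypothetical ratio for illustration only; the real defaults used by the
# jobs framework / maintain-kubeusers are not stated in the source.
REQUEST_TO_LIMIT_RATIO = Fraction(1, 2)


def derived_request(limit: int, ratio: Fraction = REQUEST_TO_LIMIT_RATIO) -> int:
    """Request derived from a limit (same unit, e.g. millicores or MiB), rounded down."""
    return math.floor(limit * ratio)


def limit_for_request(wanted_request: int, ratio: Fraction = REQUEST_TO_LIMIT_RATIO) -> int:
    """Smallest limit whose derived request is at least `wanted_request`.

    floor(L * ratio) >= R holds exactly when L >= R / ratio (R is an integer),
    so the smallest integer limit is ceil(R / ratio).
    """
    return math.ceil(wanted_request / ratio)
```

Using `Fraction` rather than floats keeps the inversion exact, so a wanted request of 3000 millicores maps to a limit of exactly 6000 under the assumed ratio instead of drifting by a rounding error.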
taavi removed a member for WMF-NDA: brion.
Wed, Feb 21, 11:07 AM
taavi committed rCCKB79460b36e17e: toolforge: k8s: Fix arguments being passed to drain cookbook (authored by taavi).
toolforge: k8s: Fix arguments being passed to drain cookbook
Wed, Feb 21, 9:37 AM
taavi committed rCCKB15548aeefe9d: inventory: Refresh tools kubernetes control nodes (authored by taavi).
inventory: Refresh tools kubernetes control nodes
Wed, Feb 21, 9:37 AM
taavi edited projects for T355082: GLAMWiki Dashboard not loading, added: VPS-Projects; removed Cloud-VPS.
Wed, Feb 21, 9:07 AM · VPS-Projects

Tue, Feb 20

taavi added a comment to T358044: Migrate dev user accounts for bvibber.

Can't log into Gerrit with bvibber; it says "Authentication failed."

Tue, Feb 20, 9:06 PM · Patch-For-Review, Phabricator, SRE, SRE-Access-Requests, LDAP-Access-Requests
taavi removed a member for Trusted-Contributors: brion.
Tue, Feb 20, 9:04 PM
taavi removed a member for acl*Batch-Editors: brion.
Tue, Feb 20, 9:04 PM
taavi added a member for acl*Batch-Editors: bvibber.
Tue, Feb 20, 9:04 PM
taavi added a member for Trusted-Contributors: bvibber.
Tue, Feb 20, 9:04 PM