JMeybohm
User

User Details

User Since
Apr 2 2020, 9:01 AM (138 w, 2 d)
Availability
Available
IRC Nick
jayme
LDAP User
JMeybohm
MediaWiki User
JMeybohm (WMF) [ Global Accounts ]

Recent Activity

Fri, Nov 18

JMeybohm lowered the priority of T287491: Allow to address Kubernets API servers from NetworkPolicy from Medium to Low.
Fri, Nov 18, 4:03 PM · serviceops, Prod-Kubernetes, Kubernetes
JMeybohm added a subtask for T307943: Update Kubernetes clusters to v1.23: T287491: Allow to address Kubernets API servers from NetworkPolicy.
Fri, Nov 18, 4:03 PM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a parent task for T287491: Allow to address Kubernets API servers from NetworkPolicy: T307943: Update Kubernetes clusters to v1.23.
Fri, Nov 18, 4:03 PM · serviceops, Prod-Kubernetes, Kubernetes
JMeybohm moved T287491: Allow to address Kubernets API servers from NetworkPolicy from Incoming 🐫 to ⎈Kubernetes on the serviceops board.
Fri, Nov 18, 4:03 PM · serviceops, Prod-Kubernetes, Kubernetes
JMeybohm added a project to T287491: Allow to address Kubernets API servers from NetworkPolicy: serviceops.
Fri, Nov 18, 4:02 PM · serviceops, Prod-Kubernetes, Kubernetes
JMeybohm added a comment to T323349: Improve performance of deployment to mw on k8s.

Please make sure not to pre-pull images on tainted nodes (which are masters and kask/sessionstore currently).

Fri, Nov 18, 10:21 AM · Release-Engineering-Team (Seen), serviceops, MW-on-K8s
JMeybohm added a comment to T302404: fix or remove instance "rebuild" button.

My usecase was actually to create a new instance with the same name, not keeping any data. So from that you say I assume deleting and creating a new one with the same name should do the trick for me. Thanks

Fri, Nov 18, 10:15 AM · Horizon, cloud-services-team (Kanban)
JMeybohm added a comment to T302404: fix or remove instance "rebuild" button.

Is there maybe a workaround for this that could be used instead?

Fri, Nov 18, 9:29 AM · Horizon, cloud-services-team (Kanban)
JMeybohm added a subtask for T307943: Update Kubernetes clusters to v1.23: T277677: Write a cookbook to set a k8s cluster in maintenance mode.
Fri, Nov 18, 9:26 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a parent task for T277677: Write a cookbook to set a k8s cluster in maintenance mode: T307943: Update Kubernetes clusters to v1.23.
Fri, Nov 18, 9:26 AM · Sustainability (Incident Followup), Infrastructure-Foundations, SRE-tools, SRE, Prod-Kubernetes, serviceops
JMeybohm closed Restricted Task, a subtask of T307943: Update Kubernetes clusters to v1.23, as Resolved.
Fri, Nov 18, 9:25 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm triaged T270191: Add kubernetes 1.17+ topology annotations as Medium priority.
Fri, Nov 18, 9:23 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm triaged T322919: Metrics changes with Kubernetes v1.23 as Medium priority.
Fri, Nov 18, 9:23 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed Restricted Task, a subtask of T307943: Update Kubernetes clusters to v1.23, as Resolved.
Fri, Nov 18, 9:22 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed Restricted Task, a subtask of T307943: Update Kubernetes clusters to v1.23, as Resolved.
Fri, Nov 18, 9:22 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm triaged T303279: Fix calico, cfssl-issuer and knative-serving Helm dependencies as Low priority.
Fri, Nov 18, 9:20 AM · Machine-Learning-Team, serviceops

Thu, Nov 17

JMeybohm added a comment to T323296: Stop spamming SAL with helmfile on scap deployments.

helmfile_log_sal has support for that already:

# Allow to explicitely suppress logging to SAL
SUPPRESS_SAL=${SUPPRESS_SAL:-false}
Thu, Nov 17, 6:27 PM · Patch-For-Review, Performance-Team (Radar), Release-Engineering-Team (Seen), SRE, Traffic, serviceops, MW-on-K8s
JMeybohm added a comment to T322635: Define necessary RBAC rules for spark on dse-k8s cluster.

That sounds about right. Although you absolutely can specify the latter rules as a ClusterRole object and then bind it to just a namespace using a RoleBinding (instead of a ClusterRoleBinding). That way you don't have to declare the Role in every Namespace you want to use (just the RoleBinding).
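
A minimal sketch of that pattern, written as plain Python dicts that can be dumped to JSON. The ServiceAccount name and namespace are taken from the spark-operator manifest quoted in this task; the role name, the rules and the target namespaces are hypothetical and only there for illustration.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"


def cluster_role(name, rules):
    """Cluster-scoped role definition: declared once, carries no namespace."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": rules,
    }


def role_binding(namespace, role_name, sa_name, sa_namespace):
    """Namespaced binding: grants the ClusterRole's rules only inside `namespace`."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": sa_name, "namespace": sa_namespace}
        ],
        # A RoleBinding may reference a ClusterRole instead of a Role.
        "roleRef": {"kind": "ClusterRole", "name": role_name, "apiGroup": RBAC_API_GROUP},
    }


# Hypothetical rules and namespaces, for illustration only.
example_rules = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list", "watch", "create", "delete"]}
]
target_namespaces = ["spark-jobs-a", "spark-jobs-b"]

manifests = [cluster_role("spark-job-runner", example_rules)] + [
    role_binding(ns, "spark-job-runner", "spark-operator", "spark-operator")
    for ns in target_namespaces
]

if __name__ == "__main__":
    print(json.dumps(manifests, indent=2))
```

The role is declared once, and each additional namespace only needs one more RoleBinding.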

Thu, Nov 17, 7:05 AM · Patch-For-Review, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05))

Wed, Nov 16

JMeybohm added a comment to T322635: Define necessary RBAC rules for spark on dse-k8s cluster.

[...]
Lastly, in terms of these RBAC rules, a ClusterRoleBinding is created. This specifies the namespace spark-operator so it limits the power of the spark-operator to work within its own namespace.

+ # Source: spark-operator/templates/rbac.yaml
+ apiVersion: rbac.authorization.k8s.io/v1
+ kind: ClusterRoleBinding
+ metadata:
+   name: spark-operator-spark-data-engineering
+   annotations:
+     "helm.sh/hook": pre-install, pre-upgrade
+     "helm.sh/hook-delete-policy": hook-failed, before-hook-creation
+     "helm.sh/hook-weight": "-10"
+   labels:
+     app: spark-operator
+     chart: spark-operator-0.0.1
+     release: spark-data-engineering
+     heritage: Helm
+ subjects:
+   - kind: ServiceAccount
+     name: spark-operator
+     namespace: spark-operator
+ roleRef:
+   kind: ClusterRole
+   name: spark-operator-spark-data-engineering
+   apiGroup: rbac.authorization.k8s.io

I will discuss the other resources in the parent ticket.

Wed, Nov 16, 11:11 AM · Patch-For-Review, Shared-Data-Infrastructure (EQ2 Kanban (Sprints 04-05))
JMeybohm updated the task description for T290963: Drop the use of nonexisting groups in kubernetes infrastructure_users.
Wed, Nov 16, 10:59 AM · serviceops, Prod-Kubernetes, Kubernetes
JMeybohm claimed T299236: Move away from system:node RBAC role.
Wed, Nov 16, 10:57 AM · serviceops, Prod-Kubernetes, Kubernetes

Tue, Nov 15

JMeybohm updated the task description for T307943: Update Kubernetes clusters to v1.23.
Tue, Nov 15, 9:12 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Mon, Nov 14

JMeybohm moved T322453: Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push from Doing 😎 to Stalled 🐌 on the serviceops board.
Mon, Nov 14, 2:56 PM · Release Pipeline (Blubber), Release-Engineering-Team (Priority Backlog 📥), serviceops

Fri, Nov 11

JMeybohm moved T264625: Deploy kube-state-metrics from ⎈Kubernetes to 🥋Good First Task on the serviceops board.
Fri, Nov 11, 1:58 PM · serviceops, User-jijiki, Kubernetes
JMeybohm updated the task description for T307943: Update Kubernetes clusters to v1.23.
Fri, Nov 11, 11:16 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm renamed T322919: Metrics changes with Kubernetes v1.23 from Metric name changes with Kubernetes v1.23 to Metrics changes with Kubernetes v1.23.
Fri, Nov 11, 11:07 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm created T322919: Metrics changes with Kubernetes v1.23.
Fri, Nov 11, 11:07 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T307943: Update Kubernetes clusters to v1.23.
Fri, Nov 11, 11:07 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T307943: Update Kubernetes clusters to v1.23.

Let's bump to v1.23.14 before migrating? That version fixes two new security issues:

CVE-2022-3294: Node address isn't always verified when proxying:
https://groups.google.com/g/kubernetes-announce/c/eR0ghAXy2H8

CVE-2022-3162: Unauthorized read of Custom Resources:
https://groups.google.com/g/kubernetes-announce/c/oR2PUBiODNA

Fri, Nov 11, 9:26 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T313473: Switch to cgroup v2 and systemd as cgroup driver for docker and kubelet, a subtask of T307943: Update Kubernetes clusters to v1.23, as Resolved.
Fri, Nov 11, 9:25 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T313473: Switch to cgroup v2 and systemd as cgroup driver for docker and kubelet as Resolved.
Fri, Nov 11, 9:25 AM · Kubernetes, Prod-Kubernetes, serviceops

Thu, Nov 10

JMeybohm added a comment to T277876: Reserve resources for system daemons on kubernetes nodes.

This might be helpful to calculate initial values: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#memory_cpu
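
As a rough sketch of how such initial values could be calculated, the Python below applies tiered percentages in the style of the linked GKE page. The tier boundaries and percentages are written from memory and should be treated as an assumption; check them against the linked page before using them. All function and constant names are hypothetical.

```python
# Each tier is (size of the tier, fraction reserved); None means "everything above".
# ASSUMPTION: values recalled from the GKE documentation linked above; verify before use.
MEMORY_TIERS_MIB = [
    (4096, 0.25),     # first 4 GiB
    (4096, 0.20),     # next 4 GiB (up to 8 GiB)
    (8192, 0.10),     # next 8 GiB (up to 16 GiB)
    (114688, 0.06),   # next 112 GiB (up to 128 GiB)
    (None, 0.02),     # anything above 128 GiB
]
CPU_TIERS_MILLICORES = [
    (1000, 0.06),     # first core
    (1000, 0.01),     # second core
    (2000, 0.005),    # cores three and four
    (None, 0.0025),   # anything above four cores
]


def tiered_reservation(capacity, tiers):
    """Sum the reserved share of each tier until the capacity is used up."""
    reserved = 0.0
    remaining = capacity
    for size, fraction in tiers:
        if remaining <= 0:
            break
        portion = remaining if size is None else min(remaining, size)
        reserved += portion * fraction
        remaining -= portion
    return reserved


def reserved_memory_mib(capacity_mib):
    # ASSUMPTION: flat 255 MiB for machines with less than 1 GiB of memory.
    if capacity_mib < 1024:
        return 255.0
    return tiered_reservation(capacity_mib, MEMORY_TIERS_MIB)


def reserved_cpu_millicores(capacity_millicores):
    return tiered_reservation(capacity_millicores, CPU_TIERS_MILLICORES)


if __name__ == "__main__":
    print(round(reserved_memory_mib(8192), 1), "MiB reserved")
    print(round(reserved_cpu_millicores(4000), 1), "millicores reserved")
```

Under these assumed tiers, a node with 8 GiB of memory and 4 cores comes out at roughly 1843 MiB and 80 millicores reserved.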

Thu, Nov 10, 1:55 PM · serviceops, Kubernetes, Prod-Kubernetes
JMeybohm added a subtask for T307943: Update Kubernetes clusters to v1.23: T277876: Reserve resources for system daemons on kubernetes nodes.
Thu, Nov 10, 1:26 PM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a parent task for T277876: Reserve resources for system daemons on kubernetes nodes: T307943: Update Kubernetes clusters to v1.23.
Thu, Nov 10, 1:26 PM · serviceops, Kubernetes, Prod-Kubernetes
JMeybohm claimed T313473: Switch to cgroup v2 and systemd as cgroup driver for docker and kubelet.
Thu, Nov 10, 1:26 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm triaged T310618: Define priorityClassName for istio and cert-manager deployments as Low priority.
Thu, Nov 10, 10:22 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T310486: Update cfssl-issuer to cert-manager 1.8.x, a subtask of T307943: Update Kubernetes clusters to v1.23, as Resolved.
Thu, Nov 10, 10:22 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T310486: Update cfssl-issuer to cert-manager 1.8.x as Resolved.
Thu, Nov 10, 10:21 AM · Patch-For-Review, Infrastructure-Foundations, CFSSL-PKI, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T303279: Fix calico, cfssl-issuer and knative-serving Helm dependencies.

Fixed for cfssl-issuer in chart version 0.3.0

Thu, Nov 10, 10:21 AM · Machine-Learning-Team, serviceops
JMeybohm closed T278329: Support multiple kubernetes versions with puppet, a subtask of T307943: Update Kubernetes clusters to v1.23, as Resolved.
Thu, Nov 10, 10:18 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T278329: Support multiple kubernetes versions with puppet as Resolved.

We now have the exact major.minor version as an enum type in puppet, along with dedicated apt components

Thu, Nov 10, 10:18 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T322453: Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push.

@JMeybohm can you provide the nginx access log entries from that time period as well? I'm trying to rule out auth failure as a factor and docker-registry log entries do not include the subrequests between nginx and jwt-authorizer.

Thu, Nov 10, 10:15 AM · Release Pipeline (Blubber), Release-Engineering-Team (Priority Backlog 📥), serviceops

Tue, Nov 8

JMeybohm added a subtask for T307943: Update Kubernetes clusters to v1.23: Unknown Object (Task).
Tue, Nov 8, 4:06 PM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated subscribers of T322453: Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push.

I took a quick look and AIUI our registry does support application/vnd.docker.distribution.manifest.list.v2+json already. That is really not a new thing and we are not that far behind upstream releases.

Tue, Nov 8, 9:58 AM · Release Pipeline (Blubber), Release-Engineering-Team (Priority Backlog 📥), serviceops
JMeybohm added a comment to T322579: give releng access to logs to debug buildkit-to-wmf-registry publishing.

[...]
Also: both the registry and nginx keep access logs, so I guess it's enough to export one of the two.

Tue, Nov 8, 8:41 AM · serviceops-radar, Release-Engineering-Team (Radar), serviceops-collab
JMeybohm added a comment to T321201: Deploy new mw-debug service.

[...]
@JMeybohm @Joe @akosiaris If this seems like the right way, I will start writing the "Kubernetes/Remove_a_service" wikitech doc.

Tue, Nov 8, 8:20 AM · Patch-For-Review, serviceops, MW-on-K8s

Mon, Nov 7

JMeybohm added a comment to T322453: Buildkit erroring with "cannot reuse body, request must be retried" upon multi-platform push.

I think this might require an update of our docker-registry (which we have not planned for currently). Is this blocking something apart from testing with multi arch images (T272500)?

Mon, Nov 7, 4:50 PM · Release Pipeline (Blubber), Release-Engineering-Team (Priority Backlog 📥), serviceops
JMeybohm updated the task description for T307943: Update Kubernetes clusters to v1.23.
Mon, Nov 7, 11:02 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Fri, Nov 4

JMeybohm added a comment to P38118 (An Untitled Masterwork).
diff --git a/modules/pontoon/manifests/sd.pp b/modules/pontoon/manifests/sd.pp
index 2f8632543d..a5a33999cd 100644
--- a/modules/pontoon/manifests/sd.pp
+++ b/modules/pontoon/manifests/sd.pp
@@ -56,4 +56,13 @@ class pontoon::sd (
         command     => '/usr/sbin/dnsmasq --test && /bin/systemctl restart dnsmasq',
         refreshonly => true,
     }
+
+    $k8s_etcd_hosts = pontoon::hosts_for_role('etcd::v3::kubernetes')
+    $etcd_server_records = $k8s_etcd_hosts.map |$i, $_fqdn| { "_etcd-server-ssl._tcp.k8s3.${::domain},${_fqdn},2380,${i}" }
+    $srv_hosts = $etcd_server_records
+
+    file { '/etc/dnsmasq.d/pontoon-srv.conf':
+        content => inline_template('<%= @srv_hosts.map{|x| "srv-host=#{x}" }.join("\n") %>'),
+        notify  => Exec['dnsmasq-restart'], # reload is not enough to pick up new files...
+    }
 }
diff --git a/modules/pontoon/manifests/service_certs.pp b/modules/pontoon/manifests/service_certs.pp
index 242232cca5..658369bbc2 100644
--- a/modules/pontoon/manifests/service_certs.pp
+++ b/modules/pontoon/manifests/service_certs.pp
@@ -45,11 +45,20 @@ class pontoon::service_certs (
         )
     }
Fri, Nov 4, 2:08 PM
JMeybohm committed rLPRIfd7c10b318e8: Rename ml_k8s staging roles to match naming scheme (authored by JMeybohm).
Rename ml_k8s staging roles to match naming scheme
Fri, Nov 4, 9:38 AM
JMeybohm closed T300499: Migrate from command line flags to config files for kubernetes components, a subtask of T307943: Update Kubernetes clusters to v1.23, as Resolved.
Fri, Nov 4, 9:23 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T300499: Migrate from command line flags to config files for kubernetes components as Resolved.
Fri, Nov 4, 9:22 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T270271: Target Sources (component/kubernetes-future/source/Sources) is configured multiple times, a subtask of T307943: Update Kubernetes clusters to v1.23, as Resolved.
Fri, Nov 4, 8:12 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T270271: Target Sources (component/kubernetes-future/source/Sources) is configured multiple times as Resolved.

Thanks @jbond !

Fri, Nov 4, 8:12 AM · Kubernetes, Prod-Kubernetes, serviceops

Thu, Nov 3

JMeybohm updated the task description for T300499: Migrate from command line flags to config files for kubernetes components.
Thu, Nov 3, 3:26 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T300499: Migrate from command line flags to config files for kubernetes components.
Thu, Nov 3, 1:17 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T300499: Migrate from command line flags to config files for kubernetes components.
Thu, Nov 3, 1:08 PM · Kubernetes, Prod-Kubernetes, serviceops

Wed, Nov 2

JMeybohm updated the task description for T300499: Migrate from command line flags to config files for kubernetes components.
Wed, Nov 2, 10:34 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T300499: Migrate from command line flags to config files for kubernetes components.
Wed, Nov 2, 10:32 AM · Kubernetes, Prod-Kubernetes, serviceops

Tue, Nov 1

JMeybohm claimed T300499: Migrate from command line flags to config files for kubernetes components.
Tue, Nov 1, 3:50 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm moved T300499: Migrate from command line flags to config files for kubernetes components from 🙈🙉🙊Backlog to Doing 😎 on the serviceops board.
Tue, Nov 1, 3:49 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T320812: [SPIKE] Deploy event driven stateless Flink service to DSE cluster.

IIRC we ditched the application deployment cluster because of missing HA capabilities. We might have considered option 1 as well, but found that it was not easy to "get right" in an automated way (without adding quite some complexity), especially in cases where a job has to resume from a tombstone/savepoint. But of course, as long as the "job artifact/the jar file" is bundled in a container that is deployed via helm charts, it is easy to recreate this exact state in the cluster (without knowing anything about flink).

Tue, Nov 1, 9:39 AM · Event-Platform Value Stream (Sprint 04), Shared-Data-Infrastructure, Data-Engineering-Planning

Fri, Oct 28

JMeybohm closed T303184: High API server request latencies (LIST) as Resolved.

I'm going to resolve this as we're updating all of the said clients with T307943: Update Kubernetes clusters to v1.23

Fri, Oct 28, 11:10 AM · Prod-Kubernetes, Kubernetes, serviceops
JMeybohm closed T303184: High API server request latencies (LIST) , a subtask of T290966: Implement POC for istio ingress, as Resolved.
Fri, Oct 28, 11:10 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
JMeybohm closed T303184: High API server request latencies (LIST) , a subtask of T307943: Update Kubernetes clusters to v1.23, as Resolved.
Fri, Oct 28, 11:10 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Oct 26 2022

JMeybohm added a comment to T321657: Custom flavor for SRE projects.

make better use of the quota in the projects

Not sure what you mean by this - wouldn't you still get hit by the CPU quota anyways? As far as I can tell CPU is currently a much more limited resource than what RAM is, so this might not be worth the extra complexity.

Oct 26 2022, 1:44 PM · Cloud-VPS (Quota-requests)
JMeybohm created T321657: Custom flavor for SRE projects.
Oct 26 2022, 9:41 AM · Cloud-VPS (Quota-requests)

Oct 24 2022

JMeybohm updated the task description for T321491: Evaluate Flink Operator on DSE Kubernetes Cluster.
Oct 24 2022, 2:23 PM · serviceops-radar, Discovery-Search (Current work)

Oct 20 2022

JMeybohm closed T300500: Kubernetes services with externalTrafficPolicy: Local don't work as Resolved.

Now it has

Oct 20 2022, 2:22 PM · Patch-For-Review, serviceops, Prod-Kubernetes, Kubernetes

Oct 19 2022

JMeybohm lowered the priority of T215809: Set up a local redis proxy since docker-registry can only connect to one redis instance for caching from Medium to Low.
Oct 19 2022, 12:56 PM · User-fsero, serviceops, Prod-Kubernetes, Kubernetes, SRE

Oct 17 2022

JMeybohm moved T256762: Fix nginx config and caching for docker registry from 🔦Unused2 to 🍦IceBox on the serviceops board.
Oct 17 2022, 2:25 PM · Patch-For-Review, serviceops, Kubernetes, SRE

Oct 12 2022

JMeybohm reopened T300500: Kubernetes services with externalTrafficPolicy: Local don't work as "Open".

ouch, this never made it to prod

Oct 12 2022, 2:10 PM · Patch-For-Review, serviceops, Prod-Kubernetes, Kubernetes

Oct 11 2022

JMeybohm committed rLPRIceb6f8b09807: Keep k8s tokens identifiable as dummys (authored by JMeybohm).
Keep k8s tokens identifiable as dummys
Oct 11 2022, 2:44 PM
JMeybohm committed rLPRIfebcb1c5f2b6: Randomize tokens in profile::kubernetes::infrastructure_users (authored by JMeybohm).
Randomize tokens in profile::kubernetes::infrastructure_users
Oct 11 2022, 2:06 PM

Oct 10 2022

JMeybohm updated the task description for T307943: Update Kubernetes clusters to v1.23.
Oct 10 2022, 4:30 PM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm moved T270191: Add kubernetes 1.17+ topology annotations from 🙈🙉🙊Backlog to ⎈Kubernetes on the serviceops board.
Oct 10 2022, 8:15 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T307943: Update Kubernetes clusters to v1.23.
Oct 10 2022, 7:52 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm moved T302717: Use ingress for linkrecommendation from 🙈🙉🙊Backlog to 🥋Good First Task on the serviceops board.
Oct 10 2022, 7:34 AM · Add-Link, Prod-Kubernetes, Kubernetes, serviceops, Growth-Team
JMeybohm moved T303184: High API server request latencies (LIST) from 🙈🙉🙊Backlog to ⎈Kubernetes on the serviceops board.
Oct 10 2022, 7:33 AM · Prod-Kubernetes, Kubernetes, serviceops
JMeybohm moved T299236: Move away from system:node RBAC role from 🙈🙉🙊Backlog to ⎈Kubernetes on the serviceops board.
Oct 10 2022, 7:28 AM · serviceops, Prod-Kubernetes, Kubernetes
JMeybohm moved T310618: Define priorityClassName for istio and cert-manager deployments from 🙈🙉🙊Backlog to ⎈Kubernetes on the serviceops board.
Oct 10 2022, 7:27 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm moved T316348: Remove kubeyaml from deployment-charts CI from 🙈🙉🙊Backlog to 🥋Good First Task on the serviceops board.
Oct 10 2022, 7:26 AM · Kubernetes, serviceops
JMeybohm triaged T249929: Integrate kube-metrics-server into our infrastructure as Low priority.
Oct 10 2022, 7:24 AM · Kubernetes, serviceops

Oct 5 2022

JMeybohm moved T233196: Migrate thumbor to Kubernetes from 🔦Unused2 to Doing 😎 on the serviceops board.
Oct 5 2022, 12:34 PM · Platform Team Workboards (Platform Engineering Reliability), Patch-For-Review, Thumbor Migration, SRE, serviceops, Thumbor

Oct 4 2022

JMeybohm updated the task description for T307943: Update Kubernetes clusters to v1.23.
Oct 4 2022, 2:01 PM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T307943: Update Kubernetes clusters to v1.23.
Oct 4 2022, 1:26 PM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T307943: Update Kubernetes clusters to v1.23.
Oct 4 2022, 11:08 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm moved T310721: eventstreams chart should use latest common_templates from 🙈🙉🙊Backlog to 🥋Good First Task on the serviceops board.

This needs SRE support to depool eventstreams from one DC. helmfile destroy/helmfile apply can be done by deployers as well.

Oct 4 2022, 9:35 AM · Event-Platform Value Stream (Sprint 02), Patch-For-Review, Data-Engineering, SRE, serviceops
JMeybohm updated the task description for T307943: Update Kubernetes clusters to v1.23.
Oct 4 2022, 8:43 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T303279: Fix calico, cfssl-issuer and knative-serving Helm dependencies.

Fixed for calico with v3.23.3

Oct 4 2022, 8:38 AM · Machine-Learning-Team, serviceops
JMeybohm moved T310486: Update cfssl-issuer to cert-manager 1.8.x from 🙈🙉🙊Backlog to Doing 😎 on the serviceops board.
Oct 4 2022, 8:19 AM · Patch-For-Review, Infrastructure-Foundations, CFSSL-PKI, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm claimed T307943: Update Kubernetes clusters to v1.23.
Oct 4 2022, 8:19 AM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Sep 30 2022

JMeybohm added a project to T318707: Don't scrape every containerPort for metrics: Machine-Learning-Team.
Sep 30 2022, 8:59 AM · Machine-Learning-Team, Kubernetes, Observability-Metrics, serviceops
JMeybohm added a comment to T318707: Don't scrape every containerPort for metrics.

Speaking from a position of almost total ignorance:

Do we only care about the pods spawned by the helm charts in the deployment-charts repo ? Just trying to figure out which clusters/pods are affected.

Basically yes. But ultimately this will affect all pods in all clusters - and we should try to keep backwards compatibility as far as possible.

Sep 30 2022, 8:57 AM · Machine-Learning-Team, Kubernetes, Observability-Metrics, serviceops

Sep 28 2022

JMeybohm added a comment to T318705: Limit the envoy metrics scraped from k8s.

How can we tell if this is working?

Use an envoy metric that doesn't match the regex, such as

rate(envoy_cluster_default_total_match_count{app="api-gateway", envoy_cluster_name=~".*"}[5m])

After the change is merged, this should not return any hits.

Sep 28 2022, 4:18 PM · Kubernetes, Observability-Metrics, serviceops
JMeybohm updated the task description for T318707: Don't scrape every containerPort for metrics.
Sep 28 2022, 8:42 AM · Machine-Learning-Team, Kubernetes, Observability-Metrics, serviceops
JMeybohm updated subscribers of T318705: Limit the envoy metrics scraped from k8s.
Sep 28 2022, 7:28 AM · Kubernetes, Observability-Metrics, serviceops

Sep 27 2022

JMeybohm closed T311251: Migrate kubernetes alerts away from icinga as Resolved.

Changed the alerts from using p99 to using p95, resolving this again.

Sep 27 2022, 3:05 PM · Patch-For-Review, Observability-Alerting, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T311251: Migrate kubernetes alerts away from icinga , a subtask of T307943: Update Kubernetes clusters to v1.23, as Resolved.
Sep 27 2022, 3:05 PM · Foundational Technology Requests, Shared-Data-Infrastructure, Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T311251: Migrate kubernetes alerts away from icinga , a subtask of T288622: All Prometheus based alerts move from Icinga to alert manager exclusively, as Resolved.
Sep 27 2022, 3:05 PM · SRE Observability (FY2022/2023-Q2)