JMeybohm
User

User Details

User Since
Apr 2 2020, 9:01 AM (58 w, 3 d)
Availability
Available
IRC Nick
jayme
LDAP User
Unknown
MediaWiki User
JMeybohm (WMF) [ Global Accounts ]

Recent Activity

Wed, May 12

JMeybohm created P15954 docker-pulltime.py.
Wed, May 12, 9:59 PM
JMeybohm closed T277877: Set resource requests and limits for calico PODs as Resolved.

Calico components are running with resource definitions in all clusters now.

Wed, May 12, 7:44 AM · Prod-Kubernetes, serviceops, Kubernetes, SRE
JMeybohm closed T277877: Set resource requests and limits for calico PODs, a subtask of T207804: Upgrade Calico, as Resolved.
Wed, May 12, 7:44 AM · Patch-Needs-Improvement, Prod-Kubernetes, User-fsero, serviceops, Kubernetes, SRE

Tue, May 11

JMeybohm closed T270063: kube-apiserver flag --admission-control has been deprecated as Resolved.

Merged and deployed.

Tue, May 11, 10:04 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T270063: kube-apiserver flag --admission-control has been deprecated, a subtask of T244335: Upgrade kubernetes clusters to a security supported (LTS) version, as Resolved.
Tue, May 11, 10:04 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Mon, May 10

JMeybohm added a comment to T277877: Set resource requests and limits for calico PODs.

I've looked into the typha and kube-controllers components as well, as they show a similar pattern (different magnitude, though).
Unfortunately we lack prometheus metrics for kube-controllers (as they are not available in calico 3.17). Typha's throttling seems mostly related to Go GC and a bunch of goroutines pinging each other regularly. I'd assume the same for kube-controllers but I can't be sure currently.

Mon, May 10, 2:31 PM · Prod-Kubernetes, serviceops, Kubernetes, SRE
JMeybohm edited P15857 plot-docker-stats.py.
Mon, May 10, 2:25 PM

Fri, May 7

JMeybohm updated subscribers of T277877: Set resource requests and limits for calico PODs.

I tried to verify the above assumption by collecting metrics more frequently (per second) from the docker API (see P15857). This paints a clearer picture of what happens:

Fri, May 7, 1:14 PM · Prod-Kubernetes, serviceops, Kubernetes, SRE
JMeybohm created P15857 plot-docker-stats.py.
Fri, May 7, 1:05 PM
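
The per-second sampling mentioned in the T277877 comment above could look roughly like the following. This is a minimal sketch and not the content of P15857: it assumes the samples are container stats objects as returned by the Docker Engine API, whose `cpu_stats.throttling_data` counters (`periods`, `throttled_periods`) are cumulative, and the two samples below are hand-written, illustrative values.

```python
def throttled_fraction(prev, curr):
    """Fraction of CFS periods in which the container was throttled
    between two consecutive stats samples (counters are cumulative)."""
    p = prev["cpu_stats"]["throttling_data"]
    c = curr["cpu_stats"]["throttling_data"]
    periods = c["periods"] - p["periods"]
    throttled = c["throttled_periods"] - p["throttled_periods"]
    return throttled / periods if periods > 0 else 0.0


# Two hypothetical samples taken one second apart (values are made up).
sample_t0 = {"cpu_stats": {"throttling_data": {"periods": 1000, "throttled_periods": 40}}}
sample_t1 = {"cpu_stats": {"throttling_data": {"periods": 1010, "throttled_periods": 47}}}

print(throttled_fraction(sample_t0, sample_t1))
```

Working on deltas of the cumulative counters is what makes short sampling intervals useful here: each one-second window reports only the throttling that happened within that window.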

Thu, May 6

Dzahn awarded T277740: Support downtiming services in our cookbooks a Love token.
Thu, May 6, 6:39 PM · Prod-Kubernetes, SRE, SRE-tools

Tue, May 4

Dzahn awarded T271573: upgrade conf2* servers to stretch a Barnstar token.
Tue, May 4, 12:37 AM · Patch-For-Review, SRE, serviceops

Mon, May 3

JMeybohm reassigned T281374: decommission conf200[1-3].codfw.wmnet from JMeybohm to Papaul.
Mon, May 3, 10:19 AM · SRE, ops-codfw, DC-Ops, serviceops, decommission-hardware
jcrespo awarded T271573: upgrade conf2* servers to stretch a Like token.
Mon, May 3, 10:14 AM · Patch-For-Review, SRE, serviceops

Fri, Apr 30

JMeybohm added a comment to T275637: (Need By: TBD) rack/setup/install conf200[456].codfw.wmnet.

@Joe please remember to change the server status in Netbox to "Active" once the server is in service.

https://netbox.wikimedia.org/extras/reports/results/659549/

Thanks

Fri, Apr 30, 2:55 PM · serviceops, SRE, ops-codfw, DC-Ops

Thu, Apr 29

JMeybohm claimed T277677: Write a cookbook to set a k8s cluster in maintenance mode.
Thu, Apr 29, 2:38 PM · SRE-tools, SRE, Prod-Kubernetes, serviceops
JMeybohm closed T271573: upgrade conf2* servers to stretch as Resolved.
Thu, Apr 29, 8:26 AM · Patch-For-Review, SRE, serviceops
JMeybohm added a subtask for T275600: Support proxying to etcd v3 storage on buster or later: T281447: Support etcd v3 backups with ::etcd::backup.
Thu, Apr 29, 8:25 AM · SRE, serviceops
JMeybohm added a parent task for T281447: Support etcd v3 backups with ::etcd::backup: T275600: Support proxying to etcd v3 storage on buster or later.
Thu, Apr 29, 8:25 AM · SRE, serviceops
JMeybohm triaged T281447: Support etcd v3 backups with ::etcd::backup as Medium priority.
Thu, Apr 29, 8:24 AM · SRE, serviceops
JMeybohm updated the task description for T281374: decommission conf200[1-3].codfw.wmnet.
Thu, Apr 29, 8:23 AM · SRE, ops-codfw, DC-Ops, serviceops, decommission-hardware
JMeybohm added projects to T281374: decommission conf200[1-3].codfw.wmnet: DC-Ops, ops-codfw.
  • COMMON_STEPS (FAIL)
    • Failed to run the sre.dns.netbox cookbook: Cumin execution failed (exit_code=2)

ERROR: some step on some host failed, check the bolded items above

Thu, Apr 29, 8:23 AM · SRE, ops-codfw, DC-Ops, serviceops, decommission-hardware
jcrespo awarded T281447: Support etcd v3 backups with ::etcd::backup a Like token.
Thu, Apr 29, 8:17 AM · SRE, serviceops
JMeybohm created T281447: Support etcd v3 backups with ::etcd::backup.
Thu, Apr 29, 8:14 AM · SRE, serviceops

Wed, Apr 28

JMeybohm updated the task description for T281374: decommission conf200[1-3].codfw.wmnet.
Wed, Apr 28, 3:41 PM · SRE, ops-codfw, DC-Ops, serviceops, decommission-hardware
JMeybohm created P15621 (An Untitled Masterwork).
Wed, Apr 28, 3:14 PM
JMeybohm added a subtask for T271573: upgrade conf2* servers to stretch: T281374: decommission conf200[1-3].codfw.wmnet.
Wed, Apr 28, 2:42 PM · Patch-For-Review, SRE, serviceops
JMeybohm added a parent task for T281374: decommission conf200[1-3].codfw.wmnet: T271573: upgrade conf2* servers to stretch.
Wed, Apr 28, 2:42 PM · SRE, ops-codfw, DC-Ops, serviceops, decommission-hardware
JMeybohm created T281374: decommission conf200[1-3].codfw.wmnet.
Wed, Apr 28, 2:41 PM · SRE, ops-codfw, DC-Ops, serviceops, decommission-hardware
JMeybohm added a comment to T271573: upgrade conf2* servers to stretch.

DNS SRV records, pybal and confd instances in codfw, eqsin, ulsfo moved to the new cluster. navtiming.service on webperf needed a restart as well.

Wed, Apr 28, 1:52 PM · Patch-For-Review, SRE, serviceops

Tue, Apr 27

JMeybohm added a comment to T271573: upgrade conf2* servers to stretch.

Zookeeper has completely moved from conf200[1-3] to conf200[4-6]. kafka-main, mirror-maker and kafka-logging in codfw have been restarted to catch up with that as well.

Tue, Apr 27, 3:30 PM · Patch-For-Review, SRE, serviceops

Mon, Apr 26

JMeybohm added a comment to T271573: upgrade conf2* servers to stretch.

Switched zookeeper from conf2001 to conf2004.
We decided to leave it like this for today and see if anything comes up.

Mon, Apr 26, 2:15 PM · Patch-For-Review, SRE, serviceops
JMeybohm committed rLPRId9cc3fb9a258: Add new tlsproxy cert for configcluster etcd (authored by JMeybohm).
Mon, Apr 26, 7:23 AM

Fri, Apr 23

JMeybohm added a comment to T238909: Proposal: simplify set up of a new load-balanced service on kubernetes.

During testing today, we had some sideline issues because calico-node was dying (as we brought down the network interface on one k8s node for testing). This led to two action items/nice to haves:

  • Mark nodes as NotReady when critical DaemonSets are not ready (like calico-node)
    • Unfortunately this is nothing we can do with builtin methods. The discussions about this (Node Readiness Gates) seem to always end with the recommendation to start all nodes with a specific taint that is then removed by the mandatory DaemonSet or an additional controller (like https://github.com/wish/nodetaint)
  • Run calico-node without exponential backoff on CrashLoop. This removes potentially long wait times until a node comes back up when calico-node has failed a couple of times (as the exponential backoff time can get quite high)
Fri, Apr 23, 1:50 PM · SRE, Prod-Kubernetes, Pybal, Traffic, serviceops
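
To give a feeling for the wait times mentioned in the second action item above, here is a small sketch of how the restart delay grows. It assumes the commonly documented kubelet behaviour (a 10 second initial delay that doubles on every restart and is capped at five minutes); the function name and parameters are hypothetical and not taken from any Kubernetes code.

```python
def crashloop_delays(restarts, initial=10, factor=2, cap=300):
    """Return the assumed back-off delay (in seconds) before each restart."""
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(min(delay, cap))  # never wait longer than the cap
        delay *= factor                 # double the delay for the next restart
    return delays


print(crashloop_delays(7))       # delay before each of the first seven restarts
print(sum(crashloop_delays(7)))  # total time spent waiting across them
```

Under these assumptions, a calico-node container that has failed six times waits the full five minutes before every further restart, which is the delay the action item wants to avoid.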
JMeybohm added a comment to T238909: Proposal: simplify set up of a new load-balanced service on kubernetes.

Very cool!

Fri, Apr 23, 7:42 AM · SRE, Prod-Kubernetes, Pybal, Traffic, serviceops

Apr 16 2021

JMeybohm added a comment to T271573: upgrade conf2* servers to stretch.

The tlsproxy currently serves a certificate not valid for conf200[4,5,6] (Prometheus errors with: Get https://conf2004:4001/metrics: x509: certificate is valid for conf2001.codfw.wmnet, conf2002.codfw.wmnet, conf2003.codfw.wmnet, conf2001, conf2002, conf2003, etcd.codfw.wmnet, not conf2004)

Apr 16 2021, 1:56 PM · Patch-For-Review, SRE, serviceops
JMeybohm added a comment to T271573: upgrade conf2* servers to stretch.

Do you think, with the work done, we could drop support of jessie bacula backups (only etcd cluster was pending with jessie)?

Apr 16 2021, 9:16 AM · Patch-For-Review, SRE, serviceops
JMeybohm claimed T271573: upgrade conf2* servers to stretch.

etcd cluster is set up now on conf200[4,5,6] although I had some trouble setting it up and I do not yet know why:

Apr 16 2021, 9:05 AM · Patch-For-Review, SRE, serviceops

Apr 15 2021

JMeybohm committed rLPRI249ea28e2e35: htpasswd(): salt must be 8 characters (authored by JMeybohm).
Apr 15 2021, 10:39 AM
JMeybohm committed rLPRI0b84184cb07d: Add key for _etcd-server-ssl._tcp.v3.codfw.wmnet.key (authored by JMeybohm).
Apr 15 2021, 10:39 AM

Apr 14 2021

JMeybohm updated subscribers of T277877: Set resource requests and limits for calico PODs.

This is not exactly looking great on the staging clusters as we can see heavy throttling. The current assumption is that this is caused by the very spiky nature of the work done by the processes here and that we don't see that properly reflected in the prometheus metrics (as the scrape interval is 60s, it's likely that we "miss" the spikes).

Apr 14 2021, 11:39 AM · Prod-Kubernetes, serviceops, Kubernetes, SRE
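
The assumption in the comment above (spikes being "missed" at a 60s scrape interval) can be illustrated with a toy calculation. All numbers below are hypothetical: a CPU limit of 0.5 cores and a workload that idles at 0.05 cores except for one two-second burst that would like to use 2 cores.

```python
LIMIT_CORES = 0.5        # hypothetical CPU limit of the container
SCRAPE_INTERVAL_S = 60   # scrape interval mentioned in the comment

# Hypothetical per-second CPU demand in cores: mostly idle, one 2-second burst.
demand = [0.05] * SCRAPE_INTERVAL_S
demand[30] = demand[31] = 2.0

# What a rate computed over the whole scrape interval would show.
average = sum(demand) / len(demand)
# Seconds in which the process wants more CPU than the limit allows.
seconds_over_limit = sum(1 for d in demand if d > LIMIT_CORES)

print(f"average over {SCRAPE_INTERVAL_S}s: {average:.3f} cores")
print(f"seconds with demand above the limit: {seconds_over_limit}")
```

The averaged value stays far below the limit, so a dashboard built on it looks healthy even though the container would be throttled during the burst.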
JMeybohm triaged T280125: Migrate default nework policies (default-network-policy-conf.yaml) to GlobalNetworkPolicies as Low priority.
Apr 14 2021, 11:34 AM · Prod-Kubernetes, serviceops, Kubernetes, SRE
JMeybohm created T280125: Migrate default nework policies (default-network-policy-conf.yaml) to GlobalNetworkPolicies.
Apr 14 2021, 11:33 AM · Prod-Kubernetes, serviceops, Kubernetes, SRE
JMeybohm added a comment to T273098: High Availability Flink.

I do see that using the configmap election method is appealing, as it is built in and does not require additional software to function. Unfortunately I was not able to understand (by briefly reading the docs) whether this uses a separate configmap or the one that is actually used for configuring flink.
While the former would be okay-ish I guess, the latter will potentially cause problems, as every deployment will result in a re-creation of said configmap by helm, resetting it to whatever state the chart has defined.
Apart from potentially losing data in that case, I'm not 100% certain that helm will handle that properly in every case, as I have seen too many weird issues with helm and "manually" altered kubernetes objects.

Apr 14 2021, 10:06 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
JMeybohm added a comment to T280076: Stop / remove linkrecommendation-production-load-datasets-1618311600-hn6k8.

What I think happened here is:

  • The CronJob (with the old image set in spec) created a Job "linkrecommendation-production-load-datasets-1618311600"
  • That Job created a Pod (linkrecommendation-production-load-datasets-1618311600-hn6k8) - old image, ofc.
  • That Pod was deleted by @Dzahn
  • The Job (watching over the Pod) could not find the Pod and scheduled a new one
  • The CronJob, on the other hand, would not schedule a new Job, as concurrency is 1 and there already was an "active" Job.
Apr 14 2021, 9:22 AM · Growth-Team, Add-Link, serviceops
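
The sequence of events listed above can be replayed with a toy model. This is only a sketch of the described behaviour, not how the Kubernetes controllers are implemented: the class and method names are hypothetical, the Pod naming is simplified, and the "concurrency of 1" is modelled as "never start a Job while another one is active". The later image change on the CronJob is assumed for illustration.

```python
class Job:
    def __init__(self, name, image):
        self.name = name
        self.image = image   # the image is fixed when the Job is created
        self.pods = []
        self.counter = 0
        self.reconcile()

    def reconcile(self):
        """Create a replacement Pod if none exists (toy Job controller)."""
        if not self.pods:
            self.counter += 1
            self.pods.append(f"{self.name}-pod{self.counter}")


class CronJob:
    def __init__(self, name, image):
        self.name = name
        self.image = image
        self.active_jobs = []

    def tick(self, timestamp):
        """Start a new Job unless one is still active (concurrency of 1)."""
        if self.active_jobs:
            return None
        job = Job(f"{self.name}-{timestamp}", self.image)
        self.active_jobs.append(job)
        return job


cron = CronJob("load-datasets", image="old")
job = cron.tick(1618311600)   # the CronJob creates a Job with the old image
cron.image = "new"            # the CronJob spec is updated afterwards
job.pods.clear()              # someone deletes the Pod
job.reconcile()               # the Job schedules a replacement, still with the old image
print(job.pods, job.image)
print(cron.tick(1618315200))  # no new Job: the old one is still active
```

In this model the only way to get a Pod with the new image is to remove the still-active Job, not its Pod.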

Apr 13 2021

JMeybohm added a comment to P15279 (An Untitled Masterwork).
== Initializing ==
Traceback (most recent call last):
  File "load-datasets.py", line 370, in <module>
    main()
  File "load-datasets.py", line 170, in main
    dataset_name_for_table="checksum", connection=mysql_connection
  File "load-datasets.py", line 129, in ensure_table_exists
    create_tables(raw_args=table_args, mysql_connection=connection)
  File "/srv/app/create_tables.py", line 46, in create_tables
    cursor.execute(database_utf_alter_query)
  File "/opt/lib/python/site-packages/MySQLdb/cursors.py", line 206, in execute
    res = self._query(query)
  File "/opt/lib/python/site-packages/MySQLdb/cursors.py", line 319, in _query
    db.query(q)
  File "/opt/lib/python/site-packages/MySQLdb/connections.py", line 259, in query
    _mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1044, "Access denied for user 'adminlinkrecommendation'@'10.64.0.135' to database 'mwaddlink'")
   [general] Ensuring checksum table exists...
Apr 13 2021, 10:37 AM

Apr 8 2021

JMeybohm claimed T270063: kube-apiserver flag --admission-control has been deprecated.

We should take the chance and refactor this a bit.
According to kube-apiserver -h we don't need to list the default admission controllers via --enable-admission-plugins anymore and, even worse, they won't get disabled when left out. From the help output:

Apr 8 2021, 3:06 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm claimed T277877: Set resource requests and limits for calico PODs.

Added some defaults based on the current maximum values (https://grafana-rw.wikimedia.org/d/2AfU0X_Mz/jayme-calico-resources?orgId=1)

Apr 8 2021, 1:36 PM · Prod-Kubernetes, serviceops, Kubernetes, SRE
JMeybohm triaged T270271: Target Sources (component/kubernetes-future/source/Sources) is configured multiple times as Medium priority.
Apr 8 2021, 11:01 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T270271: Target Sources (component/kubernetes-future/source/Sources) is configured multiple times.

This is probably something we can revisit when we've decided how we deal with multiple k8s versions in the future (T278329)

Apr 8 2021, 11:00 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a subtask for T278329: Support multiple kubernetes versions with puppet: T270271: Target Sources (component/kubernetes-future/source/Sources) is configured multiple times.
Apr 8 2021, 11:00 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a parent task for T270271: Target Sources (component/kubernetes-future/source/Sources) is configured multiple times: T278329: Support multiple kubernetes versions with puppet.
Apr 8 2021, 11:00 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T269461: k8s_infrastructure_users: rsyslog and echostore share the same id as Resolved.

Users and template migrated to use the username as user ID and a YAML list of groups instead of "type".

Apr 8 2021, 10:59 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T269461: k8s_infrastructure_users: rsyslog and echostore share the same id, a subtask of T244335: Upgrade kubernetes clusters to a security supported (LTS) version, as Resolved.
Apr 8 2021, 10:59 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm committed rLPRI5ffecd8094a6: Migrate kubernetes infrastructure_users to new syntax (authored by JMeybohm).
Apr 8 2021, 9:42 AM
JMeybohm added a comment to T279411: Determine why service responses are slow and what we can do about it.

@akosiaris @JMeybohm wondering if you all have ideas here. Comparing the local run with the profiler output from staging, there are several calls that seem to take significantly longer

Apr 8 2021, 7:39 AM · Patch-For-Review, Growth-Team (Current Sprint), serviceops, Data-Persistence (Consultation), Add-Link

Apr 7 2021

JMeybohm committed rLPRI699f44181f7f: Remove deployment_server_secrets::admin_services (authored by JMeybohm).
Apr 7 2021, 1:27 PM

Apr 6 2021

JMeybohm edited P11638 smaller_gerritbot_comments.js.
Apr 6 2021, 12:23 PM · JavaScript, Phabricator
JMeybohm removed a project from T279042: Reprepro: Refresh the kubernetes repo key once they refresh upstream: serviceops.

Removing serviceops as we don't use packages/releases signed with that key

Apr 6 2021, 9:39 AM · Patch-For-Review, cloud-services-team (Kanban)
JMeybohm closed T228967: Set up PodSecurityPolicies in clusters, a subtask of T212123: Kubernetes clusters roadmap, as Resolved.
Apr 6 2021, 9:26 AM · User-fsero, serviceops, Prod-Kubernetes
JMeybohm closed T228967: Set up PodSecurityPolicies in clusters as Resolved.

But wait, it's currently still not fully active and blocked by: T274262

This can be closed when https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/667986 has been reverted.

Apr 6 2021, 9:26 AM · Patch-For-Review, User-fsero, serviceops, Prod-Kubernetes
JMeybohm closed T228967: Set up PodSecurityPolicies in clusters, a subtask of T244335: Upgrade kubernetes clusters to a security supported (LTS) version, as Resolved.
Apr 6 2021, 9:26 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Mar 24 2021

JMeybohm created T278356: Update kubernetes-client.
Mar 24 2021, 5:20 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm created P15078 debug-namespace.yaml.
Mar 24 2021, 3:46 PM · Kubernetes
JMeybohm committed rLPRI5d82fe39db63: Add kubemster dummy keys (authored by JMeybohm).
Mar 24 2021, 3:28 PM
JMeybohm created T278329: Support multiple kubernetes versions with puppet.
Mar 24 2021, 2:22 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm committed rLPRI0f9c345749f3: Migrate two k8s users to groups syntax (authored by JMeybohm).
Mar 24 2021, 1:52 PM
JMeybohm updated subscribers of T267539: Archive/Remove deprecated calico gerrit repositories.

The first two repos were already read-only with an archived description. I've done so for the third one as well.

Mar 24 2021, 1:25 PM · Prod-Kubernetes, serviceops, Kubernetes, SRE
JMeybohm closed T277741: Update Kubernetes cluster eqiad to kubernetes 1.16 as Resolved.

It's safe to say we did this and we have tasks for follow ups (mostly from T277191)

Mar 24 2021, 1:08 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T277741: Update Kubernetes cluster eqiad to kubernetes 1.16, a subtask of T244335: Upgrade kubernetes clusters to a security supported (LTS) version, as Resolved.
Mar 24 2021, 1:08 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
elukey awarded T275641: Clean up/Consolidate kubernetes related dashboards a Love token.
Mar 24 2021, 1:01 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed Restricted Task, a subtask of T244335: Upgrade kubernetes clusters to a security supported (LTS) version, as Resolved.
Mar 24 2021, 11:01 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T266216: Increase visibility of container/pod ressource exhaustion .
Mar 24 2021, 10:15 AM · observability, serviceops, Prod-Kubernetes, Kubernetes
JMeybohm updated the task description for T277677: Write a cookbook to set a k8s cluster in maintenance mode.
Mar 24 2021, 9:17 AM · SRE-tools, SRE, Prod-Kubernetes, serviceops
JMeybohm renamed T275641: Clean up/Consolidate kubernetes related dashboards from Investigate/Fix missing metrics from k8s-node and k8s-node-proxy jobs to Clean up/Consolidate kubernetes related dashboards.
Mar 24 2021, 9:13 AM · Kubernetes, Prod-Kubernetes, serviceops

Mar 23 2021

JMeybohm added a comment to T277711: Memcached, mcrouter, nutcracker's future in MediaWiki on Kubernetes.

onhost memcached

It's still an open question how we will inject the node IP into the mcrouter configuration. It would mean we'd need to pass the host IP as an env variable to the mcrouter container and somehow inject it into the configuration. The downside is that we wouldn't be able to make use of mcrouter's ability to reload its configuration at runtime.

Mar 23 2021, 5:21 PM · serviceops, SRE
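
One way the env variable idea from the comment above might be wired up is sketched below. Everything here is an assumption for illustration: the variable name `HOST_IP`, the pool name `onhost`, the default memcached port 11211 and the fallback address are hypothetical, and the config layout only follows the general `pools`/`route` shape of an mcrouter JSON config. In Kubernetes, such a variable could be populated from the Pod's `status.hostIP` field via the downward API.

```python
import json
import os


def render_mcrouter_config(host_ip, port=11211):
    """Build a minimal mcrouter-style config pointing at memcached on the node."""
    return json.dumps(
        {
            "pools": {"onhost": {"servers": [f"{host_ip}:{port}"]}},
            "route": "PoolRoute|onhost",
        },
        indent=2,
    )


# HOST_IP is a hypothetical variable name; the fallback is a documentation address.
host_ip = os.environ.get("HOST_IP", "192.0.2.10")
print(render_mcrouter_config(host_ip))
```

Because the config is rendered once at container start, the result is static for the lifetime of the container, which matches the downside about runtime reloads noted in the comment.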
JMeybohm reopened T228967: Set up PodSecurityPolicies in clusters as "Open".

But wait, it's currently still not fully active and blocked by: T274262

Mar 23 2021, 4:19 PM · Patch-For-Review, User-fsero, serviceops, Prod-Kubernetes
JMeybohm reopened T228967: Set up PodSecurityPolicies in clusters, a subtask of T212123: Kubernetes clusters roadmap, as Open.
Mar 23 2021, 4:19 PM · User-fsero, serviceops, Prod-Kubernetes
JMeybohm reopened T228967: Set up PodSecurityPolicies in clusters, a subtask of T244335: Upgrade kubernetes clusters to a security supported (LTS) version, as Open.
Mar 23 2021, 4:19 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T228967: Set up PodSecurityPolicies in clusters, a subtask of T212123: Kubernetes clusters roadmap, as Resolved.
Mar 23 2021, 4:14 PM · User-fsero, serviceops, Prod-Kubernetes
JMeybohm closed T228967: Set up PodSecurityPolicies in clusters as Resolved.

This is active in all clusters now

Mar 23 2021, 4:14 PM · Patch-For-Review, User-fsero, serviceops, Prod-Kubernetes
JMeybohm closed T228967: Set up PodSecurityPolicies in clusters, a subtask of T244335: Upgrade kubernetes clusters to a security supported (LTS) version, as Resolved.
Mar 23 2021, 4:14 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T274852: Refactor users in production-images as Resolved.

Done and rolled out

Mar 23 2021, 4:13 PM · Patch-For-Review, serviceops, Prod-Kubernetes
JMeybohm closed T274852: Refactor users in production-images, a subtask of T274254: Check/Rebuild all docker-pkg build docker images running on kubernetes, as Resolved.
Mar 23 2021, 4:13 PM · Patch-For-Review, serviceops, Prod-Kubernetes
JMeybohm added a subtask for T244335: Upgrade kubernetes clusters to a security supported (LTS) version: T277741: Update Kubernetes cluster eqiad to kubernetes 1.16.
Mar 23 2021, 4:12 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a parent task for T277741: Update Kubernetes cluster eqiad to kubernetes 1.16: T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Mar 23 2021, 4:12 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T262527: Update to kernel 4.19 on kubernetes nodes, a subtask of T244335: Upgrade kubernetes clusters to a security supported (LTS) version, as Resolved.
Mar 23 2021, 4:11 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T262527: Update to kernel 4.19 on kubernetes nodes as Resolved.

All nodes running kernel 4.19 now

Mar 23 2021, 4:11 PM · User-jijiki, serviceops, Prod-Kubernetes, Kubernetes
JMeybohm added a comment to T278220: Define the size of a pod for mediawiki in terms of resource usage.

The goal is to pack 4 or even 5 pods in a single modern node.

Mar 23 2021, 11:11 AM · serviceops, MW-on-K8s
JMeybohm updated the task description for T277741: Update Kubernetes cluster eqiad to kubernetes 1.16.
Mar 23 2021, 9:39 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a project to T278208: Allow namespaces to be overriden in deployment-chart's admin_ng: Kubernetes.
Mar 23 2021, 7:54 AM · Kubernetes, serviceops, Machine-Learning-Team, Lift-Wing

Mar 22 2021

JMeybohm added a comment to T277711: Memcached, mcrouter, nutcracker's future in MediaWiki on Kubernetes.

That is already done in the MediaWiki chart.

Mar 22 2021, 10:19 AM · serviceops, SRE
JMeybohm added a subtask for T265327: Create a basic helm chart to test MediaWiki on kubernetes: T277711: Memcached, mcrouter, nutcracker's future in MediaWiki on Kubernetes.
Mar 22 2021, 9:49 AM · Patch-For-Review, SRE, serviceops, MW-on-K8s
JMeybohm added a parent task for T277711: Memcached, mcrouter, nutcracker's future in MediaWiki on Kubernetes: T265327: Create a basic helm chart to test MediaWiki on kubernetes.
Mar 22 2021, 9:49 AM · serviceops, SRE
JMeybohm updated the task description for T277741: Update Kubernetes cluster eqiad to kubernetes 1.16.
Mar 22 2021, 9:19 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T277741: Update Kubernetes cluster eqiad to kubernetes 1.16.
Mar 22 2021, 9:09 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T277741: Update Kubernetes cluster eqiad to kubernetes 1.16.
Mar 22 2021, 8:32 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T277741: Update Kubernetes cluster eqiad to kubernetes 1.16.
Mar 22 2021, 8:29 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Mar 19 2021

JMeybohm triaged T277877: Set resource requests and limits for calico PODs as High priority.
Mar 19 2021, 3:41 PM · Prod-Kubernetes, serviceops, Kubernetes, SRE
JMeybohm created T277877: Set resource requests and limits for calico PODs.
Mar 19 2021, 3:41 PM · Prod-Kubernetes, serviceops, Kubernetes, SRE
JMeybohm triaged T277876: Reserve resources for system daemons on kubernetes nodes as Medium priority.
Mar 19 2021, 3:37 PM · Patch-For-Review, serviceops, Kubernetes, Prod-Kubernetes