JMeybohm
User

User Details

User Since
Apr 2 2020, 9:01 AM (34 w, 6 h)
Availability
Available
IRC Nick
jayme
LDAP User
Unknown
MediaWiki User
JMeybohm (WMF) [ Global Accounts ]

Recent Activity

Yesterday

JMeybohm committed rODHI4d17da4f2f41: Update upstream source from tag 'upstream/3.1.3' (authored by JMeybohm).
Update upstream source from tag 'upstream/3.1.3'
Wed, Nov 25, 2:24 PM
JMeybohm committed rODHIae9c3a57096d: New upstream version 3.1.3 (authored by JMeybohm).
New upstream version 3.1.3
Wed, Nov 25, 2:24 PM
JMeybohm triaged T268743: Migrate Chartmuseum (python3-docker-report) to use helm3 as Medium priority.
Wed, Nov 25, 12:58 PM · Kubernetes, serviceops, Operations
JMeybohm created T268743: Migrate Chartmuseum (python3-docker-report) to use helm3.
Wed, Nov 25, 12:58 PM · Kubernetes, serviceops, Operations

Tue, Nov 24

JMeybohm renamed T266893: Build calico 3.17.0 from Build calico 3.16 to Build calico 3.17.0.
Tue, Nov 24, 3:31 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T266893: Build calico 3.17.0 as Resolved.
  • imported calico 3.17.0 packages into component/calico-future for stretch-wikimedia
    • calico-cni
    • calicoctl
    • calico-images
  • pushed docker images:
    • docker-registry.discovery.wmnet/calico/kube-controllers:v3.17.0
    • docker-registry.discovery.wmnet/calico/node:v3.17.0
    • docker-registry.discovery.wmnet/calico/typha:v3.17.0
Tue, Nov 24, 3:30 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T266893: Build calico 3.17.0, a subtask of T207804: Upgrade Calico, as Resolved.
Tue, Nov 24, 3:30 PM · Prod-Kubernetes, User-fsero, serviceops, Kubernetes, Operations
JMeybohm closed T266893: Build calico 3.17.0, a subtask of T244335: Upgrade kubernetes clusters to a security supported (LTS) version, as Resolved.
Tue, Nov 24, 3:30 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm claimed T251305: Migrate to helm v3.
Tue, Nov 24, 2:45 PM · Patch-For-Review, Kubernetes, serviceops, Operations
JMeybohm added a comment to T268612: Docker image on the build host seem to ignore apt priority for wikimedia packages.

Couple of reasons:

Tue, Nov 24, 1:27 PM · docker-pkg, serviceops, Operations
JMeybohm added a comment to T268612: Docker image on the build host seem to ignore apt priority for wikimedia packages.

Ouch.

Could we normalize everything to use the public image reference? That would also make local testing easier and more straightforward.
Or do we gain some benefit by using the internal reference?

The point is consistency. We want to use the same registry when referencing images and saving them.

Sure. My question is more like: Why did we start using both names in the first place, and can we stop doing so? :)

Tue, Nov 24, 12:16 PM · docker-pkg, serviceops, Operations
JMeybohm committed rODHF73be6731cdea: Update upstream source from tag 'upstream/0.135.0' (authored by JMeybohm).
Update upstream source from tag 'upstream/0.135.0'
Tue, Nov 24, 12:16 PM
JMeybohm committed rODHFb56d529dd47b: New upstream version 0.135.0 (authored by JMeybohm).
New upstream version 0.135.0
Tue, Nov 24, 12:16 PM
JMeybohm added a comment to T268612: Docker image on the build host seem to ignore apt priority for wikimedia packages.

Could we normalize everything to use the public image reference? That would also make local testing easier and more straightforward.
Or do we gain some benefit by using the internal reference?

Tue, Nov 24, 11:48 AM · docker-pkg, serviceops, Operations

Mon, Nov 23

JMeybohm added a comment to T267653: Refactor calico deploy strategy.

I did a bit of testing and it looks as if it is totally possible to switch helmfile.d/admin to use helm3 and get rid of tiller there (i.e. the catch-22) while keeping helm2 + tiller for helmfile.d/services for now (see T268434).

Mon, Nov 23, 2:01 PM · Patch-For-Review, Prod-Kubernetes, serviceops, Kubernetes, Operations
JMeybohm triaged T268434: Refactor our helmfile.d dir structure for admin as Medium priority.
Mon, Nov 23, 7:47 AM · Patch-For-Review, serviceops, Kubernetes, Prod-Kubernetes
JMeybohm created T268434: Refactor our helmfile.d dir structure for admin.
Mon, Nov 23, 7:47 AM · Patch-For-Review, serviceops, Kubernetes, Prod-Kubernetes

Thu, Nov 19

JMeybohm added a subtask for T244335: Upgrade kubernetes clusters to a security supported (LTS) version: T228967: Set up PodSecurityPolicies in clusters.
Thu, Nov 19, 11:04 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a parent task for T228967: Set up PodSecurityPolicies in clusters: T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Thu, Nov 19, 11:04 AM · Patch-For-Review, User-fsero, serviceops, Prod-Kubernetes

Wed, Nov 18

JMeybohm added a comment to T267653: Refactor calico deploy strategy.

@akosiaris and I discussed this further and we initially decided to give the kubernetes addon-manager a try for rolling out calico components to a freshly bootstrapped cluster. I created a binary package from the kubernetes source package that sets up the addon-manager and added corresponding puppet code.

Wed, Nov 18, 5:20 PM · Patch-For-Review, Prod-Kubernetes, serviceops, Kubernetes, Operations
JMeybohm added a comment to T266893: Build calico 3.17.0.

After discussing with @akosiaris we decided to keep building the calico-images package but only use it as a kind of artifact and a way to get the images out of the pbuilder environment. After building, the package can be extracted and the images imported into the registry via a script added to the package.
While not ideal, it's an okay solution for now and I've documented it at: https://wikitech.wikimedia.org/wiki/Calico

Wed, Nov 18, 3:40 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Mon, Nov 16

JMeybohm updated the task description for T251305: Migrate to helm v3.
Mon, Nov 16, 12:06 PM · Patch-For-Review, Kubernetes, serviceops, Operations

Tue, Nov 10

JMeybohm added a comment to T267653: Refactor calico deploy strategy.

To solve the catch-22 we could deploy the to-be calico helm chart via helm3, which would require us to invest in helm3 integration earlier than we hoped for (at least for the helmfile.d/admin part).

Tue, Nov 10, 4:22 PM · Patch-For-Review, Prod-Kubernetes, serviceops, Kubernetes, Operations
JMeybohm triaged T267653: Refactor calico deploy strategy as High priority.
Tue, Nov 10, 4:20 PM · Patch-For-Review, Prod-Kubernetes, serviceops, Kubernetes, Operations
JMeybohm created T267653: Refactor calico deploy strategy.
Tue, Nov 10, 4:20 PM · Patch-For-Review, Prod-Kubernetes, serviceops, Kubernetes, Operations
JMeybohm added a comment to T266893: Build calico 3.17.0.

When deployed as a cluster addon, we can bypass all this and have mandatory components of our stack deployed/reconciled directly when the cluster is set up (shipping manifests via debian packages or generating them via puppet).
While that sounds appealing to me, I'm not sure this is still the "correct" way. The upstream addons have been called legacy for a couple of years now (https://github.com/kubernetes/kubernetes/commit/43276035734ba8a5914977b13da86ec2548fa745) ...

Tue, Nov 10, 3:28 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Mon, Nov 9

JMeybohm triaged T267539: Archive/Remove deprecated calico gerrit repositories as Low priority.
Mon, Nov 9, 11:21 AM · Prod-Kubernetes, serviceops, Kubernetes, Operations
JMeybohm created T267539: Archive/Remove deprecated calico gerrit repositories.
Mon, Nov 9, 11:21 AM · Prod-Kubernetes, serviceops, Kubernetes, Operations
JMeybohm claimed T266893: Build calico 3.17.0.

I've crafted a debian package similar to how the k8s packages are now built (it packages calico-cni and calicoctl as we currently have).

Mon, Nov 9, 11:07 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Thu, Nov 5

JMeybohm closed T266766: Build new kubernetes packages as Resolved.

1.16.15 released to stretch-wikimedia component/kubernetes-future

Thu, Nov 5, 6:51 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T266766: Build new kubernetes packages, a subtask of T244335: Upgrade kubernetes clusters to a security supported (LTS) version, as Resolved.
Thu, Nov 5, 6:51 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T266893: Build calico 3.17.0.

As for the manifests, if we need them, they should be in helmfile.d/admin I guess?

Thu, Nov 5, 9:12 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Wed, Nov 4

JMeybohm added a comment to T266893: Build calico 3.17.0.

[WIP] will continue shortly

Wed, Nov 4, 11:31 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T266895: Decide if we want to stick with etcd datastore.

Decide if we are going to be staying with direct access to etcd (version 3?) or try and switch to the kubernetes APIs

The big issue we currently have with the etcd-backed datastore is that the information in there is not tracked in any way. It is backed up but fully unsearchable. So we definitely want to at least test using the API.

Wed, Nov 4, 10:47 AM · Prod-Kubernetes, serviceops, Kubernetes
JMeybohm added a comment to T244335: Upgrade kubernetes clusters to a security supported (LTS) version.

We are not able to go 1.19 because of calico only supporting 1.18

Looks like this isn't true. Judging from https://github.com/projectcalico/calico/commit/21a45a4a141fff03b251fde2f1ab77fbb0c903ee#diff-f386c272afd3d855bf9f1d3609d1782962951258a58e2b298df60c70b16517ee, the calico 3.16 requirements page will be updated soon.

Wed, Nov 4, 10:47 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Wed, Nov 4, 9:23 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Wed, Nov 4, 9:06 AM · Kubernetes, Prod-Kubernetes, serviceops

Tue, Nov 3

JMeybohm renamed T266766: Build new kubernetes packages from Build kubernetes 1.19 to Build new kubernetes packages.
Tue, Nov 3, 3:23 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Tue, Nov 3, 3:19 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a subtask for T244335: Upgrade kubernetes clusters to a security supported (LTS) version: T251305: Migrate to helm v3.
Tue, Nov 3, 3:15 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a parent task for T251305: Migrate to helm v3: T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Tue, Nov 3, 3:15 PM · Patch-For-Review, Kubernetes, serviceops, Operations

Mon, Nov 2

JMeybohm added a comment to T266893: Build calico 3.17.0.

I typically prefer if we rebuild images from dockerfiles, using our base images. That gives us a tad more control over upgrading in case of a disastrous security hole in e.g. alpine linux.

Mon, Nov 2, 7:42 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T266766: Build new kubernetes packages.

An additional alternative could be to switch to a build without docker like the debian upstream does. That would maybe require us to backport golang-1.15 but would remove the build-dependency on docker with all its space and network requirements.

Mon, Nov 2, 7:26 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Fri, Oct 30

JMeybohm created T266895: Decide if we want to stick with etcd datastore.
Fri, Oct 30, 5:39 PM · Prod-Kubernetes, serviceops, Kubernetes
JMeybohm added a subtask for T207804: Upgrade Calico: T266893: Build calico 3.17.0.
Fri, Oct 30, 5:33 PM · Prod-Kubernetes, User-fsero, serviceops, Kubernetes, Operations
JMeybohm added a parent task for T266893: Build calico 3.17.0: T207804: Upgrade Calico.
Fri, Oct 30, 5:33 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm created T266893: Build calico 3.17.0.
Fri, Oct 30, 5:33 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T266766: Build new kubernetes packages.

Unfortunately, the current version of calico is not tested against kubernetes 1.19.x yet (https://docs.projectcalico.org/getting-started/kubernetes/requirements#kubernetes-requirements).
While it might still work, we should stick to a tested/supported version (at least for production) and there is no release timeline I found for k8s 1.19 support in calico. So no idea if we get a calico version supporting 1.19 anytime "soon".

Fri, Oct 30, 5:23 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T266766: Build new kubernetes packages.

Unfortunately there are no signed releases of k8s as of now (https://github.com/kubernetes/release/issues/914), so the best we could get is the binary tarballs plus sha512 from the official GCS bucket and package those.

So relying on e.g. https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.18.md#downloads-for-v11810 (from what I see the server binaries tar is enough as it contains whatever the node and client tars have) and some thin debian/rules to fetch/extract and put in the right place before building the deb so that we don't have to change our puppet code as well? That could work.
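
As a rough sketch of the "binary tarballs plus sha512" idea above, the checksum check could look like the following. The function names are hypothetical and not taken from any existing packaging code; the expected digest would have to be copied from the upstream release notes.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path, expected_sha512):
    """Compare a downloaded tarball against the published sha512 (hypothetical helper)."""
    return sha512_of(path) == expected_sha512.strip().lower()
```

A thin debian/rules step could call something like this before extracting, and abort the build when it returns False. Note that this only proves the download matches the published checksum; it is not a substitute for signed releases.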

Fri, Oct 30, 3:07 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated subscribers of T266766: Build new kubernetes packages.

With Kubernetes 1.19.3 things changed a bit and we now need a docker version supporting the --platform flag for FROM.
The envoy builder host would support that but lacks enough space on /var/lib/docker to hold all the intermediate images used for the build.

Fri, Oct 30, 1:21 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T266032: Test deployment-charts for kubernetes 1.19 compatibility as Resolved.

We're fine here.

Fri, Oct 30, 12:15 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T266032: Test deployment-charts for kubernetes 1.19 compatibility, a subtask of T244335: Upgrade kubernetes clusters to a security supported (LTS) version, as Resolved.
Fri, Oct 30, 12:15 PM · Kubernetes, Prod-Kubernetes, serviceops

Thu, Oct 29

JMeybohm updated the task description for T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Thu, Oct 29, 12:51 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm triaged T266766: Build new kubernetes packages as High priority.
Thu, Oct 29, 10:45 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Thu, Oct 29, 10:40 AM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm created T266766: Build new kubernetes packages.
Thu, Oct 29, 10:38 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Wed, Oct 28

JMeybohm closed T262675: Store Kubernetes events for more than one hour as Resolved.

Eventrouter is deployed to all clusters now.

Wed, Oct 28, 2:07 PM · Patch-For-Review, observability, Prod-Kubernetes, Kubernetes, serviceops
JMeybohm triaged T266670: Add helmfile validation for the helmfile.d/admin part as Medium priority.
Wed, Oct 28, 1:53 PM · serviceops, Prod-Kubernetes, Kubernetes

Tue, Oct 27

JMeybohm added a comment to T266032: Test deployment-charts for kubernetes 1.19 compatibility.
[27.10.20 14:03] <jayme> akosiaris: so unfortunately, kubeval is way less picky than kubeyaml is. I guess that's simply because it just validates against the spec rather than actually parsing into the go structures (which I think kubeyaml did)
[27.10.20 14:04] <jayme> means we won't get those "type errors" we had seen (like "[spec.template.spec.volumes] key volumes has wrong type <nil> (should be []interface{})")
[27.10.20 14:07] <akosiaris> umf
[27.10.20 14:09] <jayme> the nodejs kubeyaml otoh still seems to detect those
[27.10.20 14:10] <jayme> and still has this weird issue of just parsing the first object of a yaml stream :-)
[27.10.20 14:17] <jayme> but that difference also means that kubeyaml still uses go for the backend, which turns out to be true...
[27.10.20 14:17] <jayme> sooo...we could a) run the kubeyaml backend internally or b) hack together a cli
[27.10.20 14:31] <akosiaris> jayme: there is a c). Just figure out what needs to be done to import the versions we are missing to the current one
[27.10.20 14:31] <akosiaris> it means we are forking it of course.
[27.10.20 14:31] <akosiaris> Which given upstream's stance, it might not be a bad idea. I am just not sure we want to pay that cost. Lemme try to do it and gauge how much of a pain it is
[27.10.20 14:32] <jayme> akosiaris: oh, yeah. That ofc...no problem - I can figure out what it needs to add 1.19, I'm looking into all that now anyways
[27.10.20 14:33] <jayme> on the long run, maybe someone will re-add CLI to the current backend code ...
[27.10.20 14:46] <jayme> akosiaris: the backend code seems to be API compatible to the old CLI interface :D
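
To illustrate the kind of "type error" mentioned in the log above (a key that is present but null where a list is expected), here is a minimal sketch. It is not how kubeyaml or kubeval work internally; the LIST_FIELDS set is a hypothetical, hand-picked sample, and the manifest is assumed to be already parsed into Python dicts.

```python
# Hypothetical, hand-picked sample of fields that must hold a list.
LIST_FIELDS = {
    ("spec", "template", "spec", "volumes"),
    ("spec", "template", "spec", "containers"),
}


def find_type_errors(manifest):
    """Report list-typed fields that are present but hold a non-list value."""
    errors = []
    for path in LIST_FIELDS:
        node = manifest
        present = True
        for key in path:
            if not isinstance(node, dict) or key not in node:
                present = False
                break
            node = node[key]
        if present and not isinstance(node, list):
            errors.append(
                "[%s] key %s has wrong type %s (should be list)"
                % (".".join(path), path[-1], type(node).__name__)
            )
    return sorted(errors)
```

A template that renders `volumes:` with nothing under it parses to None, which a check like this flags while a missing key is left alone.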
Tue, Oct 27, 5:05 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Oct 27 2020

JMeybohm added a comment to T262675: Store Kubernetes events for more than one hour.

Two more eventrouter patches are in. I must say I'm a bit disappointed by my decision to go with that one, but I *think* it should be good now. Will revisit over the week to see if duplicate events are gone, re-check on resources etc.

Oct 27 2020, 7:27 AM · Patch-For-Review, observability, Prod-Kubernetes, Kubernetes, serviceops
JMeybohm committed rOSHR9a9ad2db2ad7: Lower label cardinality of prometheus metrics (authored by JMeybohm).
Lower label cardinality of prometheus metrics
Oct 27 2020, 6:48 AM
JMeybohm committed rOSHR37dce3539b47: Don't send duplicate events from resync to sink (authored by JMeybohm).
Don't send duplicate events from resync to sink
Oct 27 2020, 6:48 AM

Oct 26 2020

JMeybohm added a comment to T262675: Store Kubernetes events for more than one hour.

I've changed the field names to be more specific so events are indexed now.

Oct 26 2020, 7:14 PM · Patch-For-Review, observability, Prod-Kubernetes, Kubernetes, serviceops
JMeybohm closed T266194: wikifeeds-production-tls-proxy regularly exceeding its k8s CPU reservation as Resolved.

Looks way better now, even under higher load.

Oct 26 2020, 4:00 PM · Kubernetes, Wikifeeds, serviceops
JMeybohm claimed T266032: Test deployment-charts for kubernetes 1.19 compatibility.

My current plan is to build a kubeval deb and add a git repo with the needed kubernetes api schema. I think we don't need fancy auto-upgrade stuff for the schema repo as we don't upgrade k8s that often. I will add a script like the above, so that we can just run that locally and commit the new kubernetes version schema to the repo once we plan to upgrade to a new version (or once we want to test for compatibility with a new version).

Oct 26 2020, 11:10 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops

Oct 23 2020

JMeybohm added a comment to T262675: Store Kubernetes events for more than one hour.

Unfortunately it looks as if the logging pipeline does not parse the output of eventrouter by default:
https://logstash-next.wikimedia.org/goto/d8b98b06cbe6f8089e48c090f479bfc9

Oct 23 2020, 2:52 PM · Patch-For-Review, observability, Prod-Kubernetes, Kubernetes, serviceops
JMeybohm added a comment to T262675: Store Kubernetes events for more than one hour.

Unfortunately it looks as if the logging pipeline does not parse the output of eventrouter by default:
https://logstash-next.wikimedia.org/goto/d8b98b06cbe6f8089e48c090f479bfc9

Oct 23 2020, 2:07 PM · Patch-For-Review, observability, Prod-Kubernetes, Kubernetes, serviceops

Oct 22 2020

JMeybohm added a comment to T266194: wikifeeds-production-tls-proxy regularly exceeding its k8s CPU reservation.

That's pretty interesting, there shouldn't be so much throttling at such low CPU usage. user+system summed barely hit 1/5 of the limit.

+1 to bumping the limit to see if it would solve latency issues, but it might be indeed related to T262527

Oct 22 2020, 10:11 AM · Kubernetes, Wikifeeds, serviceops
JMeybohm moved T266194: wikifeeds-production-tls-proxy regularly exceeding its k8s CPU reservation from Incoming 🐫 to Doing 😎 on the serviceops board.
Oct 22 2020, 8:56 AM · Kubernetes, Wikifeeds, serviceops
JMeybohm claimed T266194: wikifeeds-production-tls-proxy regularly exceeding its k8s CPU reservation.

I think you are right, thanks for the heads up!

Oct 22 2020, 8:46 AM · Kubernetes, Wikifeeds, serviceops
JMeybohm merged T255273: Upgrade kubernetes nodes to kernel 4.19.x into T262527: Update to kernel 4.19 on kubernetes nodes.
Oct 22 2020, 8:45 AM · User-jijiki, serviceops, Prod-Kubernetes, Kubernetes
JMeybohm merged task T255273: Upgrade kubernetes nodes to kernel 4.19.x into T262527: Update to kernel 4.19 on kubernetes nodes.
Oct 22 2020, 8:45 AM · serviceops
JMeybohm added a parent task for T256256: Raise an alarm on container restarts/OOMs in kubernetes: T266216: Increase visibility of container/pod ressource exhaustion .
Oct 22 2020, 8:42 AM · Sustainability (Incident Followup), serviceops, Kubernetes, ChangeProp
JMeybohm added subtasks for T266216: Increase visibility of container/pod ressource exhaustion : T264625: Deploy kube-state-metrics, T256256: Raise an alarm on container restarts/OOMs in kubernetes.
Oct 22 2020, 8:42 AM · observability, serviceops, Prod-Kubernetes, Kubernetes
JMeybohm added a parent task for T264625: Deploy kube-state-metrics: T266216: Increase visibility of container/pod ressource exhaustion .
Oct 22 2020, 8:42 AM · serviceops, User-jijiki, Kubernetes
JMeybohm triaged T266216: Increase visibility of container/pod ressource exhaustion as Medium priority.
Oct 22 2020, 8:42 AM · observability, serviceops, Prod-Kubernetes, Kubernetes

Oct 21 2020

JMeybohm added a comment to T266032: Test deployment-charts for kubernetes 1.19 compatibility.

I fiddled with this a bit and it is possible to use a local version/checkout of the schema which we can also generate ourselves with something like:

Oct 21 2020, 12:24 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T265324: Create the base container images for running MediaWiki in a production environment.

Oh, well. Sorry then. I guess I just misread the "one image configured to manage a php-fpm application." part as already containing fpm.

Oct 21 2020, 8:56 AM · Operations, serviceops, MW-on-K8s
JMeybohm added a comment to T266032: Test deployment-charts for kubernetes 1.19 compatibility.

While it looks like kubeval basically works and we can easily replace kubeyaml with it, it has the disadvantage of relying on https://kubernetesjsonschema.dev (https://github.com/instrumenta/kubernetes-json-schema) to fetch the schema from. So we would need to include a copy of that in the CI image and run it like:
kubeval -v 1.11.0 -s file:///tmp/kubernetes-json-schema/
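
A small sketch of how a CI helper might assemble that command line for a local schema copy. Only the `-v` and `-s` flags come from the command above; the helper name and the manifest file names are hypothetical.

```python
import os


def kubeval_command(k8s_version, schema_dir, manifests):
    """Build the kubeval argument list for a local schema checkout (hypothetical helper)."""
    # abspath drops any trailing slash, so exactly one is re-added here.
    schema_url = "file://" + os.path.abspath(schema_dir) + "/"
    return ["kubeval", "-v", k8s_version, "-s", schema_url] + list(manifests)
```

The resulting list could be passed to `subprocess.run(..., check=True)` so that a validation failure fails the CI job.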

Oct 21 2020, 8:54 AM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a comment to T265324: Create the base container images for running MediaWiki in a production environment.

If I got this right you are proposing to put apache and php-fpm in the same container, correct (talking about vhosts in fpm context)?
I can think of reasons why that might make sense to do (sharing "static" assets for example, ease of MVP), but maybe you could outline them here?

Oct 21 2020, 8:30 AM · Operations, serviceops, MW-on-K8s

Oct 20 2020

JMeybohm updated the task description for T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Oct 20 2020, 3:55 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm updated the task description for T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Oct 20 2020, 3:54 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm merged T241076: Define the plan for the upgrade of kubernetes cluster to a security supported release into T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Oct 20 2020, 3:48 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm merged task T241076: Define the plan for the upgrade of kubernetes cluster to a security supported release into T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Oct 20 2020, 3:48 PM · Prod-Kubernetes, Kubernetes, serviceops
JMeybohm triaged T266032: Test deployment-charts for kubernetes 1.19 compatibility as High priority.
Oct 20 2020, 3:44 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm created T266032: Test deployment-charts for kubernetes 1.19 compatibility.
Oct 20 2020, 3:43 PM · Patch-For-Review, Kubernetes, Prod-Kubernetes, serviceops
JMeybohm renamed T207804: Upgrade Calico from Upgrade calico in production to version 2.4+ to Upgrade Calico.
Oct 20 2020, 3:15 PM · Prod-Kubernetes, User-fsero, serviceops, Kubernetes, Operations
JMeybohm renamed T244335: Upgrade kubernetes clusters to a security supported (LTS) version from Upgrade production kubernetes clusters to a security supported version to Upgrade kubernetes clusters to a security supported (LTS) version.
Oct 20 2020, 3:11 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm added a parent task for T207804: Upgrade Calico: T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Oct 20 2020, 3:01 PM · Prod-Kubernetes, User-fsero, serviceops, Kubernetes, Operations
JMeybohm added a parent task for T241076: Define the plan for the upgrade of kubernetes cluster to a security supported release: T244335: Upgrade kubernetes clusters to a security supported (LTS) version.
Oct 20 2020, 3:01 PM · Prod-Kubernetes, Kubernetes, serviceops
JMeybohm added subtasks for T244335: Upgrade kubernetes clusters to a security supported (LTS) version: T241076: Define the plan for the upgrade of kubernetes cluster to a security supported release, Unknown Object (Task), T207804: Upgrade Calico, Unknown Object (Task).
Oct 20 2020, 3:01 PM · Kubernetes, Prod-Kubernetes, serviceops
JMeybohm closed T252428: Make helm upgrades atomic as Resolved.
Oct 20 2020, 8:42 AM · Patch-For-Review, Prod-Kubernetes, serviceops, Kubernetes
JMeybohm added a project to T265979: Alert on unapplied changes in deployment-charts repo: serviceops.
Oct 20 2020, 8:41 AM · serviceops, Prod-Kubernetes, Kubernetes
JMeybohm triaged T265979: Alert on unapplied changes in deployment-charts repo as Medium priority.
Oct 20 2020, 8:40 AM · serviceops, Prod-Kubernetes, Kubernetes
JMeybohm committed rOSHRaf855cb642aa: Add vendor (authored by JMeybohm).
Add vendor
Oct 20 2020, 8:13 AM
JMeybohm committed rOSHR627fa0799b95: Fix json syntax (authored by JMeybohm).
Fix json syntax
Oct 20 2020, 8:13 AM
JMeybohm committed rOSHRa24d9e469355: Fix json syntax (authored by JMeybohm).
Fix json syntax
Oct 20 2020, 8:00 AM
JMeybohm added a comment to T262675: Store Kubernetes events for more than one hour.

A quick chat on IRC revealed that we don't have a "good for kubernetes" way to discover the kafka brokers (like DNS SRV records), so producing directly to kafka-logging would require some coupling with puppet code/re-deployments on changes to kafka-logging brokers (which is obviously bad).

Oct 20 2020, 7:32 AM · Patch-For-Review, observability, Prod-Kubernetes, Kubernetes, serviceops
JMeybohm added a comment to T258572: Refactor our helmfile.d dir structure for services.

Ok! Done.

Oct 20 2020, 7:04 AM · Patch-For-Review, Prod-Kubernetes, Release Pipeline, serviceops

Oct 19 2020

JMeybohm moved T252428: Make helm upgrades atomic from Incoming 🐫 to Doing 😎 on the serviceops board.
Oct 19 2020, 12:53 PM · Patch-For-Review, Prod-Kubernetes, serviceops, Kubernetes