The job fails on registry2002, leading to icinga alerts
Adding https://metallb.universe.tf/ as a potential solution as well.
Fri, Jan 22
Not completely sure (limited puppet knowledge) but maybe that kafka_config does not exist on cloudvps for some reason?!
@Ottomata I guess you're not yet using the values generated by that code, so we could simply revert for now, right?
Unfortunately, upstream was not very responsive to my question about adding Cache-Control (https://github.com/helm/chartmuseum/issues/368). I wonder why this problem arose only recently, as we have run this configuration from the start.
Thu, Jan 21
The registry now responds properly with vary: Accept
With that merged, this is fixed now. Thanks @jbond !
I don't see anything interesting in the 2.7.1 release (https://github.com/docker/distribution/releases/tag/v2.7.1, https://metadata.ftp-master.debian.org/changelogs//main/d/docker-registry/docker-registry_2.7.1+ds2-7_changelog) so I would vote for just copying the stretch package over to buster.
Mon, Jan 18
Mon, Jan 11
Fri, Jan 8
I prefer to dupe for more context. Hope that's fine with you.
Thu, Jan 7
Yeah, the image was removed (as far as that's easily possible) as of T253396
Tue, Jan 5
As I see it these hosts do have IPv6 in DNS (and netbox).
Mon, Jan 4
Dec 21 2020
Dec 18 2020
I don't see those directories being used anywhere so I suggest we just remove the tmpfile registration from master and node package.
As of https://gerrit.wikimedia.org/r/c/operations/puppet/+/648356 we're now running staging-codfw with docker 18.06.3 and it looks good so far.
We packaged and deployed calico 3.16 to staging-codfw, but we currently lack full IPv6 support because we now require kubernetes to be dual-stack/IPv6 enabled as well (as we run calico with the kubernetes datastore backend).
This is done with calico deployed now via puppet (CNI plugins and calicoctl) as well as helm3 (helmfile.d/admin_ng).
Everything is under version control and there are no catch-22's anymore during cluster bootstrapping.
Dec 17 2020
Maybe Mac sends a different user-agent? That would be fun...
For the record:
We were sending "Content-Type: application/vnd.docker.distribution.manifest.v2+json" as a gzip stream to the clients (hence no content-length), and docker 20.10 seems to no longer accept that but fails instead.
Also raising priority, as I guess this will affect more and more users/devs when they upgrade to docker 20.10.
I was able to reproduce as well now. Seems like we are missing the "content-length" header (at least on the manifest API endpoint) if the response is *not* served from cache. P13565 can reproduce that when using "docker-registry.wikimedia.org" but not when using "docker-registry.discovery.wmnet".
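For anyone who wants to check the headers by hand, here is a minimal sketch (not the content of P13565) that requests a manifest with the v2 media type and reports whether content-length came back. The function names and the image name in the usage comment are made up for illustration; only the /v2/<name>/manifests/<reference> path is from the registry API.

```python
import urllib.request

MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def describe_headers(headers):
    """Summarise the response headers that matter for this bug."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        "has_content_length": "content-length" in lowered,
        "content_encoding": lowered.get("content-encoding"),
        "vary": lowered.get("vary"),
    }


def fetch_manifest_headers(host, image, tag="latest"):
    """GET the manifest and return the response headers as a dict.

    Performs a real HTTPS request; host, image and tag are supplied by the caller.
    """
    url = f"https://{host}/v2/{image}/manifests/{tag}"
    req = urllib.request.Request(
        url,
        headers={"Accept": MANIFEST_V2, "Accept-Encoding": "gzip"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return dict(resp.headers.items())


# Hypothetical usage (image name is a placeholder):
# for host in ("docker-registry.wikimedia.org", "docker-registry.discovery.wmnet"):
#     print(host, describe_headers(fetch_manifest_headers(host, "some-image")))
```

Running it twice against the same host should show the difference between a cached and an uncached response, if the cache is what adds the header.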
Unfortunately still unable to reproduce, even with docker & docker-engine 20.10.1 (on linux)
Dec 16 2020
Could you please check if you see any additional errors/hints in the docker daemon logs? The image/layers are not actually pulled by the client so a hint might be in there. You could also enable debug there (https://docs.docker.com/config/daemon/#enable-debugging) which could lead us somewhere.
Dec 15 2020
Currently we're setting those per node in hiera. Would be nice to have that automated (T229397).
https://grafana.wikimedia.org/d/g0GUXaJMk/t249745?orgId=1 suggests that it might make sense to increase the memory limit instead. There are 3 spikes above the limit in the last 30 days, leading to extra CPU usage and probably increased GC cycles.
Dec 14 2020
"[...] has invalid property: anyOf" seems like a bug in apiserver and can be ignored IIUC: https://github.com/kubernetes/kubernetes/issues/90902
The message is a no-op, and means server-side apply for that custom resource falls back to "no schema" mode.
Dec 11 2020
Discussion as of today: "We messed up looking at changelogs and figuring out dependencies"
Dec 10 2020
[puppet-private] (487bdca0) (jayme) Add calicoctl and calico-cni kubernetes users
Can/should it maybe be integrated into https://wikitech.wikimedia.org/wiki/API_Gateway instead of going through MW?
Dec 9 2020
Secrets have been planted with "[puppet-private] (d2082d1e) (jayme) Add linkrecommendation db credentials".
Generated YAML looks fine (deploy1001:/srv/deployment-charts/helmfile.d/services/linkrecommendation$ helmfile -e staging template)
The new calico chart is merged, thanks @akosiaris
What is currently missing is a proper RoleBinding for the calicoctl user, as I was not sure yet what permissions it's going to need.
We should not be using the tool for changing calico config; that's to be done via the helm chart now. But we will want to keep the analyze functionality intact. I could not find any docs on that by now, so we will maybe just have to figure it out when we have a node in staging-codfw.
That's fine, but since the tool is also used to run some diagnostics and will only be run from the kubernetes nodes by an SRE, it's probably ok to use the network-admin role that is defined in https://docs.projectcalico.org/getting-started/kubernetes/hardway/end-user-rbac
Dec 7 2020
Dec 4 2020
Dec 3 2020
Dec 2 2020
docker-report 0.0.9 builds helm charts with helm3 now. Rolled the fix out to chartmuseum hosts (and patched puppet to install helm3 there).
helm2 is not capable of packaging helm apiVersion v2 charts ofc.
Nov 27 2020
I'll check whether what we use from the old incubator has made it to the new one.
@kostajh - reminder we are still waiting on knowing from where this database will be accessed.
I could grant 10.64.% or whatever, but if there's something more concrete, that'd be useful.
Nov 26 2020
Need to update/fix helm (2) to satisfy helmfile's "helm version" parser (it panics again)