I'm Arturo Borrero Gonzalez from Spain (Seville). I'm a Site Reliability Engineer (SRE) in the Wikimedia Cloud Services team, and a Wikimedia Foundation staff member.
You may find me in some FLOSS projects, like Netfilter and Debian.
Just noticed:
completed.
the control plane is now upgraded:
saving this info here in case it is required later:
----- OUTPUT of 'sudo -i kubeadm ...ade plan 1.24.17' -----
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0318 10:53:57.579543 1176002 initconfiguration.go:120] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.23.17
[upgrade/versions] kubeadm version: v1.24.17
[upgrade/versions] Target version: 1.24.17
[upgrade/versions] Latest version in the v1.23 series: 1.24.17
I was bitten by this recently. I think the proposal to show at least _something_ in the logs, within some sensible limits, would make sense, and would definitely help both during cookbook development and in later operations.
ping @cmooney
The openapi JSON can now be fetched from the API:
aborrero@cloudcontrol1005:~ $ sudo radosgw-admin user info --uid qrank\$qrank
{
    "user_id": "qrank$qrank",
    "display_name": "qrank",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": true,
        "check_on_raw": false,
        "max_size": 8589934592,
        "max_size_kb": 8388608,
        "max_objects": 4096
    },
    "temp_url_keys": [],
    "type": "keystone",
    "mfa_ids": []
}
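For reading the quota fields in the output above at a glance, here is a minimal Python sketch. It assumes `max_size` is expressed in bytes, which is consistent with `max_size_kb` being exactly `max_size / 1024` in this output. The `summarize_quota` helper and the trimmed `USER_INFO` string are illustrative only and not part of any existing tooling.

```python
import json

# Trimmed copy of the `radosgw-admin user info` output quoted above.
USER_INFO = """
{
  "user_id": "qrank$qrank",
  "bucket_quota": {"enabled": false, "max_size": -1, "max_objects": -1},
  "user_quota": {
    "enabled": true,
    "check_on_raw": false,
    "max_size": 8589934592,
    "max_size_kb": 8388608,
    "max_objects": 4096
  }
}
"""


def summarize_quota(info: dict, key: str = "user_quota") -> str:
    """Return a one-line, human-readable summary of one quota section.

    Hypothetical helper: assumes `max_size` is a byte count.
    """
    quota = info[key]
    if not quota["enabled"]:
        return f"{key}: disabled"
    size_gib = quota["max_size"] / 2**30  # bytes -> GiB
    return f"{key}: {size_gib:g} GiB, {quota['max_objects']} objects"


info = json.loads(USER_INFO)
print(summarize_quota(info))
print(summarize_quota(info, "bucket_quota"))
```

Under that assumption, the user quota for `qrank$qrank` works out to 8 GiB and 4096 objects, while the bucket quota is disabled.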
this is scheduled for next Monday, 2024-03-18.
All nodes now running 1.24:
the first control node upgrade went just fine, doing the next.
the upgrade cookbook failed because of the problem that this patch fixed https://gerrit.wikimedia.org/r/c/cloud/wmcs-cookbooks/+/1010914 and left the upgrade mid-flight.
We did a bit of research today about the size/scope of the problem:
This has been deployed in toolsbeta, should be ready for tools.
In T338153#9619793, @dcaro wrote: It's interesting that they are created as root though, we might be missing some restrictions in lima-kilo, in toolforge per se it has the user's user.
I would be happy to talk about this re-architecture idea. I can share a bit more info about what I tested in the past, and what architecture I had in mind when I first created this, although the code is maybe self-explanatory already.
Beware, label values and similar have limitations on what characters they can store.
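As an illustration of that kind of limitation, here is a minimal Python sketch. It assumes the labels in question are Kubernetes labels, whose values may be empty or at most 63 characters long, must start and end with an alphanumeric character, and may contain dashes, underscores and dots in between. The `is_valid_label_value` helper is hypothetical and only mirrors those documented rules; it is not taken from any existing codebase.

```python
import re

# Kubernetes label value syntax: empty, or alphanumeric at both ends with
# '-', '_' and '.' also allowed in the middle.
_LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
_MAX_LABEL_VALUE_LEN = 63


def is_valid_label_value(value: str) -> bool:
    """Check a string against the Kubernetes label value rules (hypothetical helper)."""
    if len(value) > _MAX_LABEL_VALUE_LEN:
        return False
    return _LABEL_VALUE_RE.fullmatch(value) is not None


for candidate in ["tools-1.24", "", "user@example", "-leading-dash", "has space"]:
    print(repr(candidate), is_valid_label_value(candidate))
```

Values such as email addresses, paths or free-form text fail this check, so they would have to be encoded or stored elsewhere (for example in annotations).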
What if we back up the data to a cinder volume? I know etcd servers run on local-storage hypervisors, but I'm not sure if that means they cannot use cinder.
I plan to start the upgrade on toolsbeta next Monday, 2024-03-11.
thanks
I think kubernetes should be logging all cronjob failures somewhere.
the latest iteration on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1007007 brings increased granularity to the firewalling policy.
follow up:
just sent an updated patch with a new approach for the firewall, let me know if that would work or if you would rather see it simplified further.
What we did not try in 2020 was to run bird directly on the box via puppet, without openstack, to learn about upstream routes (to cloudgw).
I haven't checked if the server has the latest firmware updates issued by Dell.
This is done: