From the convo here: https://github.com/kubernetes/kubernetes/issues/47695, the issue is that the API server is recognizing the kubelet as an ordinary user rather than as a node, which is strange.
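If I'm reading the Node authorization docs right, the authorizer only treats a credential as a node when the username is system:node:<nodeName> and it is in the system:nodes group; a cert whose CN is just the bare hostname falls through to ordinary-user handling. Roughly, the kubelet's client cert subject would need to look like this (a sketch, not what our cert currently has):

```
O  = system:nodes
CN = system:node:toolsbeta-arturo-k8s-worker-1.toolsbeta.eqiad.wmflabs
```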
Also have it here: Attempting to register node toolsbeta-arturo-k8s-worker-1.toolsbeta.eqiad.wmflabs
Jun 26 20:27:38 toolsbeta-arturo-k8s-worker-1 kubelet: E0626 20:27:38.134196 5431 kubelet_node_status.go:92] Unable to register node "toolsbeta-arturo-k8s-worker-1.toolsbeta.eqiad.wmflabs" with API server: nodes is forbidden: User "toolsbeta-arturo-k8s-worker-1.toolsbeta.eqiad.wmflabs" cannot create resource "nodes" in API group "" at the cluster scope
Much more progress. It's not authenticating as anonymous anymore; I had missed a spot in the config:
I have manually changed /etc/default/kube-apiserver to add the needed sections, and I had to change /lib/systemd/system/kube-apiserver.service because the admission-control and admission-plugins options are mutually exclusive. Overall, it still seems to reject things, because at least some requests are coming through as system:anonymous, which is still somewhat mysterious.
The API server apparently needs to be started with --authorization-mode=Node for this to work.
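For reference, a sketch of the relevant flags (flag names are from the upstream docs; the exact layout of our /etc/default/kube-apiserver differs):

```
# sketch only, not our actual config
--authorization-mode=Node,RBAC
# --enable-admission-plugins replaces the older --admission-control flag,
# which is why the two can't both be set
--enable-admission-plugins=NodeRestriction,...
```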
I have it up on haproxy :)
It may also be the wrong namespace. The kubeconfig is using default rather than such things as "kube-system".
So I went ahead and created the node by hand by running kubectl create -f node1.yaml, where the file on the master has the following content:
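For illustration (a rough sketch, not necessarily the exact file), a minimal Node manifest for that host looks something like:

```yaml
# minimal Node object; normally the kubelet creates this itself when
# self-registration is allowed
apiVersion: v1
kind: Node
metadata:
  name: toolsbeta-arturo-k8s-worker-1.toolsbeta.eqiad.wmflabs
  labels:
    kubernetes.io/hostname: toolsbeta-arturo-k8s-worker-1.toolsbeta.eqiad.wmflabs
```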
Jun 26 18:31:31 toolsbeta-arturo-k8s-worker-1 kubelet: E0626 18:31:31.427856 860 kubelet.go:2236] node "toolsbeta-arturo-k8s-worker-1.toolsbeta.eqiad.wmflabs" not found
Updating the patch with (hopefully) the right username :)
What I did there isn't quite correct. It's based on prod and correctly authenticates with a cert, but I think the way we really want to do this is more like https://kubernetes.io/docs/reference/access-authn-authz/node/
Jun 26 16:58:17 toolsbeta-arturo-k8s-worker-1 kubelet: E0626 16:58:17.629217 30037 kubelet_node_status.go:92] Unable to register node "toolsbeta-arturo-k8s-worker-1.toolsbeta.eqiad.wmflabs" with API server: nodes is forbidden: User "toolsbeta-arturo-k8s-worker-1.toolsbeta.eqiad.wmflabs" cannot create resource "nodes" in API group "" at the cluster scope
Set the file to look like:
If we let kubernetes do it, it would be like https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-tls-bootstrapping/
toolsbeta-puppetmaster-02 is certainly upgraded (it's on stretch) and uses puppetdb as well.
I believe this is closeable.
@aborrero I found the problem we are having getting the first worker node up. https://kubernetes.io/docs/concepts/architecture/nodes/#self-registration-of-nodes
Tue, Jun 25
(verified on tools-k8s-master-01, since I forgot to say how I know)
Puppet is working again
I pushed out the newer package on all Jessie vms.
So the fun part is: where to put that pin in puppet...
Pin to 3.4.2-1+deb8u2 that is.
So that package is entirely broken, in other words. We'll probably want to pin jessie to the version before it.
Oh that's lovely.
Looks like the most likely place for this to fail is a urlopen on http://labs-puppetmaster.wikimedia.org:8100/v1/tools/node/tools-k8s-master-01.tools.eqiad.wmflabs (as an example) or if that comes back with invalid yaml. What's weird is that the URL currently seems to return valid yaml, but the run isn't working nonetheless.
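Roughly what the ENC lookup amounts to (simplified sketch, not the actual script):

```python
# simplified sketch of the puppet-enc lookup; not the real script
import urllib.request
import yaml

fqdn = "tools-k8s-master-01.tools.eqiad.wmflabs"
url = "http://labs-puppetmaster.wikimedia.org:8100/v1/tools/node/" + fqdn

# either step failing (HTTP error, or a body that doesn't parse as YAML)
# makes the ENC exit non-zero and the puppet run fail
with urllib.request.urlopen(url) as resp:
    node_config = yaml.safe_load(resp.read())

print(yaml.safe_dump(node_config))
```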
Notably puppet-enc will fail if the hostname isn't right. That needs to be noted when we look at updating the DNS names of our VMs.
It'll fail if the name doesn't end in wmflabs or labtests :)
Something restarted it then: [Tue Jun 25 06:31:52.624395 2019] [mpm_prefork:notice] [pid 17003] AH00171: Graceful restart requested, doing restart
Looks like the tools puppetmaster ended up with a bit of a problem this morning
Mon, Jun 24
I have a mind to experiment with a copy of maintain-kubeusers that speaks x509/RBAC instead of token/ABAC. There are other organizations that use multiple CAs, one for infra and one for users, and the certificates API makes this entirely doable from python. I'll kick that piece around a little and see if it makes sense.
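As a sketch of the shape of it (hypothetical names; using the certificates API via the Python client, not a finished design):

```python
# rough sketch: ask the cluster CA to sign a client cert for a tool user
# via the certificates API; names here are hypothetical
import base64

from cryptography import x509
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID
from kubernetes import client, config

config.load_kube_config()

key = rsa.generate_private_key(public_exponent=65537, key_size=2048,
                               backend=default_backend())
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([
        # CN becomes the k8s username, O the group used in RBAC bindings
        x509.NameAttribute(NameOID.COMMON_NAME, "tool-sometool"),
        x509.NameAttribute(NameOID.ORGANIZATION_NAME, "toolforge"),
    ]))
    .sign(key, hashes.SHA256(), default_backend())
)

client.CertificatesV1beta1Api().create_certificate_signing_request(
    client.V1beta1CertificateSigningRequest(
        metadata=client.V1ObjectMeta(name="tool-sometool"),
        spec=client.V1beta1CertificateSigningRequestSpec(
            request=base64.b64encode(
                csr.public_bytes(serialization.Encoding.PEM)).decode(),
            usages=["digital signature", "key encipherment", "client auth"],
        ),
    )
)
# approving the CSR, fetching the signed cert, and writing out a kubeconfig
# for the tool would follow from here
```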
So, figuring from that data that it may not be impossible to fill the link, but it's extremely unlikely that we will (and we would still love to use jumbo frames), can we put this on other rows?
Ok, that said, I wrote that while misreading Mbps as Gbps... but what I said is still true! The PoC won't be anywhere near all that, and our full build-out is a trickle compared to the theoretical limits.
I should point out that the PoC will not be capable of doing anywhere near that much IO. That is what it would look like if we managed to convert the entire cluster to Ceph with a full build-out. We would not handle the full build-out with only three OSDs, because of the numbers above.
+1 on using modules. Since go mod vendor is even an option, and it does some basic hash checking, it seems sensible. I agree it requires a recent golang, but previous versions were packaging chaos in general. Since it usually produces a statically-linked binary, it could be deployed either from a "scratch" docker image (in k8s), which has literally nothing else in it, or as a deb package that wouldn't be terribly hard to generate compared to an interpreted executable or dynamically linked binary.
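For the k8s route, something along these lines (hypothetical paths and version, just to illustrate the shape of it):

```
# hypothetical multi-stage build: vendored deps, static binary, scratch image
FROM golang:1.12 AS build
WORKDIR /src
COPY . .
# -mod=vendor builds from the checked-in vendor/ tree; CGO_ENABLED=0 keeps
# the binary statically linked so it can run in an otherwise empty image
RUN CGO_ENABLED=0 go build -mod=vendor -o /app ./cmd/app

FROM scratch
COPY --from=build /app /app
ENTRYPOINT ["/app"]
```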
@Andrew this seems...fixed does it not?
The bug in mariadb is fixed in a newer version of mariadb that we don't have. I'm going to shuffle this to the graveyard for now. Overall, there isn't much interest in additional automation yet, because the manual checks are still considered essential for now.
Had a huddle with @JHedden, actually. He'll add his thoughts soon (with some info from our existing monitoring).
Note: there are rate limits that can be set within openstack for this as well... but in some versions they don't work right at all (they get ignored in some cases: https://bugzilla.redhat.com/show_bug.cgi?id=1476830), and this is also something we want to be testing. It won't help with back-end stuff, etc. either. It's just a note.
100% agree with you @faidon, and I appreciate the reply. I'm aiming to avoid any sugar-coating in my assessments of risks until I have more data (especially with 40G uplinks that are widely shared), partly to open conversations and make sure we design carefully. After a bit of time to think about this, I have some more thoughts.
Thu, Jun 20
And when that runs on puppet, I see we are green.
Tentatively crossing off the registry validation bit because the webhook is deployable. I fully expect to find ways it isn't finished when we are rolling things out in toolsbeta.
@ayounsi Ceph docs are vague at best or tend to ask you to read dissertations eventually. Overall, everything comes back to "test it in your cluster and see". Ceph is capable of saturating 10G links under heavy load (and the private link would be able to saturate during node failures for rebuilds). A 40G link would be harder to saturate, but it is theoretically possible. This is a PoC, so my intent is to break it every which way and put it under test loads. We would certainly want to keep an eye on those links during tests (are you able to point me to where I could do that?).
Wed, Jun 19
It still needs the script run against the DBs before you'll see them in cloud services.
Oh! Ok. I see it there on those dbs. So disregard what I said. We can get that out shortly.
I mean I just checked and don't see the underlying tables on labsdb1009 (rather specifically). Are they in all wiki dbs or just a couple that I should verify?
It works ok from the Foundation's public network servers. Just for notes.
So these tables are not yet on the replicas; however, the scripts we use to expose views will just skip tables that aren't there until they are. I'm validating the patch locally, and then I'll merge it. Once merged, it won't do anything until the tables are on the replicas and WMCS runs the script manually.
Ahah! Thank you. It makes sense now. We'll get on it!
I have zero context regarding what these tables are from reading back in the tickets. @Bawolff (or someone else? I've been asking you on a lot of these tickets), can I get confirmation that these tables are entirely public without filters?
Ok, all it did was enable the service, which was the idea. On next reboot it will hopefully not go poorly :)
Tue, Jun 18
Can we get anywhere by using the comment_revision view instead of comment? https://wikitech.wikimedia.org/wiki/News/Actor_storage_changes_on_the_Wiki_Replicas#The_actor_table_seems_really_slow--so_does_comment
Just to double-check: IIRC, back in the day we avoided this because we had multiple controllers attached to a shared shelf and if two controllers ran at the same time then terrible, terrible things happened. Is it safe to say that there's no current situation where having 'too many' nfs services running at once causes harm?
I thought I'd set them on the role rather than the profile? Checking, it appears they aren't set in the right place. I'd rather it be on the role. Lemme check if that will work right. Thanks for looking!
That I believe is done. We can always do something that will make it page to be sure....
But the hiera works on the other NFS servers. It unfortunately tested itself.
Mon, Jun 17
So this needs to connect to port 111 (rpcbind) over UDP from the public network, which it is currently not allowed to do.
Keeping an eye on trends a bit more
The change doesn't seem to have hurt or helped since I made it. Stopping the client-side monitoring has done far more.
Puppet now runs on the registry node, but the registry itself doesn't work yet, because it needs the SSL cert placed in the private repo (like in tools) and there is some odd prometheus error. However, puppet now functions, so this ticket is done. Whoever was working on making a beta registry can continue now.
Fri, Jun 14
All that said, instead of CephFS, RBD can be exported as iSCSI. If that is set up with appropriate multipathing and a cluster filesystem (or with Ganesha NFS so that locking happens in userland), we could build an actually-HA Linux NFS server with Ceph backing it, in the exact same way it would be backed for cloudvirts, with all the same quirks. That would mean NFS could still do immutable bits because, well, it can do that. While there may be useful cases where CephFS makes sense, we may get much more use out of Ceph by simply using it for block devices in nearly every case. That sort of consistent use might also make managing a Ceph cluster "easier", even if it complicates NFS a bit (though not really much more than it already is). The topic of shipping Ceph RBDs to iSCSI targets is not a small one, so I'm putting that away for now (but it may be worth testing locally while kicking this around).
Note: I just confirmed locally that CephFS cannot set extended attributes in Luminous.
The feature has a tracker here: http://tracker.ceph.com/issues/10679
As this was last updated 3 years ago, I expect them to implement that as soon as one of us writes the patch basically 😛
Manually changed the /data/project/.system_sge/gridengine/default/common/act_qmaster file and got the process going. The beta grid is healthy again.
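For context, that file just holds the FQDN of the host that should currently be acting as qmaster, on a single line, along the lines of (example hostname, from memory):

```
toolsbeta-sgegrid-master.toolsbeta.eqiad.wmflabs
```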
The gridmaster service isn't running because apparently the shadow is! Jun 14 16:09:07 toolsbeta-sgegrid-master sge_qmaster: critical error: qmaster on host "toolsbeta-sgegrid-shadow.toolsbeta.eqiad.wmflabs" is still running - terminating
Puppet is now able to run cleanly on the grid master.
Sigh, toolsbeta-sgegrid-master cannot install jobutils, most likely because of weirdness in aptly. I think we ended up making the stretch toolsbeta repo there an actual repo, so jobutils would need to actually be in it for this to work: E: Unable to locate package jobutils
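If it really is a standalone aptly repo now, getting the package in would be roughly this (repo/distribution names are guesses; commands from memory):

```
# on the aptly host; names here are assumptions
aptly repo add stretch-toolsbeta jobutils_*.deb
aptly publish update stretch-toolsbeta
```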
toolsbeta-proxy-01 seems kind of half-baked. I think it was someone's work toward making toolsbeta more like tools to test things against the proxy.
Function lookup() did not find a value for the name 'profile::toolforge::toolviews::mysql_password' at /etc/puppet/modules/profile/manifests/toolforge/toolviews.pp:4. I imagine this is a distant reflection of T101651.
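The fix is presumably just defining the key in the right hiera spot (with a dummy value in labs/private), something like:

```yaml
# hiera; the real value belongs in the private repo, this is just the shape
profile::toolforge::toolviews::mysql_password: "not_the_real_password"
```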
toolsbeta-k8s-lb-01 isn't working because I didn't finish it: profile::toolforge::k8s::api_servers is empty and needs values. I may just delete that instance.