toolforge: new k8s: issues with the apiserver and etcd
Closed, Resolved (Public)

Description

Investigate the following error messages:

root@toolsbeta-test-k8s-control-1:~# kubectl logs kube-apiserver-toolsbeta-test-k8s-control-3 -n kube-system
[...]
I1114 10:09:51.479119       1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 0  <nil>}]
I1114 10:09:51.479241       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 <nil>}]
W1114 10:09:51.479286       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.
W1114 10:09:51.479360       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.
I1114 10:09:51.479412       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 <nil>} {toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379 <nil>} {toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379 <nil>}]
W1114 10:09:51.499293       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs, not toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs". Reconnecting...
W1114 10:09:51.504685       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs, not toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs". Reconnecting...
I1114 10:09:51.526277       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 <nil>}]
W1114 10:09:51.526342       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.
W1114 10:09:51.526355       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.
I1114 10:09:51.530478       1 client.go:354] parsed scheme: ""
I1114 10:09:51.530496       1 client.go:354] scheme "" not registered, fallback to default scheme
I1114 10:09:51.530556       1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 0  <nil>}]
I1114 10:09:51.530650       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 <nil>} {toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379 <nil>} {toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379 <nil>}]
W1114 10:09:51.551212       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs, not toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs". Reconnecting...
W1114 10:09:51.551658       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs, not toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs". Reconnecting...
I1114 10:09:51.567607       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 <nil>}]
W1114 10:09:51.567673       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.
W1114 10:09:51.567694       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.

Not sure how important these messages are. They show up in both of the new k8s clusters, toolsbeta and tools.
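To see at a glance which endpoints are affected, the x509 warnings above can be grouped with a small script. This is only a sketch: the regular expression is written against the exact wording of the lines pasted in this task, and the function name `summarize_mismatches` is made up for illustration.

```python
import re
from collections import Counter

# Matches the hostname-mismatch warnings as they appear in the log excerpt above.
# The pattern is an assumption based on those lines only; other apiserver or
# grpc versions may word the message differently.
MISMATCH_RE = re.compile(
    r'failed to connect to \{(?P<dialed>[^\s:}]+):\d+.*?'
    r'x509: certificate is valid for (?P<valid_for>[^,]+), not (?P<expected>[^\s"]+)'
)


def summarize_mismatches(log_text):
    """Count (dialed host, name the cert is valid for, name the client expected)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MISMATCH_RE.search(line)
        if match:
            key = (match["dialed"], match["valid_for"].strip(), match["expected"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example: kubectl logs <apiserver pod> -n kube-system | python3 this_script.py
    for (dialed, valid_for, expected), n in summarize_mismatches(sys.stdin.read()).items():
        print(f"{n}x dialed={dialed} cert_valid_for={valid_for} client_expected={expected}")
```

On the excerpt above, every mismatch has etcd-1 as the expected name, while the dialed host is etcd-2 or etcd-3 presenting its own (otherwise valid) cert.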

Event Timeline

aborrero moved this task from Inbox to Soon! on the cloud-services-team (Kanban) board.

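This seems to be solved by adding all the other etcd servers as SANs (subject alternative names) to each node's puppet cert, so that every cert is valid for all of the etcd hostnames. This can be done easily via hiera.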
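Why the SANs help can be shown with a toy model. Going by the logs, the client seems to verify every etcd endpoint against the name of the first one (etcd-1), so a cert that only carries its own hostname is rejected. The sketch below assumes exactly that behaviour and uses plain exact-name matching (no wildcards, no real TLS); `rejected_endpoints` is a hypothetical helper, not part of any tool used here.

```python
def rejected_endpoints(cert_sans, expected_name):
    """Return the endpoints whose cert would not match expected_name.

    cert_sans maps an endpoint hostname to the set of names in that server's
    cert. Simplified model: exact, case-insensitive comparison only.
    """
    expected = expected_name.lower()
    return sorted(
        host
        for host, sans in cert_sans.items()
        if expected not in {name.lower() for name in sans}
    )


# Hostnames taken from the log excerpt in the task description.
ETCD = [f"toolsbeta-test-k8s-etcd-{i}.toolsbeta.eqiad.wmflabs" for i in (1, 2, 3)]

# Before: each cert is only valid for its own hostname.
before = {host: {host} for host in ETCD}
# After: each cert also lists the other etcd servers as SANs.
after = {host: set(ETCD) for host in ETCD}

print(rejected_endpoints(before, ETCD[0]))  # etcd-2 and etcd-3 are rejected
print(rejected_endpoints(after, ETCD[0]))   # nothing is rejected
```

The "before" case reproduces the two failing endpoints seen in the apiserver log; with the extra SANs in place the mismatch goes away regardless of which name the client checks against.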

Mentioned in SAL (#wikimedia-cloud) [2019-11-25T10:30:14Z] <arturo> add puppet cert SANs via hiera to toolsbeta-test-k8s-etcd nodes (T238655)

Mentioned in SAL (#wikimedia-cloud) [2019-11-25T10:35:27Z] <arturo> add puppet cert SANs via instance hiera to tools-k8s-etcd-[4-6] nodes (T238655)

Mentioned in SAL (#wikimedia-cloud) [2019-11-25T10:35:43Z] <arturo> refresh puppet certs for tools-k8s-etcd-[4-6] nodes (T238655)

aborrero claimed this task.

This seems solved! Please reopen if required.