toolforge: new k8s: issues with the apiserver and etcd
Closed, Resolved · Public

Description

Investigate the following error messages:

root@toolsbeta-test-k8s-control-1:~# kubectl logs kube-apiserver-toolsbeta-test-k8s-control-3 -n kube-system
[...]
I1114 10:09:51.479119       1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 0  <nil>}]
I1114 10:09:51.479241       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 <nil>}]
W1114 10:09:51.479286       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.
W1114 10:09:51.479360       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.
I1114 10:09:51.479412       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 <nil>} {toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379 <nil>} {toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379 <nil>}]
W1114 10:09:51.499293       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs, not toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs". Reconnecting...
W1114 10:09:51.504685       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs, not toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs". Reconnecting...
I1114 10:09:51.526277       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 <nil>}]
W1114 10:09:51.526342       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.
W1114 10:09:51.526355       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.
I1114 10:09:51.530478       1 client.go:354] parsed scheme: ""
I1114 10:09:51.530496       1 client.go:354] scheme "" not registered, fallback to default scheme
I1114 10:09:51.530556       1 asm_amd64.s:1337] ccResolverWrapper: sending new addresses to cc: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 0  <nil>}]
I1114 10:09:51.530650       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 <nil>} {toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379 <nil>} {toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379 <nil>}]
W1114 10:09:51.551212       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs, not toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs". Reconnecting...
W1114 10:09:51.551658       1 clientconn.go:1251] grpc: addrConn.createTransport failed to connect to {toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs, not toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs". Reconnecting...
I1114 10:09:51.567607       1 asm_amd64.s:1337] balancerWrapper: got update addr from Notify: [{toolsbeta-test-k8s-etcd-1.toolsbeta.eqiad.wmflabs:2379 <nil>}]
W1114 10:09:51.567673       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-3.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.
W1114 10:09:51.567694       1 asm_amd64.s:1337] Failed to dial toolsbeta-test-k8s-etcd-2.toolsbeta.eqiad.wmflabs:2379: context canceled; please retry.

It is not clear how important these messages are. They appear in both of the new k8s clusters, toolsbeta and tools.
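
To get a quick overview of which endpoints are affected, the x509 warnings in a log dump like the one above can be grouped by dialed endpoint, the name the presented certificate is valid for, and the name the client expected. The following is only a rough sketch: the regular expression is written against the exact message shape seen in this paste, and the names `MISMATCH_RE` and `summarize_mismatches` are made up for illustration.

```python
import re
from collections import Counter

# Matches the hostname-mismatch warning as it appears in the pasted log.
# Assumption: the message layout is the one shown in this task and may
# differ in other Kubernetes/gRPC versions.
MISMATCH_RE = re.compile(
    r'failed to connect to \{(?P<dialed>[^:\s]+):\d+.*?'
    r'x509: certificate is valid for (?P<valid_for>\S+?), '
    r'not (?P<expected>[^\s"]+)'
)


def summarize_mismatches(log_text):
    """Count (dialed, valid_for, expected) triples found in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MISMATCH_RE.search(line)
        if match:
            key = (match["dialed"], match["valid_for"], match["expected"])
            counts[key] += 1
    return counts
```

Feeding it the output of `kubectl logs` for the apiserver pod would show, for each etcd endpoint, how often its certificate was rejected and which hostname the client was checking it against.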

Event Timeline

aborrero created this task. Tue, Nov 19, 2:06 PM
Restricted Application added a subscriber: Aklapper. Tue, Nov 19, 2:06 PM
aborrero triaged this task as High priority. Tue, Nov 19, 2:07 PM
aborrero moved this task from Inbox to Important on the cloud-services-team (Kanban) board.

This seems to be solved by adding SANs (Subject Alternative Names) to each etcd node's puppet cert so that it also covers the hostnames of the other etcd servers. This can be done easily via hiera.
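
Why the SANs help can be illustrated with a small sketch. It assumes, based only on the log lines in the description, that the apiserver checks every etcd endpoint's certificate against the hostname of the first endpoint (etcd-1). The matching below is a simplified exact-name comparison (no wildcards, no IP SANs), and the helper names `cert_covers` and `failing_nodes` are hypothetical.

```python
def cert_covers(expected_host, sans):
    """Return True if expected_host is listed in the certificate's DNS SANs.

    Simplification: exact, case-insensitive comparison only.
    """
    expected = expected_host.lower().rstrip(".")
    return any(expected == san.lower().rstrip(".") for san in sans)


ETCD_NODES = [
    f"toolsbeta-test-k8s-etcd-{i}.toolsbeta.eqiad.wmflabs" for i in (1, 2, 3)
]

# Before the fix: each node's cert names only the node itself.
certs_before = {node: [node] for node in ETCD_NODES}

# After the fix: each node's cert carries all etcd hostnames as SANs.
certs_after = {node: list(ETCD_NODES) for node in ETCD_NODES}


def failing_nodes(certs, expected_host):
    """List nodes whose cert would be rejected when checked against expected_host."""
    return sorted(
        node for node, sans in certs.items()
        if not cert_covers(expected_host, sans)
    )
```

With `certs_before`, checking against the etcd-1 hostname rejects etcd-2 and etcd-3, which matches the two endpoints failing in the log. With `certs_after`, no node is rejected.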

Mentioned in SAL (#wikimedia-cloud) [2019-11-25T10:30:14Z] <arturo> add puppet cert SANs via hiera to toolsbeta-test-k8s-etcd nodes (T238655)

Mentioned in SAL (#wikimedia-cloud) [2019-11-25T10:35:27Z] <arturo> add puppet cert SANs via instance hiera to tools-k8s-etcd-[4-6] nodes (T238655)

Mentioned in SAL (#wikimedia-cloud) [2019-11-25T10:35:43Z] <arturo> refresh puppet certs for tools-k8s-etcd-[4-6] nodes (T238655)

aborrero closed this task as Resolved. Mon, Nov 25, 10:49 AM
aborrero claimed this task.

This seems solved! Please reopen if required.