Troubleshoot rdf-streaming-updater/dse-k8s cluster
Closed, Resolved (Public)

Description

While attempting to restore a savepoint in T345957, we noticed that the rdf-streaming-updater application hangs. Further investigation showed that while the k8s workers themselves can resolve DNS, containers running on these workers cannot.

Creating this ticket to troubleshoot and (hopefully) fix this issue.

Event Timeline

Update: After some consultation in #wikimedia-k8s-sig, this doesn't seem to be a DNS issue after all. Firewall rules are the most likely cause; I will continue troubleshooting and report back.
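For anyone hitting similar symptoms, here is a rough sketch of how to tell a DNS failure apart from blocked egress when testing from inside a container. It is not what was run for this ticket; the function name `classify_connectivity` and its return labels are made up for illustration, and it uses only the Python standard library. The `resolve` and `connect` parameters exist only so the logic can be exercised without a network.

```python
import socket


def classify_connectivity(host, port, timeout=3.0,
                          resolve=socket.getaddrinfo,
                          connect=socket.create_connection):
    """Return 'dns_failure', 'connect_failure', or 'ok' for host:port.

    Hypothetical helper: name resolution is tried first, then a TCP
    connection to each resolved address.
    """
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name could not be resolved at all: a genuine DNS problem.
        return "dns_failure"

    for _family, _type, _proto, _canon, sockaddr in infos:
        try:
            # sockaddr[:2] is (address, port) for both IPv4 and IPv6.
            conn = connect(sockaddr[:2], timeout=timeout)
        except OSError:
            # Timeouts and refusals land here; try the next address.
            continue
        conn.close()
        return "ok"

    # The name resolved but no address accepted a connection, which is
    # consistent with (though not proof of) a firewall or egress rule.
    return "connect_failure"


# Example call with a placeholder host and port (not taken from this task):
# print(classify_connectivity("zookeeper.example.org", 2181))
```

A result of `connect_failure` from inside the pod, combined with `ok` from the worker host, would point at network policy rather than DNS.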

Change 956474 had a related patch set uploaded (by Bking; author: Bking):

[operations/deployment-charts@master] rdf-streaming-updater-k8s: Add egress rules to values

https://gerrit.wikimedia.org/r/956474

Change 955032 had a related patch set uploaded (by Bking; author: Ebernhardson):

[operations/deployment-charts@master] Add a networkpolicy template for zookeeper

https://gerrit.wikimedia.org/r/955032

Change 955032 merged by jenkins-bot:

[operations/deployment-charts@master] Add a networkpolicy template for zookeeper

https://gerrit.wikimedia.org/r/955032

Change 956474 merged by jenkins-bot:

[operations/deployment-charts@master] rdf-streaming-updater-k8s: Add egress/proxy rules to values

https://gerrit.wikimedia.org/r/956474

The patch above fixed the firewall rules, and we were able to get the flink-app to restore from the savepoint. Closing this, but work continues in T345957.