jdk pod not starting + stale file handle
Closed, Resolved (Public)

Description

I have tried several times to start a JDK service with:

webservice --backend=kubernetes jdk8 shell

and also with:

webservice --backend=kubernetes jdk11 shell

In both cases I receive the error: "Pod is not ready in time".

After that, I tried to see if there was some error, using the commands suggested in https://wikitech.wikimedia.org/wiki/Help:Toolforge/Kubernetes#Monitoring_your_jobs

Since then, I have been getting the error "Stale NFS file handle".

I have logged out, and now I cannot even become my tool.

Username: benjavalero
Tool name: replacer

Thanks in advance,

Event Timeline

I have just tried again and I can now start the container again. However, the tool is taking a long time to start.

I am taking a look at https://tools.wmflabs.org/k8s-status/namespaces/tool-replacer/ before closing the issue.

I have just stopped it and started it again. Once more I get the error: "Pod is not ready in time".

Any suggestion?

I am able to start the jdk8 container. Then I launch the tool the same way as always. But the tool is not available:

https://tools.wmflabs.org/replacer/

Usually the tool, once started, writes logs to a logs folder even when it fails, but this is not happening now.

Assuming this is about Toolforge

Indeed, it is a tool running in Toolforge.

I have finally managed to start the tool, but it took a long time to start. Once started, it seems to work more or less well.

My steps are:

  • Log in to login.tools.wmflabs.org over SSH as my user benjavalero
  • become replacer
  • Launch, as usual, the script with webservice --backend=kubernetes jdk8 start /data/project/replacer/replacer/run.sh, which executes a JAR (Spring Boot)

This process has been working for months.

bd808 claimed this task.

The tool is running and has been moved to the 2020 Kubernetes cluster, so its behavior is likely to be different going forward.

There are a few things that may make a jdk8/jdk11 tool start a bit slower than, say, a php7.3 tool:

  • there are relatively few of these tools, so it is more likely that the local cache for the Docker image is cold and a full download is needed from the registry
  • JAR files can be large compared to a typical PHP or Python script, and reading them from the NFS server may be throttled depending on the other activity happening on the Kubernetes worker node where the Pod is deployed
  • Java generally has a non-trivial startup time
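
Given these sources of delay, one way to tell a slow start from a failed one is to poll the tool's URL for a while after launching it and record how long it takes to answer. The following is only a minimal sketch in Python, not part of Toolforge or the webservice command: the function names, the 10-minute timeout and the 10-second interval are all assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500.

    Hypothetical helper: treats 4xx as "the web server is up" and
    5xx or connection errors as "not ready yet".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError, its parent class.
        return err.code < 500
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,   # assumed budget, not a Toolforge limit
    interval_s: float = 10.0,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Call probe() until it returns True or the time budget runs out.

    Returns the elapsed seconds on success, or None if another wait
    would exceed timeout_s.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start + interval_s > timeout_s:
            return None
        sleep(interval_s)
```

For this tool, a call such as wait_until_ready(lambda: http_probe("https://tools.wmflabs.org/replacer/")) would return the number of seconds until the first non-5xx answer, or None after roughly ten minutes. The probe, clock and sleep functions are passed in as parameters so the loop can be exercised without network access or real waiting.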