I just created two new SGE nodes, tools-sgeexec-10-11 and tools-sgeexec-10-12. Both show similar errors when trying to talk to the grid master:
Jun 01 19:03:17 tools-sgeexec-10-11 gridengine-exec[2094]: error: got send timeout
Jun 01 19:03:17 tools-sgeexec-10-11 gridengine-exec[2094]: error: can't get configuration from qmaster -- backgrounding
Jun 01 19:03:17 tools-sgeexec-10-11 gridengine-exec[2094]: critical error: unable to write to file fd_pipe[1]: Broken pipe
Jun 01 19:03:17 tools-sgeexec-10-11 sge_execd[2234]: main|tools-sgeexec-10-11|E|got send timeout
Jun 01 19:03:17 tools-sgeexec-10-11 sge_execd[2234]: main|tools-sgeexec-10-11|E|can't get configuration from qmaster -- backgrounding
Jun 01 19:03:17 tools-sgeexec-10-11 sge_execd[2234]: main|tools-sgeexec-10-11|C|unable to write to file fd_pipe[1]: Broken pipe
Jun 01 19:04:20 tools-sgeexec-10-11 sge_execd[2234]: main|tools-sgeexec-10-11|E|getting configuration: unable to send message to qmaster using port 6444 on host "tools-sgegrid-master.tools.eqiad1.wikimed
This persists after service restarts and reboots. Running tcpdump on the grid master (tcpdump "port 6444"|grep tools-sgeexec-10-1) shows traffic from a working node (tools-sgeexec-10-10) but none from the new nodes. tools-sgeexec-10-11 and tools-sgeexec-10-12 are in the correct 'execnode' security group.
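
The tcpdump on the grid master shows nothing arriving from the new nodes, so a complementary check is to attempt a plain TCP connection to port 6444 from one of them. Below is a minimal sketch, assuming Python 3 is available on the exec node. `can_connect` is a hypothetical helper written for this report, not part of gridengine, and the master hostname has to be supplied by the caller.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical diagnostic helper. It only tests that the TCP handshake
    completes; it does not speak the gridengine protocol.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `can_connect(master_host, 6444)` on tools-sgeexec-10-11, with `master_host` set to the grid master's hostname, would return False if the handshake cannot complete. That result alone does not distinguish between dropped packets, a refused connection, and a DNS failure, so it is only a first step before looking at security group rules or routing.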