/var/lib/gridengine/spool/qmaster/messages is full of entries like these:
```
scfc@tools-bastion-03:~$ tail /var/lib/gridengine/spool/qmaster/messages
12/06/2016 02:55:42|schedu|tools-grid-master|E|unable to find job 760793 from the scheduler order package
12/06/2016 02:55:48|worker|tools-grid-master|E|execd@tools-webgrid-lighttpd-1208.eqiad.wmflabs reports running job (4594249.1/master) in queue "webgrid-lighttpd@tools-webgrid-lighttpd-1208.eqiad.wmflabs" that was not supposed to be there - killing
12/06/2016 02:56:16|worker|tools-grid-master|W|unable to find job 760804 from the scheduler order package
12/06/2016 02:56:16|worker|tools-grid-master|W|Skipping remaining 0 orders
12/06/2016 02:56:16|schedu|tools-grid-master|E|unable to find job 760804 from the scheduler order package
12/06/2016 02:56:17|worker|tools-grid-master|W|unable to find job 760805 from the scheduler order package
12/06/2016 02:56:17|worker|tools-grid-master|W|Skipping remaining 0 orders
12/06/2016 02:56:17|schedu|tools-grid-master|E|unable to find job 760805 from the scheduler order package
12/06/2016 02:56:17|worker|tools-grid-master|E|got load report of unknown exec host "tools-exec-1204.eqiad.wmflabs"
12/06/2016 02:56:28|worker|tools-grid-master|E|execd@tools-webgrid-lighttpd-1208.eqiad.wmflabs reports running job (4594249.1/master) in queue "webgrid-lighttpd@tools-webgrid-lighttpd-1208.eqiad.wmflabs" that was not supposed to be there - killing
scfc@tools-bastion-03:~$
```
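So the gridengine master apparently needs to discard these stale entries (scheduler orders for jobs it can no longer find, and execd reports of a job that is not supposed to be running) instead of re-examining them every few seconds.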
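To see which entries actually keep recurring (rather than eyeballing the last ten lines from `tail`), a small shell sketch like the following could count repeated messages in the log. It assumes the pipe-separated layout visible in the paste (`date time|component|host|level|message`); the function name `summarise_qmaster_messages` is hypothetical, and the default path is the spool file quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count how often each (level, message) pair recurs in a
# gridengine qmaster "messages" log, ignoring timestamp, component and host.
# Assumed line layout: date time|component|host|level|message
summarise_qmaster_messages() {
    # $1: path to the messages file; defaults to the qmaster spool file.
    logfile="${1:-/var/lib/gridengine/spool/qmaster/messages}"
    # Field 4 is the level (E/W/...); fields 5..NF are the message text,
    # rejoined with '|' in case the message itself contains a pipe.
    awk -F'|' 'NF >= 5 {
        msg = $5
        for (i = 6; i <= NF; i++) msg = msg "|" $i
        print $4 " " msg
    }' "$logfile" | sort | uniq -c | sort -rn
}
```

For example, `summarise_qmaster_messages | head` would list the most frequent entries first; job IDs are kept in the message text, so each stale job shows up as its own line with its repeat count.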
I believe we had a similar situation in the past, and IIRC @valhallasw looked up the commands needed to fix it back then. @valhallasw, am I remembering correctly? Do you still know what you did?