Over the past week I've noticed SuggestBot's web services failing with an error saying that the service is not defined. This has not been an issue before. When I log in to check on lighttpd's status, I often find the job queued and waiting. Today I remembered to dig a bit further and noticed that the job can't be scheduled, per the job status pasted below. I'm not sure what's going on here. I notice that quite a few web server execution hosts go unused because they don't offer enough memory (the job requests h_vmem=4g, while those hosts offer only 3.142G), but I can't control the memory usage.
I would appreciate it if this could be looked into and fixed, or if someone could give me instructions on how to alleviate the problem, should there be something I can do myself.
tools.suggestbot@tools-bastion-02:~$ qstat -j 5421306
job_number: 5421306
exec_file: job_scripts/5421306
submission_time: Sun Apr 17 15:33:15 2016
owner: tools.suggestbot
uid: 51172
group: tools.suggestbot
gid: 51172
sge_o_home: /data/project/suggestbot
sge_o_log_name: tools.suggestbot
sge_o_path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
sge_o_shell: /bin/bash
sge_o_workdir: /data/project/suggestbot
sge_o_host: tools-bastion-02
account: sge
stderr_path_list: NONE:NONE:/data/project/suggestbot/error.log
hard resource_list: h_vmem=4g,release=trusty
mail_list: tools.suggestbot@tools.wmflabs.org
notify: FALSE
job_name: lighttpd-suggestbot
stdout_path_list: NONE:NONE:/data/project/suggestbot/error.log
stdin_path_list: NONE:NONE:/dev/null
jobshare: 0
hard_queue_list: webgrid-lighttpd
env_list:
script_file: /usr/local/bin/tool-lighttpd
scheduling info: queue instance "continuous@tools-exec-1206.eqiad.wmflabs" dropped because it is overloaded: np_load_avg=2.485000 (= 2.485000 + 0.50 * 0.000000 with nproc=4) >= 1.75
queue instance "giftbot@tools-exec-gift.eqiad.wmflabs" dropped because it is overloaded: np_load_avg=8.650000 (= 8.650000 + 0.50 * 0.000000 with nproc=2) >= 2.00
queue instance "mailq@tools-exec-1206.eqiad.wmflabs" dropped because it is overloaded: np_load_avg=2.485000 (= 2.485000 + 0.50 * 0.000000 with nproc=4) >= 2.25
queue instance "task@tools-exec-1206.eqiad.wmflabs" dropped because it is overloaded: np_load_avg=2.485000 (= 2.485000 + 0.50 * 0.000000 with nproc=4) >= 1.75
queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs" dropped because it is overloaded: np_load_avg=3.462500 (= 3.462500 + 0.50 * 0.000000 with nproc=4) >= 2.75
queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1412.tools.eqiad.wmflabs" dropped because it is overloaded: np_load_avg=3.032500 (= 3.032500 + 0.50 * 0.000000 with nproc=4) >= 2.75
queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1413.tools.eqiad.wmflabs" dropped because it is overloaded: np_load_avg=2.802500 (= 2.802500 + 0.50 * 0.000000 with nproc=4) >= 2.75
queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1414.tools.eqiad.wmflabs" dropped because it is overloaded: np_load_avg=2.770000 (= 2.700000 + 0.50 * 0.560000 with nproc=4) >= 2.75
queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1401.eqiad.wmflabs" dropped because it is overloaded: np_load_avg=6.300000 (= 6.300000 + 0.50 * 0.000000 with nproc=4) >= 2.75
queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1415.tools.eqiad.wmflabs" dropped because it is disabled
cannot run in queue "cyberbot" because it is not contained in its hard queue list (-q)
cannot run in queue "webgrid-generic" because it is not contained in its hard queue list (-q)
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1206.eqiad.wmflabs" because it offers only hf:release=precise
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1207.eqiad.wmflabs" because it offers only hf:release=precise
cannot run in queue "mailq" because it is not contained in its hard queue list (-q)
cannot run in queue "task" because it is not contained in its hard queue list (-q)
cannot run in queue "continuous" because it is not contained in its hard queue list (-q)
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1203.eqiad.wmflabs" because it offers only hf:release=precise
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1205.eqiad.wmflabs" because it offers only hf:release=precise
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1407.eqiad.wmflabs" because it offers only hc:h_vmem=3.142G
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1402.eqiad.wmflabs" because it offers only hc:h_vmem=3.142G
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1202.eqiad.wmflabs" because it offers only hf:release=precise
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1404.eqiad.wmflabs" because it offers only hc:h_vmem=3.142G
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1201.eqiad.wmflabs" because it offers only hf:release=precise
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1408.eqiad.wmflabs" because it offers only hc:h_vmem=3.142G
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1208.eqiad.wmflabs" because it offers only hf:release=precise
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1209.eqiad.wmflabs" because it offers only hf:release=precise
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1210.eqiad.wmflabs" because it offers only hf:release=precise
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1409.eqiad.wmflabs" because it offers only hc:h_vmem=3.142G
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1403.eqiad.wmflabs" because it offers only hc:h_vmem=3.142G
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1204.eqiad.wmflabs" because it offers only hf:release=precise
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1410.eqiad.wmflabs" because it offers only hc:h_vmem=3.142G
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1405.eqiad.wmflabs" because it offers only hc:h_vmem=3.142G
(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1406.eqiad.wmflabs" because it offers only hc:h_vmem=3.142G
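
Since the scheduling info is hard to read at a glance, here is a rough Python sketch that tallies why the scheduler rejected each queue instance or host. It is only an illustration: `tally_reasons`, the category names, and the regular expressions are my own assumptions, based purely on the message wording in the output above, and are not part of gridengine or of any Tool Labs tooling. The sample text is a handful of entries copied from that output.

```python
import re
from collections import Counter

# Hypothetical patterns, written against the message wording seen in the
# `qstat -j` scheduling info above. They are assumptions, not an official
# gridengine message catalogue.
PATTERNS = {
    "overloaded": re.compile(
        r'queue instance "[^"]+" dropped because it is overloaded'),
    "disabled": re.compile(
        r'queue instance "[^"]+" dropped because it is disabled'),
    "not in hard queue list": re.compile(
        r'cannot run in\s+queue "[^"]+" because it is not contained'),
    "wrong release": re.compile(
        r'cannot run at host "[^"]+" because it offers only hf:release='),
    "too little h_vmem": re.compile(
        r'cannot run at host "[^"]+" because it offers only hc:h_vmem='),
}


def tally_reasons(text: str) -> Counter:
    """Count how many scheduling-info entries fall into each category."""
    return Counter({name: len(pattern.findall(text))
                    for name, pattern in PATTERNS.items()})


# A few entries copied from the job status above, joined the way they were
# pasted (separated by whitespace rather than by reliable line breaks).
sample = (
    'queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1411.tools.eqiad.wmflabs" '
    'dropped because it is overloaded: np_load_avg=3.462500 '
    '(= 3.462500 + 0.50 * 0.000000 with nproc=4) >= 2.75 '
    'queue instance "webgrid-lighttpd@tools-webgrid-lighttpd-1415.tools.eqiad.wmflabs" '
    'dropped because it is disabled '
    'cannot run in queue "task" because it is not contained in its hard queue list (-q) '
    '(-l h_vmem=4g,release=trusty) cannot run at host '
    '"tools-webgrid-lighttpd-1203.eqiad.wmflabs" because it offers only hf:release=precise '
    '(-l h_vmem=4g,release=trusty) cannot run at host '
    '"tools-webgrid-lighttpd-1407.eqiad.wmflabs" because it offers only hc:h_vmem=3.142G'
)

if __name__ == "__main__":
    for reason, count in tally_reasons(sample).items():
        print(f"{reason}: {count}")
```

The patterns match on the quoted name plus the reason text rather than on line breaks, so the tally should work whether the output is pasted as one long run or with one entry per line.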