Ubuntu 12.04.5 LTS tools-exec-1207 ttyS0

tools-exec-1207 login: [769561.116084] INFO: task mono-sgen:23722 blocked for more than 120 seconds.
[769561.123011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[769561.125403] INFO: task mono-sgen:23724 blocked for more than 120 seconds.
[769561.126359] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[769561.127463] INFO: task mono-sgen:23725 blocked for more than 120 seconds.
[769561.128364] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[769561.129375] INFO: task mono-sgen:23726 blocked for more than 120 seconds.
[769561.130109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[769561.130996] INFO: task mono-sgen:23727 blocked for more than 120 seconds.
[769561.131919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[769561.132984] INFO: task mono-sgen:23729 blocked for more than 120 seconds.
[769561.134287] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[769561.135245] INFO: task mono-sgen:23736 blocked for more than 120 seconds.
[769561.136112] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[769561.137186] INFO: task mono-sgen:23794 blocked for more than 120 seconds.
[769561.138077] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[769681.137028] INFO: task mono-sgen:23722 blocked for more than 120 seconds.
[769681.145458] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[769681.146831] INFO: task mono-sgen:23724 blocked for more than 120 seconds.
[769681.147731] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
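The reports above come from the kernel's hung-task detector, which warns when a task has stayed in uninterruptible sleep longer than hung_task_timeout_secs (120 seconds here). When triaging a console dump like this one, it can help to see which command/PID pairs were reported and how often. The following is a minimal Python sketch, assuming the exact "INFO: task <comm>:<pid> blocked for more than N seconds." format shown above; the function name `summarize_hung_tasks` is hypothetical and not part of any existing tool.

```python
import re
from collections import Counter

# Matches hung-task lines such as:
#   [769561.116084] INFO: task mono-sgen:23722 blocked for more than 120 seconds.
# Assumes the command name contains no whitespace (true for "mono-sgen").
HUNG_RE = re.compile(
    r"\[\s*\d+\.\d+\] INFO: task (?P<comm>\S+?):(?P<pid>\d+) "
    r"blocked for more than \d+ seconds"
)


def summarize_hung_tasks(log_text):
    """Return a Counter mapping (command, pid) to the number of hung-task reports."""
    counts = Counter()
    # finditer also works on a flattened, single-line copy of the console log.
    for m in HUNG_RE.finditer(log_text):
        counts[(m.group("comm"), int(m.group("pid")))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[769561.116084] INFO: task mono-sgen:23722 blocked for more than 120 seconds.\n"
        "[769561.125403] INFO: task mono-sgen:23724 blocked for more than 120 seconds.\n"
        "[769681.137028] INFO: task mono-sgen:23722 blocked for more than 120 seconds.\n"
    )
    for (comm, pid), n in sorted(summarize_hung_tasks(sample).items()):
        print(f"{comm}:{pid} reported {n} time(s)")
```

On the full log, PIDs 23722 and 23724 are reported a second time at about 769681 s, roughly 120 seconds after their first report, which suggests those mono-sgen threads were still blocked at the next check.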
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | None | T124133 NFS overload is causing instances to freeze
Resolved | | valhallasw | T136481 tools-exec-1207 hanging
Event Timeline
16:45 <Kelson> valhallasw`cloud: I guess the node went out-of-memory and probably in a freeze
16:45 <valhallasw`cloud> why do you think so?
16:46 <Kelson> valhallasw`cloud: because I get an error in the job log about "out of memory"
For now, I'm killing/rescheduling all jobs on that host. @chasemp, do you want to investigate the deeper cause or shall we just reboot the host?
valhallasw@tools-bastion-02:/data/project/enwp10$ qhost -j -h tools-exec-1207.eqiad.wmflabs
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
tools-exec-1207.eqiad.wmflabs lx26-amd64  4     -    7.8G       -   23.9G       -
   job-ID  prior   name       user         state submit/start at     queue      master ja-task-ID
   ----------------------------------------------------------------------------------------------
   2562456 0.53512 enwiki_upd tools.xtools Rr    05/25/2016 17:35:32 continuous MASTER
   5998409 0.34276 rmiw.w3    tools.yifeib r     05/05/2016 09:51:06 continuous MASTER
   6454040 0.31875 comsign    tools.yifeib r     05/18/2016 10:35:17 continuous MASTER
   6822525 0.30121 welcome    tools.dimast r     05/27/2016 23:00:18 continuous MASTER
   6668316 0.30886 wp10-selec tools.enwp10 dr    05/23/2016 19:26:31 task@tools MASTER
   6839085 0.30037 rdallvoy   tools.avicbo r     05/28/2016 10:01:13 task@tools MASTER
valhallasw@tools-bastion-02:/data/project/enwp10$ qdel -f 6668316 6839085
warning: valhallasw forced the deletion of job 6668316
warning: valhallasw forced the deletion of job 6839085
valhallasw@tools-bastion-02:/data/project/enwp10$ qmod -rj 2562456 5998409 6454040 6822525
Pushed rescheduling of job 2562456 on host tools-exec-1207.eqiad.wmflabs
Pushed rescheduling of job 5998409 on host tools-exec-1207.eqiad.wmflabs
Pushed rescheduling of job 6454040 on host tools-exec-1207.eqiad.wmflabs
Pushed rescheduling of job 6822525 on host tools-exec-1207.eqiad.wmflabs
The host is now empty.
@Kelson, the wp10-select task was force-deleted.
@Avicennasis, the rdallvoy task was also force-deleted.
Please resubmit your task if it should be run again.
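In the session above, the jobs in the continuous queue were rescheduled with the Grid Engine command `qmod -rj`, while the jobs in the task@tools queue were force-deleted with `qdel -f`. The Python sketch below reproduces that split from `qhost -j` output so a host can be drained the same way. It is a sketch under stated assumptions: the column layout is taken from this one session, the queue-based rule is inferred from this ticket rather than from any documented policy, and the names `parse_jobs` and `drain_commands` are hypothetical. It only builds the command strings; it does not run them.

```python
import re
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    name: str
    user: str
    state: str
    queue: str


# Job rows as printed by `qhost -j` in this session (layout assumed):
#   job-ID prior name user state submit-date submit-time queue MASTER
JOB_ROW_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<prior>[\d.]+)\s+(?P<name>\S+)\s+(?P<user>\S+)\s+"
    r"(?P<state>\S+)\s+\S+\s+\S+\s+(?P<queue>\S+)\s+MASTER"
)


def parse_jobs(qhost_output):
    """Extract job rows; header, separator and host lines do not start with a job ID."""
    jobs = []
    for line in qhost_output.splitlines():
        m = JOB_ROW_RE.match(line)
        if m:
            jobs.append(Job(int(m["id"]), m["name"], m["user"], m["state"], m["queue"]))
    return jobs


def drain_commands(jobs):
    """Build qdel/qmod command strings using the rule inferred from this ticket.

    Assumption: 'continuous' jobs are rescheduled (qmod -rj); every other
    queue (e.g. task@...) is force-deleted (qdel -f).
    """
    reschedule = [str(j.job_id) for j in jobs if j.queue == "continuous"]
    delete = [str(j.job_id) for j in jobs if j.queue != "continuous"]
    cmds = []
    if delete:
        cmds.append("qdel -f " + " ".join(delete))
    if reschedule:
        cmds.append("qmod -rj " + " ".join(reschedule))
    return cmds
```

Fed the qhost output from this ticket, `drain_commands(parse_jobs(...))` yields the same two commands that were run by hand. Keeping command construction separate from execution makes it easy to review the list before anything is deleted.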
I would reboot this for now; I'm fairly comfortable saying this is likely fallout from the NFS maintenance. Thanks