
Repeated hangs of tools-bastion-03
Closed, Resolved · Public

Description

tools-bastion-03 is currently down. SSH connections fail either before or during public-key authentication.

The host has hung or been unavailable several times since about June 4th.
Please find the root cause; simply bringing the host back up will not be enough.

Thanks

Event Timeline

doctaxon created this task. · Jun 13 2016, 8:25 AM
Restricted Application added a project: Cloud-Services. · Jun 13 2016, 8:25 AM
Restricted Application added subscribers: Zppix, Luke081515, Aklapper.

The connection hangs at:

debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /home/valhallasw/.ssh/id_rsa
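A stall at this point can be probed from a script. The following is only a minimal sketch, not a tool used in this task: the helper names (`build_ssh_probe`, `last_debug_line`, `probe`) and the timeout values are assumptions made for illustration. It wraps `ssh -v` in an overall timeout, because `ConnectTimeout` is aimed at the connection phase and may not fire when the server stops responding during authentication, and it reports the last `debug1:` line seen.

```python
import subprocess


def build_ssh_probe(host, connect_timeout=10):
    """Build an ssh command that runs `true` on the host with verbose output."""
    return [
        "ssh", "-v",
        "-o", "BatchMode=yes",                      # never prompt for a password
        "-o", f"ConnectTimeout={connect_timeout}",  # bounds the connection phase
        host, "true",
    ]


def last_debug_line(stderr_text):
    """Return the last 'debug1:' line from ssh -v output, or None if there is none."""
    lines = [ln for ln in stderr_text.splitlines() if ln.startswith("debug1:")]
    return lines[-1] if lines else None


def probe(host, overall_timeout=30):
    """Return (status, last_debug_line); status is 'ok', 'failed' or 'hung'."""
    try:
        done = subprocess.run(
            build_ssh_probe(host),
            capture_output=True,
            text=True,
            timeout=overall_timeout,  # catches hangs after the TCP connect
        )
    except subprocess.TimeoutExpired as exc:
        stderr = exc.stderr or ""
        if isinstance(stderr, bytes):  # partial output may arrive as bytes
            stderr = stderr.decode(errors="replace")
        return "hung", last_debug_line(stderr)
    status = "ok" if done.returncode == 0 else "failed"
    return status, last_debug_line(done.stderr)
```

With a host in this state, `probe(...)` would be expected to return `"hung"` together with the `Offering RSA public key` line, which separates this failure from a plain connection refusal.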

Kernel log from wikitech:

tools-bastion-03 login: [305640.592078] INFO: task jbd2/vda1-8:201 blocked for more than 120 seconds.
[305640.595184]       Not tainted 3.13.0-83-generic #127-Ubuntu
[305640.595915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[305760.596054] INFO: task jbd2/vda1-8:201 blocked for more than 120 seconds.
[305760.599599]       Not tainted 3.13.0-83-generic #127-Ubuntu
[305760.600281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[305880.600056] INFO: task jbd2/vda1-8:201 blocked for more than 120 seconds.
[305880.602848]       Not tainted 3.13.0-83-generic #127-Ubuntu
[305880.603473] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[306000.604052] INFO: task jbd2/vda1-8:201 blocked for more than 120 seconds.
[306000.607019]       Not tainted 3.13.0-83-generic #127-Ubuntu
[306000.607701] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[306120.608113] INFO: task jbd2/vda1-8:201 blocked for more than 120 seconds.
[306120.610817]       Not tainted 3.13.0-83-generic #127-Ubuntu
[306120.611470] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[306240.612059] INFO: task jbd2/vda1-8:201 blocked for more than 120 seconds.
[306240.614982]       Not tainted 3.13.0-83-generic #127-Ubuntu
[306240.615610] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[306360.616107] INFO: task jbd2/vda1-8:201 blocked for more than 120 seconds.
[306360.618520]       Not tainted 3.13.0-83-generic #127-Ubuntu
[306360.619208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[306480.620041] INFO: task jbd2/vda1-8:201 blocked for more than 120 seconds.
[306480.623246]       Not tainted 3.13.0-83-generic #127-Ubuntu
[306480.623909] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[306600.624058] INFO: task jbd2/vda1-8:201 blocked for more than 120 seconds.
[306600.626912]       Not tainted 3.13.0-83-generic #127-Ubuntu
[306600.627646] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[306720.628081] INFO: task jbd2/vda1-8:201 blocked for more than 120 seconds.
[306720.630476]       Not tainted 3.13.0-83-generic #127-Ubuntu
[306720.631123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

So presumably another instance of T124133: NFS overload is causing instances to freeze.
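The repeating hung-task messages in the console log above can be summarised with a short script to see which task was blocked and for how long. This is only a sketch: the function name `summarise_hung_tasks` and the regular expression are assumptions written against the message format shown in this log, not a general kernel log parser.

```python
import re

# Matches lines such as:
# [305640.592078] INFO: task jbd2/vda1-8:201 blocked for more than 120 seconds.
HUNG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def summarise_hung_tasks(log_text):
    """Map (task name, pid) to message count and first/last kernel timestamps."""
    summary = {}
    for m in HUNG_RE.finditer(log_text):
        key = (m["name"], int(m["pid"]))
        ts = float(m["ts"])
        entry = summary.setdefault(key, {"count": 0, "first": ts, "last": ts})
        entry["count"] += 1
        entry["last"] = ts
    return summary


if __name__ == "__main__":
    sample = (
        "tools-bastion-03 login: [305640.592078] INFO: task jbd2/vda1-8:201 "
        "blocked for more than 120 seconds.\n"
        "[305760.596054] INFO: task jbd2/vda1-8:201 "
        "blocked for more than 120 seconds.\n"
    )
    for (name, pid), info in summarise_hung_tasks(sample).items():
        span = info["last"] - info["first"]
        print(f"{name} (pid {pid}): {info['count']} reports over {span:.0f}s")
```

In the log above every report names the same task, `jbd2/vda1-8`, the ext4 journal thread for `vda1`, so the root filesystem's journal was stuck for the whole period covered.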

Server load is consistent with an NFS failure.

I have rebooted the host.

valhallasw closed this task as Resolved. · Jun 13 2016, 9:19 AM
valhallasw claimed this task.

The host is back on-line.