es2018 crashed
Closed, Resolved (Public)

Event Timeline

Change 393211 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] maridb: depool es2018 after crash

https://gerrit.wikimedia.org/r/393211

When I logged in, the iDRAC console showed:

[30996087.770298] megaraid_sas 0000:03:00.0: pending commands remain after waiting, will reset adapter scsi0.
[30996102.596599] megaraid_sas 0000:03:00.0: Init cmd success

dmesg:

[Fri Nov 24 10:14:53 2017] TCP: request_sock_TCP: Possible SYN flooding on port 5666. Sending cookies.  Check SNMP counters.
[Fri Nov 24 10:17:06 2017] INFO: task jbd2/sda1-8:934 blocked for more than 120 seconds.
[Fri Nov 24 10:17:06 2017]       Not tainted 4.4.0-3-amd64 #1
[Fri Nov 24 10:17:06 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 24 10:17:06 2017] jbd2/sda1-8     D ffff88203eb15d80     0   934      2 0x00000000
[Fri Nov 24 10:17:06 2017]  ffff8820334caf40 ffff881038566380 ffff882036120000 ffff88203611fb00
[Fri Nov 24 10:17:06 2017]  7fffffffffffffff ffffffff81593490 ffff88203611fb80 0000000000000852
[Fri Nov 24 10:17:06 2017]  ffffffff81592c11 0000000000000000 ffffffff81595ba5 7fffffffffffffff
[Fri Nov 24 10:17:06 2017] Call Trace:
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592c11>] ? schedule+0x31/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff81595ba5>] ? schedule_timeout+0x235/0x2d0
[Fri Nov 24 10:17:06 2017]  [<ffffffff812bf22b>] ? queue_unplugged+0x9b/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592204>] ? io_schedule_timeout+0xb4/0x130
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b82c5>] ? prepare_to_wait+0x55/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff815934a7>] ? bit_wait_io+0x17/0x60
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592f8a>] ? __wait_on_bit+0x5a/0x90
[Fri Nov 24 10:17:06 2017]  [<ffffffff81169591>] ? wait_on_page_bit+0xc1/0xe0
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b8630>] ? autoremove_wake_function+0x40/0x40
[Fri Nov 24 10:17:06 2017]  [<ffffffff81169687>] ? __filemap_fdatawait_range+0xd7/0x150
[Fri Nov 24 10:17:06 2017]  [<ffffffff812c36cf>] ? submit_bio+0x6f/0x170
[Fri Nov 24 10:17:06 2017]  [<ffffffff8116970f>] ? filemap_fdatawait_range+0xf/0x30
[Fri Nov 24 10:17:06 2017]  [<ffffffffa0105d45>] ? jbd2_journal_commit_transaction+0xd15/0x1900 [jbd2]
[Fri Nov 24 10:17:06 2017]  [<ffffffff810acb02>] ? dequeue_entity+0x3f2/0x920
[Fri Nov 24 10:17:06 2017]  [<ffffffff810ad203>] ? put_prev_entity+0x33/0x710
[Fri Nov 24 10:17:06 2017]  [<ffffffff810dd6d9>] ? try_to_del_timer_sync+0x59/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffffa010a1cd>] ? kjournald2+0xdd/0x280 [jbd2]
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b85f0>] ? wait_woken+0x90/0x90
[Fri Nov 24 10:17:06 2017]  [<ffffffffa010a0f0>] ? commit_timeout+0x10/0x10 [jbd2]
[Fri Nov 24 10:17:06 2017]  [<ffffffff81096ebf>] ? kthread+0xdf/0x100
[Fri Nov 24 10:17:06 2017]  [<ffffffff81096de0>] ? kthread_park+0x50/0x50
[Fri Nov 24 10:17:06 2017]  [<ffffffff81596d5f>] ? ret_from_fork+0x3f/0x70
[Fri Nov 24 10:17:06 2017]  [<ffffffff81096de0>] ? kthread_park+0x50/0x50
[Fri Nov 24 10:17:06 2017] INFO: task xfsaild/dm-0:1231 blocked for more than 120 seconds.
[Fri Nov 24 10:17:06 2017]       Not tainted 4.4.0-3-amd64 #1
[Fri Nov 24 10:17:06 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 24 10:17:06 2017] xfsaild/dm-0    D ffff88203ead5d80     0  1231      2 0x00000000
[Fri Nov 24 10:17:06 2017]  ffff8810336c6240 ffff881038558300 ffff881035b88000 0000000000000000
[Fri Nov 24 10:17:06 2017]  ffff8820336eac00 ffff8820324b4800 ffff8820324b4928 ffff88202f55c800
[Fri Nov 24 10:17:06 2017]  ffffffff81592c11 0000000000000001 ffffffffa05fef94 00000002cdeb6cad
[Fri Nov 24 10:17:06 2017] Call Trace:
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592c11>] ? schedule+0x31/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffffa05fef94>] ? _xfs_log_force+0x164/0x2d0 [xfs]
[Fri Nov 24 10:17:06 2017]  [<ffffffff810a2110>] ? wake_up_q+0x60/0x60
[Fri Nov 24 10:17:06 2017]  [<ffffffffa05ff121>] ? xfs_log_force+0x21/0x90 [xfs]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa0609c37>] ? xfsaild+0x197/0x740 [xfs]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa0609aa0>] ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
[Fri Nov 24 10:17:06 2017]  [<ffffffff81096ebf>] ? kthread+0xdf/0x100
[Fri Nov 24 10:17:06 2017]  [<ffffffff81096de0>] ? kthread_park+0x50/0x50
[Fri Nov 24 10:17:06 2017]  [<ffffffff81596d5f>] ? ret_from_fork+0x3f/0x70
[Fri Nov 24 10:17:06 2017]  [<ffffffff81096de0>] ? kthread_park+0x50/0x50
[Fri Nov 24 10:17:06 2017] INFO: task mysqld:2902 blocked for more than 120 seconds.
[Fri Nov 24 10:17:06 2017]       Not tainted 4.4.0-3-amd64 #1
[Fri Nov 24 10:17:06 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 24 10:17:06 2017] mysqld          D ffff88103f255d80     0  2902   2071 0x00000000
[Fri Nov 24 10:17:06 2017]  ffff881032c20e80 ffff88103852cf40 ffff8810357d0000 ffff8810357cfb08
[Fri Nov 24 10:17:06 2017]  7fffffffffffffff ffff881032c20e80 ffff881032c20e80 ffff880108636a40
[Fri Nov 24 10:17:06 2017]  ffffffff81592c11 0000000000000000 ffffffff81595ba5 7fffffffffffffff
[Fri Nov 24 10:17:06 2017] Call Trace:
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592c11>] ? schedule+0x31/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff81595ba5>] ? schedule_timeout+0x235/0x2d0
[Fri Nov 24 10:17:06 2017]  [<ffffffff812bef6f>] ? __blk_run_queue+0x2f/0x40
[Fri Nov 24 10:17:06 2017]  [<ffffffff812bf1b5>] ? queue_unplugged+0x25/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592204>] ? io_schedule_timeout+0xb4/0x130
[Fri Nov 24 10:17:06 2017]  [<ffffffff812189c8>] ? do_blockdev_direct_IO+0x1b38/0x2bf0
[Fri Nov 24 10:17:06 2017]  [<ffffffffa05d7080>] ? xfs_get_blocks+0x10/0x10 [xfs]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa05d657c>] ? xfs_vm_direct_IO+0x6c/0xe0 [xfs]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa05d58c0>] ? xfs_submit_ioend+0x120/0x120 [xfs]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa05e319e>] ? xfs_file_dio_aio_write+0x1ae/0x360 [xfs]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa05e3660>] ? xfs_file_write_iter+0xa0/0x170 [xfs]
[Fri Nov 24 10:17:06 2017]  [<ffffffff811dae0d>] ? __vfs_write+0xcd/0x120
[Fri Nov 24 10:17:06 2017]  [<ffffffff811db464>] ? vfs_write+0xa4/0x190
[Fri Nov 24 10:17:06 2017]  [<ffffffff811dc386>] ? SyS_pwrite64+0x86/0xb0
[Fri Nov 24 10:17:06 2017]  [<ffffffff815969b6>] ? system_call_fast_compare_end+0xc/0x6b
[Fri Nov 24 10:17:06 2017] INFO: task mysqld:123754 blocked for more than 120 seconds.
[Fri Nov 24 10:17:06 2017]       Not tainted 4.4.0-3-amd64 #1
[Fri Nov 24 10:17:06 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 24 10:17:06 2017] mysqld          D ffff88103f315d80     0 123754   2071 0x00000000
[Fri Nov 24 10:17:06 2017]  ffff8811f5714e00 ffff8810385670c0 ffff8817e225c000 ffff8817e225bd00
[Fri Nov 24 10:17:06 2017]  7fffffffffffffff ffffffff81593490 ffff8817e225bd80 0007ffffffffffff
[Fri Nov 24 10:17:06 2017]  ffffffff81592c11 0000000000000000 ffffffff81595ba5 7fffffffffffffff
[Fri Nov 24 10:17:06 2017] Call Trace:
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592c11>] ? schedule+0x31/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff81595ba5>] ? schedule_timeout+0x235/0x2d0
[Fri Nov 24 10:17:06 2017]  [<ffffffff812c49f6>] ? blk_peek_request+0x46/0x260
[Fri Nov 24 10:17:06 2017]  [<ffffffffa001624d>] ? scsi_request_fn+0x3d/0x5f0 [scsi_mod]
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592204>] ? io_schedule_timeout+0xb4/0x130
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b82c5>] ? prepare_to_wait+0x55/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff815934a7>] ? bit_wait_io+0x17/0x60
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592f8a>] ? __wait_on_bit+0x5a/0x90
[Fri Nov 24 10:17:06 2017]  [<ffffffff81169591>] ? wait_on_page_bit+0xc1/0xe0
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b8630>] ? autoremove_wake_function+0x40/0x40
[Fri Nov 24 10:17:06 2017]  [<ffffffff81169687>] ? __filemap_fdatawait_range+0xd7/0x150
[Fri Nov 24 10:17:06 2017]  [<ffffffff8116b41f>] ? __filemap_fdatawrite_range+0xcf/0x100
[Fri Nov 24 10:17:06 2017]  [<ffffffff8116970f>] ? filemap_fdatawait_range+0xf/0x30
[Fri Nov 24 10:17:06 2017]  [<ffffffff8116b52b>] ? filemap_write_and_wait_range+0x3b/0x60
[Fri Nov 24 10:17:06 2017]  [<ffffffffa05e2311>] ? xfs_file_fsync+0x61/0x220 [xfs]
[Fri Nov 24 10:17:06 2017]  [<ffffffff8120d428>] ? do_fsync+0x38/0x60
[Fri Nov 24 10:17:06 2017]  [<ffffffff8120d69c>] ? SyS_fsync+0xc/0x10
[Fri Nov 24 10:17:06 2017]  [<ffffffff815969b6>] ? system_call_fast_compare_end+0xc/0x6b
[Fri Nov 24 10:17:06 2017] INFO: task mysqld:190154 blocked for more than 120 seconds.
[Fri Nov 24 10:17:06 2017]       Not tainted 4.4.0-3-amd64 #1
[Fri Nov 24 10:17:06 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 24 10:17:06 2017] mysqld          D ffff88103f295d80     0 190154   2071 0x00000000
[Fri Nov 24 10:17:06 2017]  ffff88203418eec0 ffff881038544fc0 ffff88108bbfc000 ffff88108bbfbd00
[Fri Nov 24 10:17:06 2017]  7fffffffffffffff ffffffff81593490 ffff88108bbfbd80 0007ffffffffffff
[Fri Nov 24 10:17:06 2017]  ffffffff81592c11 0000000000000000 ffffffff81595ba5 7fffffffffffffff
[Fri Nov 24 10:17:06 2017] Call Trace:
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592c11>] ? schedule+0x31/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff81595ba5>] ? schedule_timeout+0x235/0x2d0
[Fri Nov 24 10:17:06 2017]  [<ffffffff812c49f6>] ? blk_peek_request+0x46/0x260
[Fri Nov 24 10:17:06 2017]  [<ffffffffa001624d>] ? scsi_request_fn+0x3d/0x5f0 [scsi_mod]
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592204>] ? io_schedule_timeout+0xb4/0x130
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b82c5>] ? prepare_to_wait+0x55/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff815934a7>] ? bit_wait_io+0x17/0x60
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592f8a>] ? __wait_on_bit+0x5a/0x90
[Fri Nov 24 10:17:06 2017]  [<ffffffff81169591>] ? wait_on_page_bit+0xc1/0xe0
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b8630>] ? autoremove_wake_function+0x40/0x40
[Fri Nov 24 10:17:06 2017]  [<ffffffff81169687>] ? __filemap_fdatawait_range+0xd7/0x150
[Fri Nov 24 10:17:06 2017]  [<ffffffff8116b41f>] ? __filemap_fdatawrite_range+0xcf/0x100
[Fri Nov 24 10:17:06 2017]  [<ffffffff8116970f>] ? filemap_fdatawait_range+0xf/0x30
[Fri Nov 24 10:17:06 2017]  [<ffffffff8116b52b>] ? filemap_write_and_wait_range+0x3b/0x60
[Fri Nov 24 10:17:06 2017]  [<ffffffffa05e2311>] ? xfs_file_fsync+0x61/0x220 [xfs]
[Fri Nov 24 10:17:06 2017]  [<ffffffff8120d428>] ? do_fsync+0x38/0x60
[Fri Nov 24 10:17:06 2017]  [<ffffffff8120d69c>] ? SyS_fsync+0xc/0x10
[Fri Nov 24 10:17:06 2017]  [<ffffffff815969b6>] ? system_call_fast_compare_end+0xc/0x6b
[Fri Nov 24 10:17:06 2017] INFO: task rs:main Q:Reg:167518 blocked for more than 120 seconds.
[Fri Nov 24 10:17:06 2017]       Not tainted 4.4.0-3-amd64 #1
[Fri Nov 24 10:17:06 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 24 10:17:06 2017] rs:main Q:Reg   D ffff88203ea55d80     0 167518      1 0x00000000
[Fri Nov 24 10:17:06 2017]  ffff88010638afc0 ffff88103852c200 ffff881087a8c000 ffff881087a8ba48
[Fri Nov 24 10:17:06 2017]  7fffffffffffffff ffffffff81593490 ffff881087a8bac8 ffff881625f9c5d8
[Fri Nov 24 10:17:06 2017]  ffffffff81592c11 0000000000000000 ffffffff81595ba5 7fffffffffffffff
[Fri Nov 24 10:17:06 2017] Call Trace:
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592c11>] ? schedule+0x31/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff81595ba5>] ? schedule_timeout+0x235/0x2d0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592204>] ? io_schedule_timeout+0xb4/0x130
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b82c5>] ? prepare_to_wait+0x55/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff815934a7>] ? bit_wait_io+0x17/0x60
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592f8a>] ? __wait_on_bit+0x5a/0x90
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff8159303e>] ? out_of_line_wait_on_bit+0x7e/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b8630>] ? autoremove_wake_function+0x40/0x40
[Fri Nov 24 10:17:06 2017]  [<ffffffffa0103b4f>] ? do_get_write_access+0x24f/0x480 [jbd2]
[Fri Nov 24 10:17:06 2017]  [<ffffffff814db3f0>] ? ip_finish_output2+0x150/0x350
[Fri Nov 24 10:17:06 2017]  [<ffffffff81210737>] ? __find_get_block+0xa7/0x110
[Fri Nov 24 10:17:06 2017]  [<ffffffff81210ee6>] ? __getblk_gfp+0x26/0x50
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01a81a3>] ? ext4_dirty_inode+0x43/0x60 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa0103dae>] ? jbd2_journal_get_write_access+0x2e/0x60 [jbd2]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01d4356>] ? __ext4_journal_get_write_access+0x36/0x70 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01a484d>] ? ext4_reserve_inode_write+0x5d/0x80 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01a48bf>] ? ext4_mark_inode_dirty+0x4f/0x210 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01a81a3>] ? ext4_dirty_inode+0x43/0x60 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffff8120816a>] ? __mark_inode_dirty+0x17a/0x370
[Fri Nov 24 10:17:06 2017]  [<ffffffff811f5b69>] ? generic_update_time+0x79/0xd0
[Fri Nov 24 10:17:06 2017]  [<ffffffff811f53ad>] ? file_update_time+0xbd/0x110
[Fri Nov 24 10:17:06 2017]  [<ffffffff8116bd69>] ? __generic_file_write_iter+0x99/0x1d0
[Fri Nov 24 10:17:06 2017]  [<ffffffffa019b238>] ? ext4_file_write_iter+0x228/0x460 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffff811db6ae>] ? do_readv_writev+0x15e/0x2b0
[Fri Nov 24 10:17:06 2017]  [<ffffffff811dae0d>] ? __vfs_write+0xcd/0x120
[Fri Nov 24 10:17:06 2017]  [<ffffffff811db464>] ? vfs_write+0xa4/0x190
[Fri Nov 24 10:17:06 2017]  [<ffffffff811dc1e2>] ? SyS_write+0x52/0xc0
[Fri Nov 24 10:17:06 2017]  [<ffffffff815969b6>] ? system_call_fast_compare_end+0xc/0x6b
[Fri Nov 24 10:17:06 2017] INFO: task nrpe:4639 blocked for more than 120 seconds.
[Fri Nov 24 10:17:06 2017]       Not tainted 4.4.0-3-amd64 #1
[Fri Nov 24 10:17:06 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 24 10:17:06 2017] nrpe            D ffff88203ec55d80     0  4639 160509 0x00000000
[Fri Nov 24 10:17:06 2017]  ffff881032c92f00 ffff8810385de0c0 ffff88108a83c000 ffff88108a83b948
[Fri Nov 24 10:17:06 2017]  7fffffffffffffff ffffffff81593490 ffff88108a83b9c8 ffff8810874535d8
[Fri Nov 24 10:17:06 2017]  ffffffff81592c11 0000000000000000 ffffffff81595ba5 7fffffffffffffff
[Fri Nov 24 10:17:06 2017] Call Trace:
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592c11>] ? schedule+0x31/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff81595ba5>] ? schedule_timeout+0x235/0x2d0
[Fri Nov 24 10:17:06 2017]  [<ffffffff8131b048>] ? __nla_reserve+0x38/0x50
[Fri Nov 24 10:17:06 2017]  [<ffffffff8131b09c>] ? __nla_put+0xc/0x20
[Fri Nov 24 10:17:06 2017]  [<ffffffff81545cbe>] ? inet6_fill_ifla6_attrs+0x3de/0x400
[Fri Nov 24 10:17:06 2017]  [<ffffffff810ac3da>] ? update_curr+0xba/0x130
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592204>] ? io_schedule_timeout+0xb4/0x130
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b82c5>] ? prepare_to_wait+0x55/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff815934a7>] ? bit_wait_io+0x17/0x60
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592f8a>] ? __wait_on_bit+0x5a/0x90
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff8159303e>] ? out_of_line_wait_on_bit+0x7e/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b8630>] ? autoremove_wake_function+0x40/0x40
[Fri Nov 24 10:17:06 2017]  [<ffffffffa0103b4f>] ? do_get_write_access+0x24f/0x480 [jbd2]
[Fri Nov 24 10:17:06 2017]  [<ffffffff810dd3c4>] ? internal_add_timer+0x34/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff81210737>] ? __find_get_block+0xa7/0x110
[Fri Nov 24 10:17:06 2017]  [<ffffffff81210ee6>] ? __getblk_gfp+0x26/0x50
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01a81a3>] ? ext4_dirty_inode+0x43/0x60 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa0103dae>] ? jbd2_journal_get_write_access+0x2e/0x60 [jbd2]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01d4356>] ? __ext4_journal_get_write_access+0x36/0x70 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01a484d>] ? ext4_reserve_inode_write+0x5d/0x80 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01a48bf>] ? ext4_mark_inode_dirty+0x4f/0x210 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01a81a3>] ? ext4_dirty_inode+0x43/0x60 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffff8120816a>] ? __mark_inode_dirty+0x17a/0x370
[Fri Nov 24 10:17:06 2017]  [<ffffffff811f5b69>] ? generic_update_time+0x79/0xd0
[Fri Nov 24 10:17:06 2017]  [<ffffffff811f53ad>] ? file_update_time+0xbd/0x110
[Fri Nov 24 10:17:06 2017]  [<ffffffff8116bd69>] ? __generic_file_write_iter+0x99/0x1d0
[Fri Nov 24 10:17:06 2017]  [<ffffffffa019b238>] ? ext4_file_write_iter+0x228/0x460 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffff81317500>] ? __percpu_counter_sum+0x60/0x70
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01b4d24>] ? ext4_statfs+0x104/0x140 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffff811dae0d>] ? __vfs_write+0xcd/0x120
[Fri Nov 24 10:17:06 2017]  [<ffffffff811daeb3>] ? __kernel_write+0x53/0x100
[Fri Nov 24 10:17:06 2017]  [<ffffffff810fb1d2>] ? do_acct_process+0x462/0x4e0
[Fri Nov 24 10:17:06 2017]  [<ffffffff810fb8ac>] ? acct_process+0xdc/0x100
[Fri Nov 24 10:17:06 2017]  [<ffffffff8107c19e>] ? do_exit+0x79e/0xb10
[Fri Nov 24 10:17:06 2017]  [<ffffffff8107c589>] ? do_group_exit+0x39/0xb0
[Fri Nov 24 10:17:06 2017]  [<ffffffff8107c610>] ? SyS_exit_group+0x10/0x10
[Fri Nov 24 10:17:06 2017]  [<ffffffff815969b6>] ? system_call_fast_compare_end+0xc/0x6b
[Fri Nov 24 10:17:06 2017] INFO: task check_disk:4643 blocked for more than 120 seconds.
[Fri Nov 24 10:17:06 2017]       Not tainted 4.4.0-3-amd64 #1
[Fri Nov 24 10:17:06 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 24 10:17:06 2017] check_disk      D ffff88203ec75d80     0  4643   4642 0x00000002
[Fri Nov 24 10:17:06 2017]  ffff8811f571c0c0 ffff8810385e6100 ffff88132982c000 ffff88132982be78
[Fri Nov 24 10:17:06 2017]  ffff881036d69164 ffff8811f571c0c0 00000000ffffffff ffff881036d69168
[Fri Nov 24 10:17:06 2017]  ffffffff81592c11 ffff881036d69160 ffffffff81592e9a ffffffff81594a44
[Fri Nov 24 10:17:06 2017] Call Trace:
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592c11>] ? schedule+0x31/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592e9a>] ? schedule_preempt_disabled+0xa/0x10
[Fri Nov 24 10:17:06 2017]  [<ffffffff81594a44>] ? __mutex_lock_slowpath+0xb4/0x120
[Fri Nov 24 10:17:06 2017]  [<ffffffff81594acb>] ? mutex_lock+0x1b/0x30
[Fri Nov 24 10:17:06 2017]  [<ffffffff810fb844>] ? acct_process+0x74/0x100
[Fri Nov 24 10:17:06 2017]  [<ffffffff8107c19e>] ? do_exit+0x79e/0xb10
[Fri Nov 24 10:17:06 2017]  [<ffffffff811db503>] ? vfs_write+0x143/0x190
[Fri Nov 24 10:17:06 2017]  [<ffffffff8107c589>] ? do_group_exit+0x39/0xb0
[Fri Nov 24 10:17:06 2017]  [<ffffffff8107c610>] ? SyS_exit_group+0x10/0x10
[Fri Nov 24 10:17:06 2017]  [<ffffffff815969b6>] ? system_call_fast_compare_end+0xc/0x6b
[Fri Nov 24 10:17:06 2017] INFO: task sshd:4645 blocked for more than 120 seconds.
[Fri Nov 24 10:17:06 2017]       Not tainted 4.4.0-3-amd64 #1
[Fri Nov 24 10:17:06 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 24 10:17:06 2017] sshd            D ffff88203ead5d80     0  4645   4644 0x00000100
[Fri Nov 24 10:17:06 2017]  ffff8810372d0440 ffff881038558300 ffff8813a8890000 ffff8813a888fe78
[Fri Nov 24 10:17:06 2017]  ffff881036d69164 ffff8810372d0440 00000000ffffffff ffff881036d69168
[Fri Nov 24 10:17:06 2017]  ffffffff81592c11 ffff881036d69160 ffffffff81592e9a ffffffff81594a44
[Fri Nov 24 10:17:06 2017] Call Trace:
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592c11>] ? schedule+0x31/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592e9a>] ? schedule_preempt_disabled+0xa/0x10
[Fri Nov 24 10:17:06 2017]  [<ffffffff81594a44>] ? __mutex_lock_slowpath+0xb4/0x120
[Fri Nov 24 10:17:06 2017]  [<ffffffff81594acb>] ? mutex_lock+0x1b/0x30
[Fri Nov 24 10:17:06 2017]  [<ffffffff810fb844>] ? acct_process+0x74/0x100
[Fri Nov 24 10:17:06 2017]  [<ffffffff8107c19e>] ? do_exit+0x79e/0xb10
[Fri Nov 24 10:17:06 2017]  [<ffffffff8100388c>] ? syscall_trace_enter_phase1+0x11c/0x150
[Fri Nov 24 10:17:06 2017]  [<ffffffff8107c589>] ? do_group_exit+0x39/0xb0
[Fri Nov 24 10:17:06 2017]  [<ffffffff8107c610>] ? SyS_exit_group+0x10/0x10
[Fri Nov 24 10:17:06 2017]  [<ffffffff815969b6>] ? system_call_fast_compare_end+0xc/0x6b
[Fri Nov 24 10:17:06 2017] INFO: task cron:4646 blocked for more than 120 seconds.
[Fri Nov 24 10:17:06 2017]       Not tainted 4.4.0-3-amd64 #1
[Fri Nov 24 10:17:06 2017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 24 10:17:06 2017] cron            D ffff88103f375d80     0  4646   1394 0x00000000
[Fri Nov 24 10:17:06 2017]  ffff882034540f40 ffff881038591180 ffff88124d7b8000 ffff88124d7b7a20
[Fri Nov 24 10:17:06 2017]  7fffffffffffffff ffffffff81593490 ffff88124d7b7aa0 ffff88103e81d298
[Fri Nov 24 10:17:06 2017]  ffffffff81592c11 0000000000000000 ffffffff81595ba5 7fffffffffffffff
[Fri Nov 24 10:17:06 2017] Call Trace:
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592c11>] ? schedule+0x31/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff81595ba5>] ? schedule_timeout+0x235/0x2d0
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b1ece>] ? find_busiest_group+0x3e/0x4f0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592204>] ? io_schedule_timeout+0xb4/0x130
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b82c5>] ? prepare_to_wait+0x55/0x80
[Fri Nov 24 10:17:06 2017]  [<ffffffff815934a7>] ? bit_wait_io+0x17/0x60
[Fri Nov 24 10:17:06 2017]  [<ffffffff81592f8a>] ? __wait_on_bit+0x5a/0x90
[Fri Nov 24 10:17:06 2017]  [<ffffffff81593490>] ? bit_wait_timeout+0xa0/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff8159303e>] ? out_of_line_wait_on_bit+0x7e/0xa0
[Fri Nov 24 10:17:06 2017]  [<ffffffff810b8630>] ? autoremove_wake_function+0x40/0x40
[Fri Nov 24 10:17:06 2017]  [<ffffffffa0103b4f>] ? do_get_write_access+0x24f/0x480 [jbd2]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa0104299>] ? jbd2_journal_dirty_metadata+0x269/0x2c0 [jbd2]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa0103dae>] ? jbd2_journal_get_write_access+0x2e/0x60 [jbd2]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01d4356>] ? __ext4_journal_get_write_access+0x36/0x70 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa019de58>] ? __ext4_new_inode+0xb78/0x1410 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffffa01af645>] ? ext4_create+0x115/0x1b0 [ext4]
[Fri Nov 24 10:17:06 2017]  [<ffffffff811e8007>] ? vfs_create+0xb7/0x120
[Fri Nov 24 10:17:06 2017]  [<ffffffff811ea2aa>] ? path_openat+0x140a/0x1520
[Fri Nov 24 10:17:06 2017]  [<ffffffff811e5035>] ? terminate_walk+0x55/0xb0
[Fri Nov 24 10:17:06 2017]  [<ffffffff8119808e>] ? do_set_pte+0x9e/0xd0
[Fri Nov 24 10:17:06 2017]  [<ffffffff811eb581>] ? do_filp_open+0x91/0x100
[Fri Nov 24 10:17:06 2017]  [<ffffffff811da5ba>] ? do_sys_open+0x13a/0x230
[Fri Nov 24 10:17:06 2017]  [<ffffffff815969b6>] ? system_call_fast_compare_end+0xc/0x6b
[Fri Nov 24 10:18:41 2017] megaraid_sas 0000:03:00.0: [ 0]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:18:46 2017] megaraid_sas 0000:03:00.0: [ 5]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:18:51 2017] megaraid_sas 0000:03:00.0: [10]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:18:56 2017] megaraid_sas 0000:03:00.0: [15]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:01 2017] megaraid_sas 0000:03:00.0: [20]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:06 2017] megaraid_sas 0000:03:00.0: [25]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:11 2017] megaraid_sas 0000:03:00.0: [30]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:16 2017] megaraid_sas 0000:03:00.0: [35]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:21 2017] megaraid_sas 0000:03:00.0: [40]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:26 2017] megaraid_sas 0000:03:00.0: [45]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:31 2017] megaraid_sas 0000:03:00.0: [50]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:36 2017] megaraid_sas 0000:03:00.0: [55]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:41 2017] megaraid_sas 0000:03:00.0: [60]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:46 2017] megaraid_sas 0000:03:00.0: [65]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:51 2017] megaraid_sas 0000:03:00.0: [70]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:19:56 2017] megaraid_sas 0000:03:00.0: [75]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:01 2017] megaraid_sas 0000:03:00.0: [80]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:06 2017] megaraid_sas 0000:03:00.0: [85]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:11 2017] megaraid_sas 0000:03:00.0: [90]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:16 2017] megaraid_sas 0000:03:00.0: [95]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:21 2017] megaraid_sas 0000:03:00.0: [100]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:26 2017] megaraid_sas 0000:03:00.0: [105]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:31 2017] megaraid_sas 0000:03:00.0: [110]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:36 2017] megaraid_sas 0000:03:00.0: [115]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:41 2017] megaraid_sas 0000:03:00.0: [120]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:46 2017] megaraid_sas 0000:03:00.0: [125]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:51 2017] megaraid_sas 0000:03:00.0: [130]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:20:56 2017] megaraid_sas 0000:03:00.0: [135]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:21:01 2017] megaraid_sas 0000:03:00.0: [140]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:21:06 2017] megaraid_sas 0000:03:00.0: [145]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:21:11 2017] megaraid_sas 0000:03:00.0: [150]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:21:16 2017] megaraid_sas 0000:03:00.0: [155]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:21:21 2017] megaraid_sas 0000:03:00.0: [160]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:21:26 2017] megaraid_sas 0000:03:00.0: [165]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:21:28 2017] megaraid_sas 0000:03:00.0: waitingfor controller reset to finish
[Fri Nov 24 10:21:31 2017] megaraid_sas 0000:03:00.0: [170]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:21:33 2017] megaraid_sas 0000:03:00.0: waitingfor controller reset to finish
[Fri Nov 24 10:21:36 2017] megaraid_sas 0000:03:00.0: [175]waiting for 38 commands to complete for scsi0
[Fri Nov 24 10:21:38 2017] megaraid_sas 0000:03:00.0: waitingfor controller reset to finish
[Fri Nov 24 10:21:41 2017] megaraid_sas 0000:03:00.0: pending commands remain after waiting, will reset adapter scsi0.
[Fri Nov 24 10:21:41 2017] megaraid_sas 0000:03:00.0: resetting fusion adapter scsi0.
[Fri Nov 24 10:21:43 2017] megaraid_sas 0000:03:00.0: waitingfor controller reset to finish
[Fri Nov 24 10:21:48 2017] megaraid_sas 0000:03:00.0: Waiting for FW to come to ready state
[Fri Nov 24 10:21:48 2017] megaraid_sas 0000:03:00.0: waitingfor controller reset to finish
[Fri Nov 24 10:21:53 2017] megaraid_sas 0000:03:00.0: waitingfor controller reset to finish
[Fri Nov 24 10:21:55 2017] megaraid_sas 0000:03:00.0: FW now in Ready state
[Fri Nov 24 10:21:56 2017] megaraid_sas 0000:03:00.0: Init cmd success
[Fri Nov 24 10:21:56 2017] megaraid_sas 0000:03:00.0: firmware type	: Extended VD(240 VD)firmware
[Fri Nov 24 10:21:56 2017] megaraid_sas 0000:03:00.0: controller type	: MR(1024MB)
[Fri Nov 24 10:21:56 2017] megaraid_sas 0000:03:00.0: Online Controller Reset(OCR)	: Enabled
[Fri Nov 24 10:21:56 2017] megaraid_sas 0000:03:00.0: Secure JBOD support	: No
[Fri Nov 24 10:21:56 2017] megaraid_sas 0000:03:00.0: Jbod map is not supported megasas_setup_jbod_map 4613
[Fri Nov 24 10:21:56 2017] megaraid_sas 0000:03:00.0: Reset successful for scsi0.
[Fri Nov 24 10:21:56 2017] megaraid_sas 0000:03:00.0: 2479 (2s/0x0020/CRIT) - Controller encountered a fatal error and was reset
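
The dmesg output above is long, so a per-task count of the hung-task reports can make it easier to see what was blocked on I/O. The following is only a minimal sketch, assuming the dmesg text has been saved to a string; the function name `summarize_hung_tasks` and the sample input are hypothetical and not part of any tooling used on this task.

```python
import re
from collections import Counter

# Matches the kernel hung-task report line, for example:
# "[Fri Nov 24 10:17:06 2017] INFO: task mysqld:2902 blocked for more than 120 seconds."
# The task name may itself contain colons and spaces (e.g. "rs:main Q:Reg"),
# so the name is matched greedily and the PID is the last ":<digits>" group.
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def summarize_hung_tasks(dmesg_text):
    """Return a Counter mapping task name to number of hung-task reports."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# Hypothetical sample built from three of the lines in the log above.
sample = "\n".join([
    "[Fri Nov 24 10:17:06 2017] INFO: task mysqld:2902 blocked for more than 120 seconds.",
    "[Fri Nov 24 10:17:06 2017] INFO: task mysqld:123754 blocked for more than 120 seconds.",
    "[Fri Nov 24 10:17:06 2017] INFO: task rs:main Q:Reg:167518 blocked for more than 120 seconds.",
    "[Fri Nov 24 10:21:56 2017] megaraid_sas 0000:03:00.0: Reset successful for scsi0.",
])

if __name__ == "__main__":
    for name, count in summarize_hung_tasks(sample).most_common():
        print(f"{name}: {count}")
```

Lines that are not hung-task reports, such as the megaraid_sas messages, are simply ignored by the pattern.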

Mentioned in SAL (#wikimedia-operations) [2017-11-24T11:48:04Z] <marostegui> Reboot es2018 after full-upgrade - T181293

Change 393211 merged by jenkins-bot:
[operations/mediawiki-config@master] maridb: depool es2018 after crash

https://gerrit.wikimedia.org/r/393211

Mentioned in SAL (#wikimedia-operations) [2017-11-24T11:50:39Z] <jynus@tin> Synchronized wmf-config/db-codfw.php: depool es2018 T181293 (duration: 00m 45s)

Mentioned in SAL (#wikimedia-operations) [2017-11-24T11:57:03Z] <marostegui> Disable puppet on es2018 - T181293

Change 393218 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Promote es2017 to master

https://gerrit.wikimedia.org/r/393218

Change 393218 merged by Marostegui:
[operations/puppet@production] mariadb: Promote es2017 to master

https://gerrit.wikimedia.org/r/393218

I would do a quick data check on enwiki around the time of the issue (compare.py) to see that no data has been lost, but other than that, this is fixed.

I am running that now

I have compared enwiki from the last value at 171123 22:16:12 (155663487) through the last one I just selected from the table (155702786), and no differences were found.
Servers compared: es2018 against es2019 and against the current master, es2017, and also against the eqiad server es1019.
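
For illustration, a check of this kind can be thought of as hashing the same id range on two hosts in chunks and reporting the chunks whose hashes differ. The sketch below is not the actual compare.py and makes no claim about how that script works: the in-memory dictionaries stand in for tables on two hosts, and the names `chunk_checksum` and `find_differing_chunks`, the chunk size, and the sample data are all assumptions.

```python
import hashlib


def chunk_checksum(rows, start_id, end_id):
    """Hash every (id, value) pair with start_id <= id <= end_id, in id order.

    `rows` is a dict mapping integer id to bytes, standing in for one host's table.
    """
    digest = hashlib.sha256()
    for row_id in range(start_id, end_id + 1):
        value = rows.get(row_id)
        if value is None:
            continue  # a row missing on one host changes that host's hash
        # Length-prefix the value so adjacent rows cannot be confused.
        digest.update(f"{row_id}:{len(value)}:".encode())
        digest.update(value)
    return digest.hexdigest()


def find_differing_chunks(host_a, host_b, first_id, last_id, step=1000):
    """Return the (start, end) id ranges whose checksums differ between hosts."""
    differing = []
    for start in range(first_id, last_id + 1, step):
        end = min(start + step - 1, last_id)
        if chunk_checksum(host_a, start, end) != chunk_checksum(host_b, start, end):
            differing.append((start, end))
    return differing


if __name__ == "__main__":
    # Hypothetical data: two hosts with identical rows, one with a changed row.
    reference = {1: b"a", 2: b"b", 3: b"c"}
    identical = dict(reference)
    diverged = {1: b"a", 2: b"b", 3: b"x"}

    print(find_differing_chunks(reference, identical, 1, 3, step=2))
    print(find_differing_chunks(reference, diverged, 1, 3, step=2))
```

Comparing chunk hashes rather than individual rows keeps the amount of data exchanged small; a differing chunk can then be inspected row by row.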

Change 407407 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Repool es2018 after maintenance

https://gerrit.wikimedia.org/r/407407

Change 407407 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Repool es2018 after maintenance

https://gerrit.wikimedia.org/r/407407