
labstore1006: Persistent high iowait - 2022-09-03
Closed, Resolved, Public

Description

Currently triggering a paging alert.

  • The runbook is not very useful

SSH'd to the machine; the processes using the most I/O are:

root@labstore1006:~# iotop --iter 1 --processes --accumulated --batch --only
Total DISK READ:       289.33 M/s | Total DISK WRITE:         0.00 B/s
Current DISK READ:     296.57 M/s | Current DISK WRITE:       0.00 B/s
  PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
 3693 be/4 www-data   1408.00 K      0.00 B  0.00 % 99.99 % nginx: worker process
 3685 be/4 www-data      2.25 M      0.00 B  0.00 % 99.99 % nginx: worker process
 3696 be/4 www-data   1024.00 K      0.00 B  0.00 % 99.99 % nginx: worker process
 3697 be/4 www-data   1792.00 K      0.00 B  0.00 % 99.99 % nginx: worker process
 3692 be/4 www-data    128.00 K      0.00 B  0.00 % 42.85 % nginx: worker process
 3689 be/4 www-data    384.00 K      0.00 B  0.00 % 41.10 % nginx: worker process
 3694 be/4 www-data   1152.00 K      0.00 B  0.00 % 39.12 % nginx: worker process
 3698 be/4 www-data    640.00 K      0.00 B  0.00 % 38.98 % nginx: worker process
24360 be/4 dumpsgen    256.00 K      0.00 B  0.00 %  0.16 % rsync --daemon --no-detach
 3684 be/4 www-data    256.00 K      0.00 B  0.00 %  0.00 % nginx: worker process
 3691 be/4 www-data    896.00 K      0.00 B  0.00 %  0.00 % nginx: worker process

So the I/O is mainly nginx, and it is all read operations.
Checked dmesg and journalctl, but nothing relevant showed up.

In the nginx logs, one specific IP appears to be making many requests, and is even getting rate limited:

root@labstore1006:~# tail -n 10000 /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -n | tail
     48 someip1
     50 someip2
     67 someip3
     68 someip4
     80 someip5
     98 someip6
    122 someip7
    370 someip8
    646 someip9
   7531 someip10
root@labstore1006:~# tail /var/log/nginx/error.log
2022/09/03 08:43:22 [error] 3685#3685: *643443 limiting connections by zone "addr", client: someip10, server: dumps.wikimedia.org, request: "HEAD /dewiki/20220701/dewiki-20220701-pages-articles-multistream6.xml-p10761245p12261244.bz2 HTTP/1.1", host: "dumps.wikimedia.org"
2022/09/03 08:43:22 [error] 3684#3684: *643447 limiting connections by zone "addr", client: someip10, server: dumps.wikimedia.org, request: "HEAD /dewiki/20220701/dewiki-20220701-pagelinks.sql.gz HTTP/1.1", host: "dumps.wikimedia.org"
2022/09/03 08:43:22 [error] 3684#3684: *643494 limiting connections by zone "addr", client: someip10, server: dumps.wikimedia.org, request: "HEAD /dewiki/20220701/dewiki-20220701-pages-articles-multistream5.xml-p9115465p9261244.bz2 HTTP/1.1", host: "dumps.wikimedia.org"
2022/09/03 08:43:22 [error] 3684#3684: *643442 limiting connections by zone "addr", client: someip10, server: dumps.wikimedia.org, request: "HEAD /dewiki/20220701/dewiki-20220701-pages-articles-multistream-index6.txt-p12261245p12293166.bz2 HTTP/1.1", host: "dumps.wikimedia.org"
2022/09/03 08:43:22 [error] 3684#3684: *643447 limiting connections by zone "addr", client: someip10, server: dumps.wikimedia.org, request: "HEAD /dewiki/20220701/dewiki-20220701-pagelinks.sql.gz HTTP/1.1", host: "dumps.wikimedia.org"
2022/09/03 08:43:22 [error] 3684#3684: *643494 limiting connections by zone "addr", client: someip10, server: dumps.wikimedia.org, request: "HEAD /dewiki/20220701/dewiki-20220701-pages-articles-multistream5.xml-p9115465p9261244.bz2 HTTP/1.1", host: "dumps.wikimedia.org"
2022/09/03 08:43:22 [error] 3686#3686: *643485 limiting connections by zone "addr", client: someip10, server: dumps.wikimedia.org, request: "HEAD /dewiki/20220701/dewiki-20220701-pages-articles-multistream1.xml-p1p297012.bz2 HTTP/1.1", host: "dumps.wikimedia.org"
2022/09/03 08:43:22 [error] 3684#3684: *643442 limiting connections by zone "addr", client: someip10, server: dumps.wikimedia.org, request: "HEAD /dewiki/20220701/dewiki-20220701-pages-articles-multistream-index6.txt-p12261245p12293166.bz2 HTTP/1.1", host: "dumps.wikimedia.org"
2022/09/03 08:43:22 [error] 3684#3684: *643447 limiting connections by zone "addr", client: someip10, server: dumps.wikimedia.org, request: "HEAD /dewiki/20220701/dewiki-20220701-pagelinks.sql.gz HTTP/1.1", host: "dumps.wikimedia.org"
2022/09/03 08:43:22 [error] 3684#3684: *643494 limiting connections by zone "addr", client: someip10, server: dumps.wikimedia.org, request: "HEAD /dewiki/20220701/dewiki-20220701-pages-articles-multistream5.xml-p9115465p9261244.bz2 HTTP/1.1", host: "dumps.wikimedia.org"
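
The rate-limit errors can be tallied per client in the same way as the access log above. This is a minimal sketch, assuming the error-log line format shown in the `tail` output; `count_limited_clients` is a hypothetical helper name introduced here for illustration, not an existing tool.

```shell
# Hypothetical helper: tally nginx "limiting connections" errors per client.
# Reads error-log lines on stdin and prints "count client", busiest last.
# Assumes each matching line contains "client: <address>," as in the log above.
count_limited_clients() {
    awk -F'client: ' '
        /limiting connections by zone/ {
            split($2, rest, ",")   # the client address ends at the next comma
            print rest[1]
        }' | sort | uniq -c | sort -n | awk '{print $1, $2}'
}

# Usage on the host (log path taken from the commands above):
# count_limited_clients < /var/log/nginx/error.log | tail
```

If the client that tops this list is also the one that tops the access-log count, that supports the reading that a single client is behind both the request volume and the rate-limit errors.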

Looking into it.

Event Timeline

dcaro triaged this task as High priority. (Sep 3 2022, 8:44 AM)
dcaro created this task.
dcaro changed the task status from Open to In Progress. (Sep 3 2022, 8:48 AM)
dcaro moved this task from To refine to Doing on the User-dcaro board.

It's already going down; the high iowait started at ~08:00 UTC and lasted until ~09:00 UTC.

[Attached image: labstore1004-1005-1006-1007.png]

This might be related to T317001, and would be solved or at least relieved by T306550.