At the moment I am unable to SSH onto tools-login.wmflabs.org:
$ ssh ireas@login.tools.wmflabs.org
Permission denied (publickey,hostbased).
It seems that I am not the only user with that problem; see the #wikimedia-labs logs of today, 9:27.
The Labs LDAP is apparently having trouble.
[08:52:28] <icinga-wm> PROBLEM - Labs LDAP on seaborgium is CRITICAL: Could not bind to the LDAP server
I can't authenticate on Jenkins (which uses LDAP for authentication).
Nodepool cannot access the OpenStack API either; requests yield error 500.
I have poked the internal operations list. Can't further babysit this task right now though :-(
Mentioned in SAL [2016-03-19T10:51:50Z] <hashar> Labs LDAP is probably down. T130446 Cant log to tools-login.wmflabs.org / Jenkins interface and Nodepool yields error 500 communicating with OpenStack API
Looks like slapd got OOM-killed; I've restarted it on seaborgium.
Mar 19 08:48:29 seaborgium puppet-agent[8502]: Caching catalog for seaborgium.wikimedia.org
Mar 19 08:48:30 seaborgium kernel: [3354892.550626] puppet invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Mar 19 08:48:30 seaborgium kernel: [3354892.550631] puppet cpuset=/ mems_allowed=0
Mar 19 08:48:30 seaborgium kernel: [3354892.550641] CPU: 2 PID: 8502 Comm: puppet Not tainted 3.19.0-2-amd64 #1 Debian 3.19.3-9
Mar 19 08:48:30 seaborgium kernel: [3354892.550643] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
Mar 19 08:48:30 seaborgium kernel: [3354892.550646] 0000000000000000 0000000000000000 ffffffff8154da6b 00000000000280da
Mar 19 08:48:30 seaborgium kernel: [3354892.550649] ffffffff8154cd7a 0000000000000002 ffffffff815513be 00000000ffffffff
Mar 19 08:48:30 seaborgium kernel: [3354892.550650] ffffffff8106d317 ffffffff818f62c0 ffffffff810c4d2c ffff8800bb779418
Mar 19 08:48:30 seaborgium kernel: [3354892.550653] Call Trace:
Mar 19 08:48:30 seaborgium kernel: [3354892.550680] [<ffffffff8154da6b>] ? dump_stack+0x40/0x50
Mar 19 08:48:30 seaborgium kernel: [3354892.550694] [<ffffffff8154cd7a>] ? dump_header+0x95/0x1fd
Mar 19 08:48:30 seaborgium kernel: [3354892.550700] [<ffffffff815513be>] ? mutex_lock+0xe/0x30
Mar 19 08:48:30 seaborgium kernel: [3354892.550714] [<ffffffff8106d317>] ? put_online_cpus+0x27/0xa0
Mar 19 08:48:30 seaborgium kernel: [3354892.550723] [<ffffffff810c4d2c>] ? rcu_oom_notify+0xcc/0xe0
Mar 19 08:48:30 seaborgium kernel: [3354892.550735] [<ffffffff81152f67>] ? oom_kill_process+0x247/0x390
Mar 19 08:48:30 seaborgium kernel: [3354892.550737] [<ffffffff81152adf>] ? find_lock_task_mm+0x3f/0xa0
Mar 19 08:48:30 seaborgium kernel: [3354892.550739] [<ffffffff81153492>] ? out_of_memory+0x232/0x510
Mar 19 08:48:30 seaborgium kernel: [3354892.550742] [<ffffffff81159071>] ? __alloc_pages_nodemask+0xac1/0xba0
Mar 19 08:48:30 seaborgium kernel: [3354892.550749] [<ffffffff8119e4f7>] ? alloc_pages_vma+0xa7/0x1c0
Mar 19 08:48:30 seaborgium kernel: [3354892.550751] [<ffffffff8115d420>] ? __put_single_page+0x20/0x20
Mar 19 08:48:30 seaborgium kernel: [3354892.550756] [<ffffffff8117ed19>] ? handle_mm_fault+0xdd9/0x1040
Mar 19 08:48:30 seaborgium kernel: [3354892.550761] [<ffffffff8105ca6b>] ? __do_page_fault+0x1ab/0x550
Mar 19 08:48:30 seaborgium kernel: [3354892.550764] [<ffffffff811867c8>] ? mprotect_fixup+0x138/0x210
Mar 19 08:48:30 seaborgium kernel: [3354892.550767] [<ffffffff81555658>] ? async_page_fault+0x28/0x30
Mar 19 08:48:30 seaborgium kernel: [3354892.550768] Mem-Info:
Mar 19 08:48:30 seaborgium kernel: [3354892.550772] Node 0 DMA per-cpu:
Mar 19 08:48:30 seaborgium kernel: [3354892.550774] CPU 0: hi: 0, btch: 1 usd: 0
Mar 19 08:48:30 seaborgium kernel: [3354892.550775] CPU 1: hi: 0, btch: 1 usd: 0
Mar 19 08:48:30 seaborgium kernel: [3354892.550776] CPU 2: hi: 0, btch: 1 usd: 0
Mar 19 08:48:30 seaborgium kernel: [3354892.550777] CPU 3: hi: 0, btch: 1 usd: 0
Mar 19 08:48:30 seaborgium kernel: [3354892.550778] Node 0 DMA32 per-cpu:
Mar 19 08:48:30 seaborgium kernel: [3354892.550780] CPU 0: hi: 186, btch: 31 usd: 76
Mar 19 08:48:30 seaborgium kernel: [3354892.550781] CPU 1: hi: 186, btch: 31 usd: 0
Mar 19 08:48:30 seaborgium kernel: [3354892.550781] CPU 2: hi: 186, btch: 31 usd: 0
Mar 19 08:48:30 seaborgium kernel: [3354892.550782] CPU 3: hi: 186, btch: 31 usd: 0
Mar 19 08:48:30 seaborgium kernel: [3354892.550783] Node 0 Normal per-cpu:
Mar 19 08:48:30 seaborgium kernel: [3354892.550784] CPU 0: hi: 186, btch: 31 usd: 51
Mar 19 08:48:30 seaborgium kernel: [3354892.550785] CPU 1: hi: 186, btch: 31 usd: 0
Mar 19 08:48:30 seaborgium kernel: [3354892.550786] CPU 2: hi: 186, btch: 31 usd: 0
Mar 19 08:48:30 seaborgium kernel: [3354892.550787] CPU 3: hi: 186, btch: 31 usd: 0
Mar 19 08:48:30 seaborgium kernel: [3354892.550791] active_anon:707128 inactive_anon:264465 isolated_anon:0
Mar 19 08:48:30 seaborgium kernel: [3354892.550791] active_file:0 inactive_file:98 isolated_file:0
Mar 19 08:48:30 seaborgium kernel: [3354892.550791] unevictable:1517 dirty:14 writeback:0 unstable:0
Mar 19 08:48:30 seaborgium kernel: [3354892.550791] free:22396 slab_reclaimable:3738 slab_unreclaimable:5668
Mar 19 08:48:30 seaborgium kernel: [3354892.550791] mapped:2267 shmem:19787 pagetables:3212 bounce:0
Mar 19 08:48:30 seaborgium kernel: [3354892.550791] free_cma:0
Mar 19 08:48:30 seaborgium kernel: [3354892.550794] Node 0 DMA free:15872kB min:264kB low:328kB high:396kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:4kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 19 08:48:30 seaborgium kernel: [3354892.550798] lowmem_reserve[]: 0 2980 3939 3939
Mar 19 08:48:30 seaborgium kernel: [3354892.550800] Node 0 DMA32 free:56236kB min:50932kB low:63664kB high:76396kB active_anon:2361856kB inactive_anon:590952kB active_file:0kB inactive_file:96kB unevictable:3980kB isolated(anon):0kB isolated(file):0kB present:3129212kB managed:3054388kB mlocked:3980kB dirty:144kB writeback:0kB mapped:3304kB shmem:52804kB slab_reclaimable:8380kB slab_unreclaimable:14952kB kernel_stack:1104kB pagetables:9616kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:572 all_unreclaimable? no
Mar 19 08:48:30 seaborgium kernel: [3354892.550804] lowmem_reserve[]: 0 0 958 958
Mar 19 08:48:30 seaborgium kernel: [3354892.550806] Node 0 Normal free:17352kB min:16380kB low:20472kB high:24568kB active_anon:466656kB inactive_anon:466908kB active_file:92kB inactive_file:396kB unevictable:2088kB isolated(anon):0kB isolated(file):0kB present:1048576kB managed:981752kB mlocked:2088kB dirty:0kB writeback:0kB mapped:5764kB shmem:26344kB slab_reclaimable:6568kB slab_unreclaimable:7720kB kernel_stack:896kB pagetables:3232kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:472 all_unreclaimable? no
Mar 19 08:48:30 seaborgium kernel: [3354892.550809] lowmem_reserve[]: 0 0 0 0
Mar 19 08:48:30 seaborgium kernel: [3354892.550811] Node 0 DMA: 2*4kB (UE) 1*8kB (E) 1*16kB (E) 3*32kB (UE) 4*64kB (UE) 1*128kB (E) 2*256kB (UE) 1*512kB (E) 2*1024kB (UE) 2*2048kB (ER) 2*4096kB (M) = 15872kB
Mar 19 08:48:30 seaborgium kernel: [3354892.550820] Node 0 DMA32: 984*4kB (UEM) 725*8kB (UEM) 485*16kB (UEM) 267*32kB (UEM) 158*64kB (UEM) 69*128kB (UEM) 21*256kB (UEM) 4*512kB (EM) 0*1024kB 0*2048kB 1*4096kB (R) = 56504kB
Mar 19 08:48:30 seaborgium kernel: [3354892.550828] Node 0 Normal: 369*4kB (UEM) 262*8kB (UEM) 139*16kB (UEM) 56*32kB (UEM) 32*64kB (UEM) 11*128kB (UEM) 5*256kB (UM) 2*512kB (U) 0*1024kB 0*2048kB 1*4096kB (R) = 17444kB
Mar 19 08:48:30 seaborgium kernel: [3354892.550849] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 19 08:48:30 seaborgium kernel: [3354892.550850] 20969 total pagecache pages
Mar 19 08:48:30 seaborgium kernel: [3354892.550852] 128 pages in swap cache
Mar 19 08:48:30 seaborgium kernel: [3354892.550855] Swap cache stats: add 268953, delete 268825, find 271298/277626
Mar 19 08:48:30 seaborgium kernel: [3354892.550856] Free swap = 0kB
Mar 19 08:48:30 seaborgium kernel: [3354892.550857] Total swap = 998396kB
Mar 19 08:48:30 seaborgium kernel: [3354892.550858] 1048445 pages RAM
Mar 19 08:48:30 seaborgium kernel: [3354892.550858] 0 pages HighMem/MovableOnly
Mar 19 08:48:30 seaborgium kernel: [3354892.550859] 35433 pages reserved
Mar 19 08:48:30 seaborgium kernel: [3354892.550860] 0 pages hwpoisoned
Mar 19 08:48:30 seaborgium kernel: [3354892.550861] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Mar 19 08:48:30 seaborgium kernel: [3354892.550864] [ 171] 0 171 8240 1328 22 43 0 systemd-journal
Mar 19 08:48:30 seaborgium kernel: [3354892.550866] [ 173] 0 173 10259 2 21 192 -1000 systemd-udevd
Mar 19 08:48:30 seaborgium kernel: [3354892.550868] [ 478] 0 478 4753 6 14 38 0 atd
Mar 19 08:48:30 seaborgium kernel: [3354892.550870] [ 479] 0 479 6873 38 19 32 0 cron
Mar 19 08:48:30 seaborgium kernel: [3354892.550871] [ 485] 0 485 4962 13 14 54 0 systemd-logind
Mar 19 08:48:30 seaborgium kernel: [3354892.550873] [ 557] 0 557 1062 3 8 35 0 acpid
Mar 19 08:48:30 seaborgium kernel: [3354892.550875] [ 562] 0 562 3602 3 12 36 0 agetty
Mar 19 08:48:30 seaborgium kernel: [3354892.550876] [ 563] 0 563 3557 3 12 37 0 agetty
Mar 19 08:48:30 seaborgium kernel: [3354892.550878] [ 567] 0 567 16736 10 30 247 0 bacula-fd
Mar 19 08:48:30 seaborgium kernel: [3354892.550880] [14062] 106 14062 10531 114 24 69 -900 dbus-daemon
Mar 19 08:48:30 seaborgium kernel: [3354892.550882] [14069] 111 14069 80786 3830 61 2439 0 diamond
Mar 19 08:48:30 seaborgium kernel: [3354892.550883] [14085] 0 14085 9270 106 24 91 0 rpcbind
Mar 19 08:48:30 seaborgium kernel: [3354892.550885] [14094] 0 14094 13969 413 29 122 0 lldpd
Mar 19 08:48:30 seaborgium kernel: [3354892.550886] [14097] 108 14097 13969 27 26 122 0 lldpd
Mar 19 08:48:30 seaborgium kernel: [3354892.550888] [14109] 999 14109 17657 336 39 701 0 gmond
Mar 19 08:48:30 seaborgium kernel: [3354892.550890] [14129] 107 14129 9320 352 24 147 0 rpc.statd
Mar 19 08:48:30 seaborgium kernel: [3354892.550891] [14141] 0 14141 5839 0 16 53 0 rpc.idmapd
Mar 19 08:48:30 seaborgium kernel: [3354892.550893] [14189] 105 14189 13312 265 28 144 0 exim4
Mar 19 08:48:30 seaborgium kernel: [3354892.550894] [14214] 110 14214 8447 446 21 107 0 ntpd
Mar 19 08:48:30 seaborgium kernel: [3354892.550896] [14219] 0 14219 65721 297 34 74 0 rsyslogd
Mar 19 08:48:30 seaborgium kernel: [3354892.550898] [14229] 0 14229 13896 438 31 131 -1000 sshd
Mar 19 08:48:30 seaborgium kernel: [3354892.550899] [12859] 113 12859 1458446 892462 2369 244149 0 slapd
Mar 19 08:48:30 seaborgium kernel: [3354892.550901] [21388] 112 21388 5946 468 16 0 0 nrpe
Mar 19 08:48:30 seaborgium kernel: [3354892.550903] [ 3241] 0 3241 129824 9400 114 0 0 salt-minion
Mar 19 08:48:30 seaborgium kernel: [3354892.550904] [11524] 0 11524 4588 1519 14 0 0 atop
Mar 19 08:48:30 seaborgium kernel: [3354892.550906] [ 8447] 0 8447 10556 81 25 10 0 cron
Mar 19 08:48:30 seaborgium kernel: [3354892.550908] [ 8448] 0 8448 1084 176 7 0 0 sh
Mar 19 08:48:30 seaborgium kernel: [3354892.550909] [ 8449] 0 8449 3309 424 10 0 0 puppet-run
Mar 19 08:48:30 seaborgium kernel: [3354892.550911] [ 8501] 0 8501 2519 95 10 0 0 timeout
Mar 19 08:48:30 seaborgium kernel: [3354892.550912] [ 8502] 0 8502 90908 45882 159 0 0 puppet
Mar 19 08:48:30 seaborgium kernel: [3354892.550914] Out of memory: Kill process 12859 (slapd) score 902 or sacrifice child
Mar 19 08:48:30 seaborgium kernel: [3354892.732696] Killed process 12859 (slapd) total-vm:5833784kB, anon-rss:3569848kB, file-rss:0kB
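For anyone reading the OOM report above: the per-process table counts memory in pages, while the final "Killed process" line reports kB. A rough Python sketch of how the two relate is below. The helper names (parse_tasks, largest_by_rss) and the 4 kB page size are my own assumptions, not anything taken from the host; the sample rows are copied from the log.

```python
import re

# Columns of the OOM-killer task dump:
# [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)"
)

PAGE_KB = 4  # assumption: 4 kB pages, the usual size on amd64

# Three rows copied from the log above (kernel timestamps kept on purpose:
# they contain a dot, so the pattern does not mistake them for a pid).
SAMPLE = """\
kernel: [3354892.550899] [12859] 113 12859 1458446 892462 2369 244149 0 slapd
kernel: [3354892.550903] [ 3241] 0 3241 129824 9400 114 0 0 salt-minion
kernel: [3354892.550912] [ 8502] 0 8502 90908 45882 159 0 0 puppet
"""


def parse_tasks(text):
    """Return one dict per process row found in an OOM-killer report."""
    tasks = []
    for match in ROW.finditer(text):
        pid, _uid, _tgid, total_vm, rss, _ptes, swapents, _adj, name = match.groups()
        tasks.append({
            "pid": int(pid),
            "name": name,
            "total_vm_kb": int(total_vm) * PAGE_KB,  # pages -> kB
            "rss_kb": int(rss) * PAGE_KB,            # pages -> kB
            "swap_kb": int(swapents) * PAGE_KB,      # pages -> kB
        })
    return tasks


def largest_by_rss(text):
    """Return the process holding the most resident memory."""
    return max(parse_tasks(text), key=lambda task: task["rss_kb"])


top = largest_by_rss(SAMPLE)
print(top["name"], top["rss_kb"], "kB resident,", top["total_vm_kb"], "kB virtual")
```

With the rows above this picks slapd, and 892462 pages * 4 kB = 3569848 kB, which matches the anon-rss value in the "Killed process" line (total_vm likewise gives 5833784 kB).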
Judging from Ganglia, there is a memory leak which eventually exhausted the swap.
serpens is still running fine. All Labs instances use both serpens and seaborgium in their LDAP client config. tools-login and Nodepool should also be configured to use a second, failover LDAP server.
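To illustrate what "use a second failover LDAP server" means on the client side, here is a minimal Python sketch: try each server in order and use the first one that answers. This is only an illustration, not how nslcd, Jenkins or Nodepool implement it. The function names are made up, the two hostnames are the ones mentioned in this task, and the probe is a plain TCP connect to port 389 (the standard LDAP port), which is a weaker check than an actual LDAP bind.

```python
import socket

# Hostnames as mentioned in this task; the order expresses preference.
LDAP_SERVERS = [
    "ldap-labs.eqiad.wikimedia.org",
    "ldap-labs-codfw.wikimedia.org",
]


def tcp_probe(host, port=389, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Assumption: an open port is treated as "server is up". A real client
    would go on to attempt an LDAP bind, which can still fail.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_server(servers, probe=tcp_probe):
    """Return the first server for which probe() succeeds, or None."""
    for host in servers:
        if probe(host):
            return host
    return None
```

Calling pick_server(LDAP_SERVERS) would return the eqiad host while it is reachable and fall through to the codfw host once it is not. The probe is passed in as a parameter so the selection logic can be exercised without touching the network.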
Mentioned in SAL [2016-03-19T12:34:50Z] <godog> service supervisor stop, causing high traffic from ldap server T130446
Possibly related: nslcd on zulip-01 was causing ~3MB/s of outgoing traffic on serpens, and then on seaborgium after a service nslcd restart. This is likely due to fast-respawning zulip processes managed by supervisord. I've stopped supervisord for now, and the high traffic to seaborgium has stopped.
Working for me too, thanks! Do you want to leave this task open to investigate the cause of the problem, or should I close it?
Mentioned in SAL [2016-03-19T13:04:29Z] <hashar> Jenkins: added ldap-labs-codfw.wikimedia.org as a fallback LDAP server T130446
All back for me as well. Thanks @fgiunchedi and @MoritzMuehlenhoff
For the record, Jenkins relied solely on ldap-labs.eqiad.wikimedia.org (seaborgium). I have added the other server as a fallback: ldap-labs-codfw.wikimedia.org.
Nodepool managed to boot an instance on 2016-03-19 08:15:49 UTC, and failed at 09:14 UTC. This seems to indicate that the OpenStack configuration lacks a fallback to the codfw LDAP server.