Puppet agent failure detected on instance integration-agent-pkgbuilder-1001 in project integration
Closed, Resolved (Public)

Description

From vps alertmanager:

summary: Puppet agent failure detected on instance integration-agent-pkgbuilder-1001 in project integration
2 days ago
instance: integration-agent-pkgbuilder-1001

Event Timeline

dcaro triaged this task as High priority. Aug 5 2021, 1:43 PM
dcaro created this task.
Bstorm added a subscriber: Bstorm.

The error is:

Notice: /Stage[main]/Base::Standard_packages/Debconf::Seen[wireshark-common/install-setuid]/Exec[set debconf flag seen for wireshark-common/install-setuid]/returns: debconf: DbDriver "config": /var/cache/debconf/config.dat is locked by another process: Resource temporarily unavailable
Error: 'echo fset wireshark-common/install-setuid seen true | debconf-communicate' returned 1 instead of one of [0]
Error: /Stage[main]/Base::Standard_packages/Debconf::Seen[wireshark-common/install-setuid]/Exec[set debconf flag seen for wireshark-common/install-setuid]/returns: change from 'notrun' to ['0'] failed: 'echo fset wireshark-common/install-setuid seen true | debconf-communicate' returned 1 instead of one of [0] (corrective)

This affects both pkgbuilder VMs and is constantly emailing cloud-services-team.

Bstorm added subscribers: hashar, dduvall, dancy.

Mentioned in SAL (#wikimedia-cloud) [2021-08-23T19:58:14Z] <bstorm> acked the alert for puppet on the integration pkgbuilder hosts using the new alertmanager thingy T288237

# lsof /var/cache/debconf/config.dat
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
frontend 12880 root    4uW  REG  254,2    34334 134864 /var/cache/debconf/config.dat

# ps -up 12880
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     12880  0.0  0.5  26048 20312 pts/0    S+   Aug02   0:00 /usr/bin/perl -w /usr/share/debconf/frontend /var/lib/dpkg/info/grub-pc.postinst configure 2.02+dfsg1-20+deb10u2

# strace -f -p !$
strace -f -p 12880
strace: Process 12880 attached
read(3,   C-c strace: Process 12880 detached
 <detached ...>


# lsof -p 12880 -a -d 3
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
frontend 12880 root    3r   CHR    5,0      0t0 1035 /dev/tty


# pstree -p
systemd(1)─┬─agetty(570)
           ├─agetty(587)
           ├─apt(12828)───dpkg(12874)───frontend(12880)───grub-pc.postins(12892)
....


# ps -up 12828
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     12828  0.0  1.6  78128 65784 ?        Ss   Aug02  26:54 apt upgrade -y

# ps -up 12892
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     12892  0.0  0.0   7040  3548 pts/0    S+   Aug02   0:00 /bin/bash /var/lib/dpkg/info/grub-pc.postinst configure 2.02+dfsg1-20+deb10u2

# strace -f -p 12892
strace: Process 12892 attached
read(0,   C-c strace: Process 12892 detached
 <detached ...>

# lsof -p 12892 -a -d 0
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF      NODE NAME
grub-pc.p 12892 root    0r  FIFO   0,12      0t0 225503728 pipe


I assume that /dev/tty is ultimately what is on the other side of that pipe. All processes in the relevant process tree have /dev/pts/0 open, but I don't know what is on the other end of /dev/pts/0.
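For anyone wanting to check rather than assume what sits on the other end of a pipe like the one above, here is a minimal sketch. It assumes a Linux /proc filesystem and enough privileges (root) to read other processes' fd directories. The helper name pipe_peers is hypothetical and was not run during this investigation. It lists every PID and fd number whose descriptor points at the given pipe inode:

```shell
#!/bin/sh
# pipe_peers INODE
# Print "PID FD" for every file descriptor that refers to pipe INODE.
# Hypothetical helper; relies on Linux /proc/<pid>/fd symlinks, which
# read "pipe:[INODE]" for pipe ends.
pipe_peers() {
    inode="$1"
    for fd in /proc/[0-9]*/fd/*; do
        # Skip descriptors that vanished or that we may not inspect.
        target=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$target" = "pipe:[$inode]" ]; then
            pid=${fd#/proc/}
            pid=${pid%%/*}
            echo "$pid ${fd##*/}"
        fi
    done
}
```

With the lsof output above, the call would be pipe_peers 225503728. The output should include 12892 on fd 0 plus whichever process holds the other end of the pipe.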

I'm going to kill off leaf processes one by one as needed.

I had to kill -9 12892 (grub-pc.postinst) and soft kill 12880 (frontend).
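The "try a soft kill first, fall back to kill -9" approach can be sketched as a small helper. This is an illustration only, not what was actually typed on the host. The function name stop_pid and the default 5-second grace period are assumptions:

```shell
#!/bin/sh
# stop_pid PID [SECONDS]
# Send SIGTERM, wait up to SECONDS (default 5) for the process to exit,
# then fall back to SIGKILL. Hypothetical helper for illustration.
stop_pid() {
    pid="$1"
    timeout="${2:-5}"
    # If the signal cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        # kill -0 only tests whether the PID can still be signalled.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null
}
```

Working from the leaves of the process tree upward, as described above, that would be stop_pid 12892 followed by stop_pid 12880.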

dancy claimed this task.

Puppet agent runs to completion now.