
Segfault for systemd-sysusers.service on stat1007
Open, Medium, Public

Description

I keep seeing the following on stat1007:

Jun 23 07:14:08 stat1007 systemd-sysusers[23647]: Creating group systemd-coredump with gid 491.
Jun 23 07:14:08 stat1007 systemd-sysusers[23647]: Creating user systemd-coredump (systemd Core Dumper) with uid 491 and gid 491.
Jun 23 07:14:08 stat1007 systemd[1]: systemd-sysusers.service: Main process exited, code=dumped, status=11/SEGV
Jun 23 07:14:08 stat1007 systemd[1]: systemd-sysusers.service: Failed with result 'core-dump'.
Jun 23 07:14:08 stat1007 systemd[1]: Failed to start Create System Users.

The unit cannot be restarted: the segfault happens every time. After adding LimitCORE=10G to the unit, I got the following backtrace from the core dump:

wget http://debug.mirrors.debian.org/debian-debug/pool/main/s/systemd/systemd-dbgsym_241-5~bpo9+1_amd64.deb
dpkg -i systemd-dbgsym_241-5~bpo9+1_amd64.deb
sudo gdb /bin/systemd-sysusers /var/tmp/core/core.stat1007.systemd-sysuser.23647.1592896448
[..]
(gdb) thread apply all bt

Thread 1 (Thread 0x7f8dad634900 (LWP 23647)):
#0  0x00007f8dad1b7dfe in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f8dad19bad4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f8dad180bfb in putsgent () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f8dacc94d7d in putsgent_sane (sg=<optimized out>, stream=<optimized out>) at ../src/basic/user-util.c:790
#4  0x0000560a259ee9bd in putsgent_with_members (sg=0x7f8dad42e760, gshadow=0x560a2749a520) at ../src/sysusers/sysusers.c:337
#5  0x0000560a259ef8a4 in write_temporary_gshadow (tmpfile_path=<synthetic pointer>, tmpfile=<synthetic pointer>, gshadow_path=<optimized out>)
    at ../src/sysusers/sysusers.c:695
#6  write_files.lto_priv.6 () at ../src/sysusers/sysusers.c:753
#7  0x0000560a259ee309 in run (argv=<optimized out>, argc=<optimized out>) at ../src/sysusers/sysusers.c:1990
#8  main (argc=<optimized out>, argv=<optimized out>) at ../src/sysusers/sysusers.c:1997
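
For reference, one way to apply the LimitCORE=10G setting mentioned above is a systemd drop-in file. This is a minimal sketch, not necessarily how it was done on stat1007: the helper name write_core_limit_dropin, its arguments, and the file name limit-core.conf are made up for illustration.

```shell
#!/bin/sh
# Write a drop-in that sets LimitCORE= for a single unit.
# write_core_limit_dropin and limit-core.conf are hypothetical names.
write_core_limit_dropin() {
    unit=$1                          # e.g. systemd-sysusers.service
    limit=${2:-10G}                  # value used in this task
    base=${3:-/etc/systemd/system}   # directory for local unit overrides
    dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    # LimitCORE= belongs in the [Service] section of the unit.
    printf '[Service]\nLimitCORE=%s\n' "$limit" > "$dir/limit-core.conf"
}

# On a real host (as root), reload systemd afterwards so it picks up the file:
#   write_core_limit_dropin systemd-sysusers.service
#   systemctl daemon-reload
```

The third argument exists only so the helper can be tried against a scratch directory before touching /etc/systemd/system.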

Seems similar to https://bugzilla.redhat.com/show_bug.cgi?id=1793577

The host is currently on Stretch; this is the first time, and the first host, where I have seen this problem.

Event Timeline

One note is:

elukey@stat1007:~$ apt-cache policy libsystemd0
libsystemd0:
  Installed: 241-5~bpo9+1
  Candidate: 241-5~bpo9+1
  Version table:
 *** 241-5~bpo9+1 1001
        100 http://mirrors.wikimedia.org/debian stretch-backports/main amd64 Packages
        100 /var/lib/dpkg/status
     232-25+deb9u12 500
        500 http://mirrors.wikimedia.org/debian stretch/main amd64 Packages
     232-25+deb9u11 500
        500 http://security.debian.org/debian-security stretch/updates/main amd64 Packages

We do use libsystemd0 from backports, set via puppet.

We could try to track down whether a specific user definition makes it crash, by first removing individual files from /usr/lib/sysusers.d and, once a file has been isolated, commenting out individual definitions within that conf file.

The Red Hat bug report leads to https://github.com/systemd/systemd/issues/6512; I followed the steps outlined there:

elukey@stat1007:~$ sudo gdb systemd-sysusers
[..]
(gdb) r
Starting program: /bin/systemd-sysusers
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Creating group systemd-coredump with gid 491.
Creating user systemd-coredump (systemd Core Dumper) with uid 491 and gid 491.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7b5fdfe in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7b5fdfe in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7b43ad4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007ffff7b28bfb in putsgent () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007ffff763cd7d in putsgent_sane (sg=<optimized out>, stream=<optimized out>) at ../src/basic/user-util.c:790
#4  0x00005555555579bd in putsgent_with_members (sg=0x7ffff7dd6760, gshadow=0x555555775660) at ../src/sysusers/sysusers.c:337
#5  0x00005555555588a4 in write_temporary_gshadow (tmpfile_path=<synthetic pointer>, tmpfile=<synthetic pointer>, gshadow_path=<optimized out>) at ../src/sysusers/sysusers.c:695
#6  write_files.lto_priv.6 () at ../src/sysusers/sysusers.c:753
#7  0x0000555555557309 in run (argv=<optimized out>, argc=<optimized out>) at ../src/sysusers/sysusers.c:1990
#8  main (argc=<optimized out>, argv=<optimized out>) at ../src/sysusers/sysusers.c:1997
(gdb) frame 4
#4  0x00005555555579bd in putsgent_with_members (sg=0x7ffff7dd6760, gshadow=0x555555775660) at ../src/sysusers/sysusers.c:337
337	../src/sysusers/sysusers.c: No such file or directory.
(gdb) print *sg
$1 = {sg_namp = 0x555555775ac0 "analytics-privatedata-users", sg_passwd = REDACTED, sg_adm = REDACTED, sg_mem = REDACTED}

elukey@stat1007:~$ sudo grep analytics-privatedata-users /etc/gshadow | wc -c
1175
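
The check above can be extended to list every unusually long entry, which might help compare hosts that crash with hosts that don't. This is a rough sketch: the helper name long_gshadow_entries is made up, and the 1024-byte default threshold is only an assumption for illustration, not a limit confirmed anywhere in this task.

```shell
#!/bin/sh
# Print "<length> <group>" for each line of a gshadow-style file that is
# longer than a threshold. Lengths exclude the trailing newline, so they
# are one less than what `grep ... | wc -c` reports for the same entry.
# long_gshadow_entries is a hypothetical helper; 1024 is an assumed default.
long_gshadow_entries() {
    file=$1
    threshold=${2:-1024}
    # LC_ALL=C makes awk count bytes rather than multibyte characters.
    LC_ALL=C awk -F: -v max="$threshold" \
        'length($0) > max { print length($0), $1 }' "$file"
}

# On a real host the file is only readable by root, e.g.:
#   sudo cat /etc/gshadow > /root/gshadow.copy
#   long_gshadow_entries /root/gshadow.copy
```

Running it on stat1004, stat1006 and stat1007 and comparing the output would show whether the long analytics-privatedata-users entry lines up with the hosts that segfault.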

Why this happens only on stat1007 is not clear.

elukey@stat1006:~$ sudo systemd-sysusers
Creating group systemd-coredump with gid 490.
Creating user systemd-coredump (systemd Core Dumper) with uid 490 and gid 490.
Segmentation fault

But on stat1004 it works fine.

I checked /usr/lib/sysusers.d/*.conf and the last user listed is systemd-coredump. Also, we don't use systemd-sysusers in analytics (yet).

herron triaged this task as Medium priority. Jul 27 2020, 8:17 PM

I haven't seen the issue for a while. Maybe it is worth closing, since there is already an upstream bug open for Debian Buster. Thoughts?

Change 697734 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/software/debmonitor@debian] Create debmonitor user on buster with adduser

https://gerrit.wikimedia.org/r/697734

Change 697734 merged by Muehlenhoff:

[operations/software/debmonitor@debian] Create debmonitor user on buster with adduser

https://gerrit.wikimedia.org/r/697734

I've just seen this on mw1384 while installing dragonfly-dfdaemon.