Unexpected auditd service restart failure
Closed, Resolved · Public

Authored By

	ssingh
	Jul 23 2021, 4:41 PM

Description

When running puppet agent for the first time on the doh* hosts after https://gerrit.wikimedia.org/r/q/Id438985ffe720dc630f0e43eed8bda4a47c9196c, the auditd service failed to start with the following message:

Created symlink /etc/systemd/system/multi-user.target.wants/auditd.service -> /lib/systemd/system/auditd.service.
Job for auditd.service failed because a timeout was exceeded.
See "systemctl status auditd.service" and "journalctl -xe" for details.
invoke-rc.d: initscript auditd, action "start" failed.
* auditd.service - Security Auditing Service
   Loaded: loaded (/lib/systemd/system/auditd.service; enabled; vendor preset: enabled)
   Active: failed

The failure happened on some hosts but not all of them, even though they all received the same change. This looks like a bug in auditd, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=962451, where the service times out when it is started right after installation and starting it a second time succeeds; this matches the behaviour we observed. A possible fix is available at https://github.com/linux-audit/audit-userspace/commit/ee6608eca034494fc2597b2990852adec236e486.

We should watch whether this happens again on subsequent restarts and then consider backporting the patch to our auditd build.

Event Timeline

ssingh created this task. · Jul 23 2021, 4:41 PM

Restricted Application added a subscriber: Aklapper. · Jul 23 2021, 4:41 PM

Maintenance_bot added a project: SRE. · Jul 23 2021, 4:45 PM

ssingh triaged this task as Low priority. · Jul 23 2021, 5:07 PM

MoritzMuehlenhoff added a project: User-MoritzMuehlenhoff. · Jul 26 2021, 7:00 AM

BBlack moved this task from Backlog to Ready for work on the Traffic board. · Oct 8 2021, 7:29 PM

BCornwall subscribed. · Feb 17 2023, 6:49 PM

AFAICT we aren't packaging auditd ourselves. It might be easiest to just notify a trigger to re-start the stupid service after install since it looks like Debian isn't going to fix it.
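For illustration only, the "start it again" idea could be sketched roughly as below. This is not what our Puppet code does; `start_with_retry`, the attempt count, and the delay are hypothetical names and values chosen for the example. The only real command it relies on is `systemctl start <unit>`.

```python
import subprocess
import time
from typing import Callable, Sequence


def _run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def start_with_retry(
    unit: str,
    attempts: int = 2,       # hypothetical default: one start plus one retry
    delay: float = 5.0,      # hypothetical pause between attempts, in seconds
    run: Callable[[Sequence[str]], int] = _run_command,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try `systemctl start <unit>` up to `attempts` times.

    Returns True as soon as one attempt exits with status 0,
    and False if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if run(["systemctl", "start", unit]) == 0:
            return True
        if attempt < attempts:
            # Wait before retrying; the first start after install may time out.
            sleep(delay)
    return False


if __name__ == "__main__":
    # Needs root and systemd on a real host.
    ok = start_with_retry("auditd.service")
    print("started" if ok else "still failing")
```

The `run` and `sleep` parameters exist only so the retry logic can be exercised without touching a real systemd; on a host you would call it with just the unit name.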

Per the bug report, this should be fixed in the auditd package in Bullseye. We'll be able to confirm when we reimage the doh* servers to Bullseye.

Ah, my bad, I thought this *was* affecting bullseye. Oops. Sounds good then.

We reimaged two hosts to bullseye and didn't notice any auditd failure, so confirming what @MoritzMuehlenhoff said above and marking this as resolved.
