
Fix syslog error "nslcd[29117]: error writing to client: Broken pipe"
Closed, Invalid; Public

Description

All instances I have access to (integration and cvn) have their syslog full of these errors:

integration-slave1002 nslcd[29117]: [......] <group=50380> error writing to client: Broken pipe
integration-slave1002 nslcd[29117]: [......] <group=1003> error writing to client: Broken pipe
integration-slave1002 nslcd[29117]: [......] <group=50062> error writing to client: Broken pipe
integration-slave1002 nslcd[29117]: [......] <group=550> error writing to client: Broken pipe

A couple of these come in about every minute.
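To put a number on how often each group triggers this, something along these lines could be run over the syslog. It is only a sketch: the regular expression is modelled on the four lines quoted above (the bracketed request id is elided there, so any bracket content is accepted), and `count_broken_pipes` is a made-up helper name.

```python
import re
from collections import Counter

# Pattern modelled on the nslcd lines quoted in the description.
# Only numeric <group=GID> lookups are matched.
BROKEN_PIPE = re.compile(
    r"nslcd\[\d+\]: \[[^\]]*\] <group=(\d+)> error writing to client: Broken pipe"
)


def count_broken_pipes(lines):
    """Return a Counter mapping GID -> number of broken-pipe errors seen."""
    counts = Counter()
    for line in lines:
        match = BROKEN_PIPE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Feeding it an open syslog file (for example `count_broken_pipes(open(path))`, with `path` being wherever the instance keeps its syslog) and calling `.most_common()` on the result would list the noisiest groups first.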

Event Timeline

Krinkle raised the priority of this task from to Needs Triage.
Krinkle updated the task description.
Krinkle added projects: Cloud-Services, Cloud-VPS.
Krinkle changed Security from none to None.
Krinkle subscribed.

Groups are:

50380 - tools
1003 - wmf
50062 - bastion
550 - svn

So apparently these groups are too large for the buffer the client first offers to nslcd. This doesn't actually cause any problems - according to https://access.redhat.com/solutions/58684, the client just requests again with a larger buffer, and everything is ok.
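For anyone wondering why this is harmless, here is a toy model of the retry behaviour described above. It is not nslcd or glibc code; it only imitates the usual "buffer too small, try again with a bigger one" loop. The names `BufferTooSmall`, `read_group_record` and `lookup_with_retry`, and the 1024-byte starting size, are assumptions made for illustration.

```python
class BufferTooSmall(Exception):
    """Raised when a record does not fit in the caller's buffer (think ERANGE)."""


def read_group_record(record: bytes, buflen: int) -> bytes:
    """Hypothetical stand-in for one group lookup against the daemon."""
    if len(record) > buflen:
        # In the real system the client would stop reading at this point,
        # and the daemon's next write is what logs "Broken pipe".
        raise BufferTooSmall(len(record))
    return record


def lookup_with_retry(record: bytes, initial_buflen: int = 1024,
                      max_buflen: int = 1 << 20):
    """Retry with a doubled buffer until the record fits.

    Returns (record, attempts). Every attempt except the last corresponds
    to one aborted request in this model.
    """
    buflen = initial_buflen
    attempts = 0
    while buflen <= max_buflen:
        attempts += 1
        try:
            return read_group_record(record, buflen), attempts
        except BufferTooSmall:
            buflen *= 2
    raise BufferTooSmall(f"record larger than {max_buflen} bytes")
```

In this model a 5000-byte group record needs four attempts (1024, 2048, 4096, 8192), so a single lookup of a large group would account for several aborted requests even though the caller eventually gets its answer.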

Not sure how exactly to fix this - perhaps a syslog rule to ignore these?

yuvipanda triaged this task as Lowest priority. Dec 15 2014, 10:51 PM

Ah! I wondered for ages what that was. The syslog rule seems to be an appropriate workaround. The upstream bugs about this (for Debian/Ubuntu, cf. this somewhat related bug) suggest that it should have been resolved by now, yet it still occurs on the Trusty instances, so we should file a bug upstream to get it fixed properly.

On integration instances running Trusty this shows up in the logs quite a lot: about a dozen syslog entries every minute.

Presumably the groups have grown and so reach the limit more often.

There's also a Debian bug report discussing this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=685504

They propose creating a dummy user in /etc/passwd (or in our case, a group in /etc/group) to match the UID/GID, so that reverse lookups would not hit nslcd. This could avoid the (small) delays caused by this error.
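As a rough illustration of that workaround, the local entries could be generated like this. The GID-to-name mapping is taken from the list earlier in this task; `group_stub_line` and `stub_lines` are made-up helper names. The sketch assumes `files` is consulted before `ldap` for `group` in /etc/nsswitch.conf, and it has not been checked whether a memberless local entry shadowing the LDAP member list is acceptable on these instances.

```python
# GIDs and names as listed earlier in this task.
GROUPS = {50380: "tools", 1003: "wmf", 50062: "bastion", 550: "svn"}


def group_stub_line(name: str, gid: int) -> str:
    """Format a memberless /etc/group entry: name:password:GID:members."""
    if ":" in name or "\n" in name:
        raise ValueError(f"invalid group name: {name!r}")
    return f"{name}:x:{gid}:"


def stub_lines(groups):
    """Return one /etc/group line per (gid, name) pair, ordered by GID."""
    return [group_stub_line(name, gid) for gid, name in sorted(groups.items())]
```

`"\n".join(stub_lines(GROUPS))` gives the text that would be appended to /etc/group; nothing here writes to the file itself.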

bd808 subscribed.

Instances should be using SSSD for NSS now, not nslcd.