
mailman: centralize logging or create a mailman admin group
Closed, Resolved, Public

Description

Recent incident documentation:

https://wikitech.wikimedia.org/wiki/Incident_documentation/20150519-Mailman

We had an incident where it would have helped if more people than just roots/opsen could access the mailman logs. This task tracks the discussion and details of either having mailman push its logs to central logging, or setting up a mailman-admins group that can review logs and system details.

Event Timeline

RobH raised the priority of this task to Medium.
RobH updated the task description.
RobH added a project: acl*sre-team.
RobH added subscribers: RobH, JohnLewis.

Some useful mailman logs contain IPs and email addresses. Because of this, ops currently dislike releasing them even on an 'as-needed' basis. Sending them to logstash may not be something ops or users are comfortable with, which limits what can be fed into logstash, and sacrificing the useful logs just to provide some access is not a good trade. Notably for this investigation, the bounce logs contain email addresses; they turned out to be useful for investigating, yet could not be handed over for that reason.

Only slightly related to the general investigation, but it looks like the checks on the mailman queue should include all queues? (In other words, should an alarm for 'too many messages in moderation' have fired?)
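To make the "check all queues" idea concrete, here is a minimal sketch, assuming the queues are plain subdirectories under one parent directory with one file per queued message. The function name `oversized_queues`, the directory argument, and the threshold are all hypothetical and are not taken from the existing monitoring; where held (moderation) messages actually live on the host would need to be confirmed before using anything like this.

```python
import os


def oversized_queues(qfiles_dir, threshold):
    """Return {queue_name: file_count} for every queue directory
    that holds more than `threshold` files.

    Assumption: each subdirectory of `qfiles_dir` is one queue and
    each regular file inside it is one queued message.
    """
    result = {}
    for entry in sorted(os.listdir(qfiles_dir)):
        path = os.path.join(qfiles_dir, entry)
        if not os.path.isdir(path):
            continue  # ignore stray files next to the queue directories
        count = sum(
            1
            for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))
        )
        if count > threshold:
            result[entry] = count
    return result
```

A monitoring check could call this with the real queue directory and alert whenever the returned dictionary is non-empty, so that no single queue is left out.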

Hm, sounds like giving mailman-admins access to the box is the easiest solution. Otherwise, this sounds like a job for remote syslog to me.

Yeah, just getting local access may be easiest. But can we get some better understanding of the needs? What sort of log entries should be available and why are they helpful?

Access to the logs as a whole is needed mostly to support debugging and to assess the impact of changes (both have come up in the past in connection with outages or glitches in simple actions).

Helpful log entries would be the basic apache and exim logs (for looking into apache- and exim-related issues for mailman; I'm keeping this brief as I'm unsure how deep exim's logging goes and whether these would truly be helpful) and the mailman logs themselves. Mailman is fairly verbose with logging and keeps separate logs for generic errors, bounces, qrunner and smtp (failures), so these would be the most helpful. Using the recent outage as a use case, access to the error and bounce logs would have allowed earlier identification of a mass issue with requests bouncing (along with the reasons), and the error log had the information needed to successfully debug this. The outage was eventually solved once the bounce issue was identified and the error log extracts were provided.

Those are the main log entries that should be available. The issue is that, at this level of verbosity, information like IPs and email addresses ends up in the log files themselves, which is a disadvantage however you look at it.
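For illustration, one way to reduce that exposure before log lines leave the box would be to mask addresses first. This is only a minimal sketch under stated assumptions: the function name `redact_line`, the placeholder strings, and the deliberately simple patterns are hypothetical, it only handles IPv4, and it would also throw away exactly the email detail that made the bounce log useful in this investigation.

```python
import re

# Deliberately simple, illustrative patterns; not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_line(line):
    """Replace email addresses and IPv4 addresses in one log line
    with fixed placeholders."""
    line = EMAIL_RE.sub("<email>", line)
    return IPV4_RE.sub("<ip>", line)
```

Whether redacted logs would still be good enough for debugging is the trade-off discussed above; local access for a small group avoids that loss.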

I wouldn't call it admin, but perhaps mailman-privatedata. -admin usually defines advanced root level rights for restarting services and such, where this is purely for log data access, correct?

It is purely for log viewing so the somewhat standard suffix of -user should be used imho.

Sounds like this is actually an access request.

We have that mailman-admins group in the meantime. It also includes being able to use "journalctl" to read logs.

It is applied on fermium. We agreed it won't be applied on sodium.

So I say this is resolved.

Dzahn claimed this task.
mailman-admins:
  gid: 757
  description: Admins for mailman
  members: [johnflewis]
  privileges: ['ALL = (list) NOPASSWD: ALL',
               'ALL = NOPASSWD: /usr/sbin/service mailman *',
               'ALL = NOPASSWD: /bin/journalctl *']