
Have a regular cronjob which alerts about (potentially unadministrated) mailing lists with large (or aged?) moderation queues
Open, Lowest, Public, Feature

Description

As long as we still have Mailman mailing lists in Wikimedia, list admins/moderators/owners will go absent without anyone else noticing. As a result, posts by non-subscribers never reach the list, and nobody moderates or blocks unwelcome behavior by subscribers.

Past examples such as T270213: No admin response for many months for research-internal listserv, T232609: Reset inactive admin of offline-l mailing list, or T270434#6699896 suggest that this is a real problem. We only become aware of a case when someone notices it, cares, and knows where to escalate, so we don't know how large the problem is.

It does not seem feasible to check whether the email addresses listed as admins bounce (and even that would not identify valid addresses whose inboxes their human owners no longer follow).

I propose to

  • check the length of the moderation queue for each list,
  • alert if the queue passes a certain threshold,
  • in the alert, include the name of the list, the length of the queue, and the list's currently listed moderators (so one could try to reach them).
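The three steps above could be sketched in Python roughly as follows. This is a minimal sketch, not an existing tool: it assumes a Mailman 2 style layout where each held message is a file named `heldmsg-<listname>-<id>.<ext>` in the data directory, and the `DATA_DIR` path, the `THRESHOLD` value and the function names are hypothetical. Moderator addresses would still have to be pulled from each list's configuration.

```python
import os
from collections import Counter

# Hypothetical defaults; adjust to the real installation.
DATA_DIR = "/var/lib/mailman/data"
THRESHOLD = 50


def count_held_messages(filenames):
    """Count held messages per list from data-directory file names.

    Assumes names like 'heldmsg-<listname>-<id>.<ext>'. List names may
    contain hyphens, so the numeric id is split off from the right.
    """
    counts = Counter()
    for name in filenames:
        if not name.startswith("heldmsg-"):
            continue
        stem = os.path.splitext(name)[0][len("heldmsg-"):]
        listname, sep, msgid = stem.rpartition("-")
        if sep and msgid.isdigit():
            counts[listname] += 1
    return counts


def build_alerts(counts, moderators, threshold=THRESHOLD):
    """Return one alert line per list whose queue is above the threshold.

    moderators maps list name -> list of moderator addresses; it has to
    be filled from the list configuration (not shown here).
    """
    alerts = []
    # Largest queues first; ties broken by list name for stable output.
    for listname, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n > threshold:
            mods = ", ".join(moderators.get(listname, [])) or "(none listed)"
            alerts.append(f"{listname}: {n} held message(s); moderators: {mods}")
    return alerts


# Only scan when the (hypothetical) data directory actually exists.
if __name__ == "__main__" and os.path.isdir(DATA_DIR):
    for line in build_alerts(count_held_messages(os.listdir(DATA_DIR)), {}):
        print(line)
```

Splitting the filename parsing from the threshold check keeps the alert logic testable without a live Mailman installation.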

This is basically a request to automate T110609: publish statistics about number of held messages per list, via a shell script cronjob run perhaps once a quarter.

Not sure where best to post the output, though:

  • @ops-monitoring-bot posting the output into a Phab task. However, I don't see which project tag this would fall under (Wikimedia-Mailing-lists is meant for technical, actionable issues rather than a social "let's try to find new admins" effort), and it would probably have to be split up into atomic, actionable subtasks.
  • Sending it to list-admins@ might feel like off-topic spam.
  • @ops-monitoring-bot posting the output into a Phab Paste (but that's not discoverable at all).
  • Sending it to personal email addresses won't scale and would again be a single point of failure.

I'm also not yet sure how to proceed once statistics are available. Potentially: contact the email addresses listed as list admins and check the list's archives; if the list is inactive, create a task about archiving it; if it is active, contact its most active posters?
But the first step is to get some data anyway.

Event Timeline

Aklapper triaged this task as Lowest priority. Dec 17 2020, 9:06 AM
Aklapper created this task.
Aklapper changed the subtype of this task from "Task" to "Feature Request". Dec 17 2020, 9:06 AM
Aklapper renamed this task from Have a regular cronjob which alerts about (potentially unadministrated) mailing lists with large moderation queues to Have a regular cronjob which alerts about (potentially unadministrated) mailing lists with large (or aged?) moderation queues. Dec 18 2020, 11:38 AM

I think it would be easiest to have a script that generates data for Prometheus and to make it visible in Grafana.
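As a sketch of that idea, the cronjob could write its per-list counts as a file in the Prometheus text exposition format, for example into the directory read by node_exporter's textfile collector (which picks up `*.prom` files). The metric name `mailman_held_messages` and the function names below are hypothetical; the input is assumed to be a plain `{list name: held count}` dictionary.

```python
import os
import tempfile

METRIC = "mailman_held_messages"  # hypothetical metric name


def escape_label(value):
    """Escape a label value as the Prometheus text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(held_counts):
    """Render {list name: held message count} as one gauge per list."""
    lines = [
        f"# HELP {METRIC} Messages held for moderation, per mailing list.",
        f"# TYPE {METRIC} gauge",
    ]
    for listname in sorted(held_counts):
        label = escape_label(listname)
        lines.append(f'{METRIC}{{list="{label}"}} {held_counts[listname]}')
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write text to path via a temporary file and a rename.

    The temporary name ends in '.tmp', so a collector that only reads
    '*.prom' files never sees a half-written file.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; let the exporter read it
    os.replace(tmp, path)
```

Grafana could then graph the gauge per list, and a threshold alert could be defined on the Prometheus side instead of inside the script.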

Note that held messages are deleted after 90 days (T109838: clean up mailman data directory (moderated messages > 0.5 million)), which in theory could hide problems.
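To catch aged queues as well, one option is to track the age of each list's oldest held message and warn well before the 90-day cleanup removes the evidence. A minimal sketch, assuming the same hypothetical `heldmsg-<listname>-<id>.<ext>` file layout and using file modification time as a stand-in for the time a message was held; the 30-day warning age and the `DATA_DIR` path are arbitrary examples:

```python
import os
import time

DAY = 86400  # seconds
# Held messages are deleted after 90 days (T109838), so warn well
# before that. The value is an arbitrary example.
WARN_AFTER_DAYS = 30
DATA_DIR = "/var/lib/mailman/data"  # hypothetical path


def list_name(filename):
    """Return the list name for 'heldmsg-<listname>-<id>.<ext>', else None."""
    if not filename.startswith("heldmsg-"):
        return None
    stem = os.path.splitext(filename)[0][len("heldmsg-"):]
    name, sep, msgid = stem.rpartition("-")
    return name if sep and msgid.isdigit() else None


def oldest_ages(entries, now):
    """Map list name -> age in days of its oldest held message.

    entries is an iterable of (filename, mtime) pairs; the mtime is used
    as a stand-in for the time the message was held.
    """
    oldest = {}
    for filename, mtime in entries:
        name = list_name(filename)
        if name is not None:
            age = (now - mtime) / DAY
            oldest[name] = max(age, oldest.get(name, age))
    return oldest


def stale_lists(ages, warn_after_days=WARN_AFTER_DAYS):
    """(list, age) pairs older than the warning age, oldest first."""
    stale = [(name, age) for name, age in ages.items() if age > warn_after_days]
    return sorted(stale, key=lambda item: -item[1])


def scan(data_dir):
    """Collect (filename, mtime) pairs from the data directory."""
    return [(f, os.path.getmtime(os.path.join(data_dir, f)))
            for f in os.listdir(data_dir)]


if __name__ == "__main__" and os.path.isdir(DATA_DIR):
    for name, age in stale_lists(oldest_ages(scan(DATA_DIR), time.time())):
        print(f"{name}: oldest held message is {age:.0f} days old")
```

An oldest-message age close to 90 days would mean nobody has handled that list's queue in roughly that long, even if the queue itself stays short because of the cleanup.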