
mailman3 discard_held_messages systemd script apparently failing since 2023-03-26
Open, Low, Public

Description

See https://grafana.wikimedia.org/goto/026YpDUVz?orgId=1 & https://alerts.wikimedia.org/?q=instance%3Dlists1003%3A9100

I'm seeing alerts such as "mailman3.service Failed on lists1003:9100" or "discard_held_messages.service Failed on lists1003:9100". If the Grafana graph is to be trusted, this has been failing since 2023-03-26.

discard_held_messages comes from T109838 and is defined in listserve.pp, line 70.

Event Timeline

Thanks for reporting. I don't think there is any concern here, since lists1003 isn't the production server yet.

It is a new machine that is being set up in T331706, and that ticket is still WIP.

cc: @jhathaway

16:25 <+jinxer-wm> (NodeTextfileStale) firing: Stale textfile for lists1003:9100 - https://wikitech.wikimedia.org/wiki/Prometheus#Stale_file_for_node-exporter_textfile - 
                   https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile - https://alerts.wikimedia.org/?q=alertname%3DNodeTextfileStale
16:26 <+logmsgbot> !log dzahn@cumin1001 START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on lists1003.wikimedia.org with reason: maintenance
16:26 <+logmsgbot> !log dzahn@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on lists1003.wikimedia.org with reason: maintenance
16:26 < mutante> a ticket was opened about that alert as well ^
16:26 < mutante> but I see that isn't yet the prod server
16:27 < mutante> so downtimed it over the weekend and left a comment on tickets
Dzahn triaged this task as Low priority. (May 12 2023, 4:29 PM)