Forked from T417163: Noise in #wikimedia-operations is making incident response more difficult
{P89851}
https://sal.toolforge.org/production?p=0&q=%22herron%40cumin1003%22&d=2026-03-12 is another view of the verbosity of the current cookbooks that @herron was using while repurposing mwlog2002 to o11ytest1001.
One thing that we could consider here if the current level of detail in SAL is desired would be finding another way to pass some/all of the log messages from the cookbooks to Striker and thus the SAL. That might look like splitting IRC messages across multiple channels where Stashbot idles. We could also consider extending Stashbot to provide an alternate transport outside of IRC for message submission.
Removing the cumin tag as cumin doesn't log to IRC at all. Adding the SRE one as this is not a technical problem but a workflow one that involves everyone touching production (not only SREs).
Some related context:
I'll propose we introduce a new low-verbosity default: always log to IRC at start, and only emit additional info to IRC on error.
This would apply at the highest level: one message to log execution of the topmost cookbook to IRC. If a cookbook calls sub-steps, those are silent on IRC unless they error. Info level is always logged to the terminal and logs.
That would provide live updates about recent changes, and live updates about problems, while keeping the majority of events to one line on IRC.
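To make the proposed default concrete, here is a minimal sketch using only Python's standard `logging` module. It is not Spicerack or Stashbot code: the `IrcVerbosityFilter` and `ListHandler` classes, the `depth` and `is_start` record attributes, and the cookbook names are all hypothetical, and the "IRC" handler is just a list standing in for a real transport. The idea is that every record still reaches the full log, while the IRC side only sees the topmost START line plus anything at error level.

```python
import logging


class IrcVerbosityFilter(logging.Filter):
    """Hypothetical filter: let through top-level START lines and errors only."""

    def filter(self, record):
        # Errors always reach IRC, whatever step emitted them.
        if record.levelno >= logging.ERROR:
            return True
        # Otherwise only the START line of the topmost cookbook (depth 0).
        is_start = getattr(record, "is_start", False)
        depth = getattr(record, "depth", 0)
        return is_start and depth == 0


class ListHandler(logging.Handler):
    """Stand-in for a transport (IRC, log file): collects messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def build_loggers():
    """Return (logger, irc_handler, full_handler) wired as in the proposal."""
    irc = ListHandler()
    irc.addFilter(IrcVerbosityFilter())  # low-verbosity view
    full = ListHandler()                 # terminal/log view, unfiltered
    logger = logging.getLogger("cookbook_sketch")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(irc)
    logger.addHandler(full)
    return logger, irc, full


def run_cookbook(logger, name, substeps):
    """Simulate a top-level cookbook with (step_name, succeeded) sub-steps."""
    logger.info("START - Cookbook %s", name, extra={"depth": 0, "is_start": True})
    for step_name, succeeded in substeps:
        logger.info("START - Cookbook %s", step_name,
                    extra={"depth": 1, "is_start": True})
        if succeeded:
            logger.info("END (PASS) - Cookbook %s", step_name, extra={"depth": 1})
        else:
            logger.error("Cookbook %s failed", step_name, extra={"depth": 1})
    logger.info("END - Cookbook %s", name, extra={"depth": 0})


if __name__ == "__main__":
    logger, irc, full = build_loggers()
    run_cookbook(logger, "example.top", [("example.step-a", True),
                                         ("example.step-b", False)])
    print("IRC:", irc.messages)
    print("Full log line count:", len(full.messages))
```

With one passing and one failing sub-step, the IRC side of this sketch receives two lines (the top-level START and the failure) while the full log keeps all six. Whether the top-level END line should also go to IRC is left open here, since the proposal only names start and error.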
Logstash logging (in case it could help) has been pending T213902 (for quite some time).
T213902: Implement sensitive logstash access control is about sensitive (non-public) logs, but these are events emitted to a public IRC channel and the SAL.
Correct, Stashbot is the update mechanism for both, and its current user-facing interface is IRC messages. This is exactly why I wrote "We could also consider extending Stashbot to provide an alternate transport outside of IRC for message submission." in T419919#11704814.