Consider reducing verbosity of IRC logging
Open, Low · Public · Feature

Description

Forked from T417163: Noise in #wikimedia-operations is making incident response more difficult

We could consider setting bots to use direct messages to reduce the amount of chatter in the channel without losing notification/highlight functionality outright. Off hand:

(... snip ...)

  • Detailed START/STATUS/DONE/PASS output, along with a single "heads up" message to the channel and follow-up messages only on failures
  • Downtime cookbook
  • Detailed START/STATUS/DONE/PASS output, along with a single "heads up" message to the channel and follow-up messages only on failures

I renamed a host today and it logged 24 lines

{P89851}

Event Timeline

Restricted Application added a subscriber: Aklapper.

https://sal.toolforge.org/production?p=0&q=%22herron%40cumin1003%22&d=2026-03-12 is another view of the verbosity of the current cookbooks that @herron was using while repurposing mwlog2002 to o11ytest1001.

One thing we could consider here, if the current level of detail in SAL is desired, is finding another way to pass some or all of the log messages from the cookbooks to Striker and thus the SAL. That might look like splitting IRC messages across multiple channels where Stashbot idles. We could also consider extending Stashbot to provide an alternate transport outside of IRC for message submission.

Volans subscribed.

Removing the cumin tag as cumin doesn't log to IRC at all. Adding the SRE one as this is not a technical problem but a workflow one that involves everyone touching production (not only SREs).

Some related context:

  • Each cookbook owner can decide to log the START/END or alternatively a single DONE line using the Spicerack API [1], depending on whether it's a long-lasting or short-lived cookbook.
  • Logstash logging (in case it could help) has been pending T213902 (for quite some time).
  • Removing those logs from IRC would require a significant workflow change for everyone touching production who currently gets live updates of those on IRC. And AFAIK neither wikitech SAL nor toolforge SAL has any live update mechanism right now. So bypassing IRC to store them directly on SAL is not a solution with the current tooling.

[1] https://doc.wikimedia.org/spicerack/master/api/spicerack.cookbook.html#spicerack.cookbook.CookbookRunnerBase.skip_start_sal

  • Each cookbook owner can decide to log the START/END or alternatively a single DONE line using the Spicerack API [1], depending on whether it's a long-lasting or short-lived cookbook.
  • Removing those logs from IRC would require a significant workflow change for everyone touching production who currently gets live updates of those on IRC. And AFAIK neither wikitech SAL nor toolforge SAL has any live update mechanism right now. So bypassing IRC to store them directly on SAL is not a solution with the current tooling.

I propose we introduce a new low-verbosity default: always log to IRC at start, and only emit additional info to IRC on error.

This would apply at the highest level: one message to log execution of the topmost cookbook to IRC. If a cookbook calls sub-steps, those are silent on IRC unless they error. Info level is always logged to the terminal and logs.

That would provide live updates about recent changes, and live updates about problems, while keeping the majority of events to one line on IRC.
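As a rough illustration of that default, here is a minimal sketch in Python. It is not Spicerack code: the class `IrcQuietLogger`, its `run()` context manager, and the `send_irc` / `log_local` callables are all hypothetical names invented for this example. It only shows the policy described above: one IRC line when the topmost cookbook starts, silence for sub-steps, and one IRC line if something fails, while everything still reaches the local log.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


class IrcQuietLogger:
    """Hypothetical policy wrapper: one IRC line per top-level run, plus errors."""

    def __init__(
        self,
        send_irc: Callable[[str], None],   # assumed transport to the IRC channel
        log_local: Callable[[str], None],  # assumed terminal/log file sink
    ) -> None:
        self._send_irc = send_irc
        self._log_local = log_local
        self._depth = 0  # 0 means no cookbook is currently running
        self._last_reported: Optional[BaseException] = None

    @contextmanager
    def run(self, name: str) -> Iterator[None]:
        top_level = self._depth == 0
        self._depth += 1
        # Info level always goes to the terminal and logs.
        self._log_local(f"START {name}")
        if top_level:
            # Only the topmost cookbook announces itself on IRC.
            self._send_irc(f"START {name}")
        try:
            yield
        except Exception as exc:
            self._log_local(f"FAIL {name}: {exc}")
            # Report each failure to IRC once, at the step where it surfaced,
            # and not again as it propagates through the parent cookbooks.
            if exc is not self._last_reported:
                self._send_irc(f"FAIL {name}: {exc}")
                self._last_reported = exc
            raise
        else:
            self._log_local(f"DONE {name}")
        finally:
            self._depth -= 1
```

With this sketch, a top-level run wrapping two successful sub-steps would send a single START line to IRC, while the local log would still record START and DONE for every step. Whether a successful run should also emit a final DONE line to IRC is a design choice this sketch leaves out.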

Logstash logging (in case it could help) has been pending T213902 (for quite some time).

T213902: Implement sensitive logstash access control is about sensitive (non-public) logs, but these are events emitted to a public IRC channel and the SAL.

And AFAIK both wikitech SAL and toolforge SAL don't have any live update mechanism right now. So bypassing IRC to store them directly on SAL is not a solution with the current tooling.

Correct, Stashbot is the update mechanism for both, and its current user-facing interface is IRC messages. This is exactly why I wrote "We could also consider extending Stashbot to provide an alternate transport outside of IRC for message submission." in T419919#11704814.