
Allow wmcs cookbooks running on cloudcuminXXXX to write to the SAL
Closed, Resolved, Public

Description

I'm not sure what the exact requirements are here, so let's clarify them in this task before implementing a solution.

Some things to note (about the current solution):

  • Currently we log to IRC; from there salbot records the log.
  • For that we currently use SALLogger from wmcs_libs.common, which in turn connects to wm-bot.wm-bot.wmcloud.org and sends the '!log <project>' message to #wikimedia-cloud-feed. This is the same process used by the dologmsg script installed in the VMs:
dcaro@tools-sgebastion-10:~$ dologmsg --help
Usage: dologmsg MESSAGE...

Arguments are concatenated into a log message for the current tool
account and sent to #wikimedia-cloud. For example, when user johndoe
runs the command 'dologmsg webservice restart' from the tools.example
account, the following message will be sent to #wikimedia-cloud:

!log tools.example <johndoe> webservice restart

stashbot will then add a '<johndoe> webservice restart' log entry to
https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.example/SAL .

Event Timeline

To recap from an IRC chat, we need to define where the automatic SAL log that spicerack emits on START/END of cookbooks should go (wmcs SAL, project SAL, both, others...), and how many SAL loggers a cookbook should be able to access to send custom messages from within the cookbook code (WMCS, project, others, all...).
Also what's the preferred way to send messages to those SALs (API, other).

Basically wm-bot serves the same relay bot role in the WMCS SAL logging path as logmsgbot does for wiki cluster SAL logging: app -> relay bot -> irc -> stashbot -> wikitech + sal.toolforge.org

fnegri changed the task status from Open to In Progress. Jun 27 2023, 12:20 PM
fnegri claimed this task.
fnegri triaged this task as High priority.
fnegri moved this task from Backlog to In progress on the cloud-services-team (FY2022/2023-Q4) board.

Logging in wmcs-cookbooks is currently handled by the SALLogger class in wmcs_libs/common.py. It logs by sending messages to wm-bot.wm-bot.wmcloud.org on port 64835.

I reckon the only thing that we need to get SAL logging working from wmcs-cookbooks running in cloudcumins is to add an acl to the webproxy, to allow proxying packets from cloudcumins to wm-bot.wm-bot.wmcloud.org:64835.

Change 934309 had a related patch set uploaded (by FNegri; author: FNegri):

[operations/puppet@production] Allow cloudcumin hosts to connect to wm-bot

https://gerrit.wikimedia.org/r/934309

While I wait for a review on the patch above, I tested locally using a local copy of Squid with a similar configuration. I was able to connect successfully to wm-bot.wm-bot.wmcloud.org:64835 through my local Squid:

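import socket

# Open a TCP connection to the local Squid proxy.
s = socket.socket()
s.connect(("127.0.0.1", 3128))
# Ask the proxy to open a tunnel to wm-bot (HTTP header lines end with CRLF).
s.sendall(b"CONNECT wm-bot.wm-bot.wmcloud.org:64835 HTTP/1.1\r\n\r\n")
print(s.recv(4096))  # proxy reply; should be a 2xx status line if the tunnel opened
s.sendall(b"##test-dhinus test message\n")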

I'll need to adapt wmcs_libs/common.py to connect through the webproxy, but only when running in cloudcumin and not when running locally.
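
A minimal sketch of what such a proxy-aware connection could look like, assuming a plain HTTP CONNECT tunnel as in the test above. The function names (build_connect_request, tunnel_established, open_wmbot_socket) and the proxy argument are hypothetical and are not taken from wmcs_libs/common.py:

```python
import socket
from typing import Optional, Tuple


def build_connect_request(host: str, port: int) -> bytes:
    """Return the HTTP CONNECT request asking a proxy to tunnel to host:port."""
    return f"CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}:{port}\r\n\r\n".encode()


def tunnel_established(response: bytes) -> bool:
    """Check whether the proxy's status line reports a 2xx response."""
    parts = response.split(b"\r\n", 1)[0].split()
    return len(parts) >= 2 and parts[1].startswith(b"2")


def open_wmbot_socket(
    host: str, port: int, proxy: Optional[Tuple[str, int]] = None, timeout: float = 10.0
) -> socket.socket:
    """Connect to host:port directly, or through an HTTP proxy when one is given."""
    if proxy is None:
        # Running locally (assumed case): connect straight to the bot.
        return socket.create_connection((host, port), timeout=timeout)
    # Running behind a web proxy (assumed cloudcumin case): open a tunnel first.
    sock = socket.create_connection(proxy, timeout=timeout)
    sock.sendall(build_connect_request(host, port))
    if not tunnel_established(sock.recv(4096)):
        sock.close()
        raise ConnectionError(f"proxy refused CONNECT to {host}:{port}")
    return sock
```

With something like this, the caller would pass proxy=None when running locally and a (host, port) pair for the webproxy when running on a cloudcumin host; how that choice is detected is left open here.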

I think this is a reasonable approach, but please let me know if you can think of better/easier alternatives.

Change 934555 had a related patch set uploaded (by FNegri; author: FNegri):

[cloud/wmcs-cookbooks@main] Support connecting to wm-bot through proxy

https://gerrit.wikimedia.org/r/934555

The patch https://gerrit.wikimedia.org/r/934555 is a proof-of-concept that only works for the wmcs.do_log_msg but could be easily extended to all cookbooks.

I verified that it works with my local squid proxy:

$ cat squid.conf
cache_dir ufs /var/spool/squid 100 16 256
acl Safe_ports port 64835
acl cloudcumin_wmbot_port port 64835
acl cloudcumin_wmbot dst 185.15.56.81
http_access allow CONNECT cloudcumin_wmbot cloudcumin_wmbot_port
http_port 3128

$ cat /etc/spicerack/config.yaml |grep proxy
  http_proxy: http://localhost:3128

$ docker run -d --name squid -p 3128:3128 -v $(pwd)/squid.conf:/etc/squid/squid.conf ubuntu/squid

$ cookbook wmcs.do_log_msg --msg "test with proxy"

# The following message is successfully sent to IRC:
# !log admin test with proxy - cookbook ran by fran@wmf3169

If you want to do some testing, you can use the IRC channel ##test-T325756 where I already enabled wm-bot (you have to manually modify this line).

@fnegri thanks for the work on this! I think that as an interim workaround this is a good start to unblock the use of cookbooks in the cloudcumin hosts. Then when it's not a blocker anymore we can look how to better integrate this into spicerack itself adding support for multiple loggers, maybe based on the configuration file.

Change 934309 merged by FNegri:

[operations/puppet@production] Allow cloudcumin hosts to connect to wm-bot

https://gerrit.wikimedia.org/r/934309

Change 934555 merged by FNegri:

[cloud/wmcs-cookbooks@main] Support connecting to wm-bot through proxy

https://gerrit.wikimedia.org/r/934555

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-04T14:07:27Z] <wm-bot2> Test SAL log (T325756) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-04T14:08:54Z] <wm-bot2> Test SAL log (T325756) - cookbook ran by root@cloudcumin1001

Change 935461 had a related patch set uploaded (by FNegri; author: FNegri):

[operations/puppet@production] cloudcumin: don't send logs prod IRC

https://gerrit.wikimedia.org/r/935461

As you can see from the messages above, logging is working correctly from cloudcumin1001!

But there's one remaining problem I didn't consider: it's also trying (and failing) to send "START" and "END" messages from Spicerack to the production SAL logs through icinga.wikimedia.org:

root@cloudcumin1001:~# cookbook wmcs.do_log_msg --msg "Test SAL log" --task-id T325756
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/wmflib/irc.py", line 60, in _send_message
    sock.connect(self.addr)
socket.timeout: timed out
Call stack:
  File "/usr/bin/cookbook", line 33, in <module>
    sys.exit(load_entry_point('wikimedia-spicerack==7.2.1', 'console_scripts', 'cookbook')())
  File "/usr/lib/python3/dist-packages/spicerack/_cookbook.py", line 481, in main
    return cookbook_item.run()
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 208, in run
    _log.log_task_start(" ".join(("Cookbook", self.full_name, description)).strip())
  File "/usr/lib/python3/dist-packages/spicerack/_log.py", line 114, in log_task_start
    sal_logger.info("START - %s", message)
  File "/usr/lib/python3.9/logging/__init__.py", line 1442, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib/python3.9/logging/__init__.py", line 1585, in _log
    self.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1595, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1657, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 948, in handle
    self.emit(record)
  File "/usr/lib/python3/dist-packages/wmflib/irc.py", line 74, in emit
    self._send_message(message, record)
  File "/usr/lib/python3/dist-packages/wmflib/irc.py", line 63, in _send_message
    self.handleError(record)
Message: 'START - %s'
Arguments: ('Cookbook wmcs.do_log_msg',)
START - Cookbook wmcs.do_log_msg
[DOLOGMSG]: Test SAL log
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/wmflib/irc.py", line 60, in _send_message
    sock.connect(self.addr)
socket.timeout: timed out
Call stack:
  File "/usr/bin/cookbook", line 33, in <module>
    sys.exit(load_entry_point('wikimedia-spicerack==7.2.1', 'console_scripts', 'cookbook')())
  File "/usr/lib/python3/dist-packages/spicerack/_cookbook.py", line 481, in main
    return cookbook_item.run()
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 252, in run
    _log.log_task_end(
  File "/usr/lib/python3/dist-packages/spicerack/_log.py", line 125, in log_task_end
    sal_logger.info("END (%s) - %s", status, message)
  File "/usr/lib/python3.9/logging/__init__.py", line 1442, in info
    self._log(INFO, msg, args, **kwargs)
  File "/usr/lib/python3.9/logging/__init__.py", line 1585, in _log
    self.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1595, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 1657, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.9/logging/__init__.py", line 948, in handle
    self.emit(record)
  File "/usr/lib/python3/dist-packages/wmflib/irc.py", line 74, in emit
    self._send_message(message, record)
  File "/usr/lib/python3/dist-packages/wmflib/irc.py", line 63, in _send_message
    self.handleError(record)
Message: 'END (%s) - %s'
Arguments: ('PASS', 'Cookbook wmcs.do_log_msg (exit_code=0)')
END (PASS) - Cookbook wmcs.do_log_msg (exit_code=0)

The new patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/935461 should fix this.

I discussed this with @Volans today and we agreed it would be nice to keep Spicerack's START and END messages, but make sure they are sent to the correct IRC channel (#wikimedia-operations for prod cookbooks, and #wikimedia-cloud-feed for wmcs cookbooks), and from there to the correct SAL logs (production or WMCS).

We could easily modify the current wmflib/irc.py by adding an extra parameter to specify a different message format (e.g. !log project-name <message> instead of the !log user@host <message> used for production SAL). The tcpircbot already supports listening on multiple ports and writing to different IRC channels, so one port might send messages to #wikimedia-operations and another to #wikimedia-cloud-feed.

Note: messages from Python to IRC would use the same bot running in icinga.wikimedia.org (that can be reached from cloudcumin without using a proxy), while messages from IRC to SAL archives would continue to use the existing bots (separate ones for prod and wmcs).

I think we would still need to modify Spicerack's _cookbook.py and _log.py to support a custom configuration being passed to the SocketHandler in wmflib/irc.py.
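
To make the idea more concrete, here is a rough sketch of a logging handler whose !log prefix is configurable. This is not the code in wmflib/irc.py: the class name SalSocketHandler, the prefix parameter and the _send hook are assumptions made for illustration, and only the standard logging and socket modules are used.

```python
import logging
import socket
from typing import Tuple


class SalSocketHandler(logging.Handler):
    """Hypothetical handler that sends '!log <prefix> <message>' lines over TCP."""

    def __init__(self, addr: Tuple[str, int], prefix: str, timeout: float = 5.0) -> None:
        super().__init__()
        self.addr = addr
        self.prefix = prefix  # e.g. "user@host" for production, a project name for WMCS
        self.timeout = timeout

    def build_line(self, record: logging.LogRecord) -> str:
        """Build the single line that the relay bot would post to IRC."""
        return f"!log {self.prefix} {self.format(record)}"

    def _send(self, line: str) -> None:
        """Open a short-lived connection and send one newline-terminated line."""
        with socket.create_connection(self.addr, timeout=self.timeout) as sock:
            sock.sendall(line.encode("utf-8") + b"\n")

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._send(self.build_line(record))
        except OSError:
            # Never let a logging failure break the cookbook run.
            self.handleError(record)
```

Under these assumptions, two instances pointed at two different bot ports (one with a user@host prefix, one with a project name) would cover both message formats without duplicating the handler code.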

Once Spicerack is logging to the right channel, we might even use the Spicerack logger for all wmcs logs and get rid of the custom wmcs logger in wmcs_libs/common.py.

Quite a few details remain to be defined, but I tried to summarize here the state of the conversation.

I'm fine with making things more verbose for now, then we can trim out things that turn out to be superfluous once we have the start/stop messages.

Recap after the latest chat with @Volans:

  • log messages follow this path: cookbooks -> IRC -> SAL (https://sal.toolforge.org/ and https://wikitech.wikimedia.org/wiki/Server_Admin_Log)
  • cookbooks -> IRC happens in two ways:
    • SALLogger from wmcs_libs.common (only used by WMCS cookbooks)
      • SALLogger writes to IRC by sending a message to wm-bot, which is running on wm-bot.wm-bot.wmcloud.org
      • SALLogger is working correctly on cloudcumins, after proxy support was added in https://gerrit.wikimedia.org/r/c/934555
    • sal_logger from spicerack/_log.py (used by both production and wmcs cookbooks, because it's embedded in Spicerack)
      • sal_logger writes to IRC by sending a message to tcpircbot (aka logmsgbot), which is running on icinga.wikimedia.org
      • when we run WMCS cookbooks from a laptop, sal_logger is disabled
      • we would like sal_logger to be enabled when running WMCS cookbooks from cloudcumins
      • we would like sal_logger to include the WMCS project name in the log message, and that requires a change to Spicerack. I created a sub-task for this change: T341793.
  • IRC -> SAL is handled by https://gerrit.wikimedia.org/g/labs/tools/stashbot
    • it currently listens on multiple IRC channels (#wikimedia-operations, #wikimedia-cloud, #wikimedia-cloud-feed and others)
    • it knows how to parse messages in different formats and no change should be required to Stashbot

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-31T13:39:31Z] <wm-bot2> Test SAL log (T325756) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-31T14:35:02Z] <wm-bot2> Test SAL log (T325756) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-31T14:42:03Z] <wm-bot2> Test SAL log (T325756) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-31T15:33:14Z] <wm-bot2> Test SAL log (T325756) - cookbook ran by root@cloudcumin1001

Mentioned in SAL (#wikimedia-cloud-feed) [2023-07-31T16:15:59Z] <wm-bot2> Test SAL log (T325756) - cookbook ran by root@cloudcumin1001

I don't think this is something we should implement right away, but I wonder if all this SAL logging should happen via user accounts rather than bots. That would mean that users would have to provide their IRC credentials to cumin/spicerack/cookbooks but it might bypass some concerns about who does or doesn't have access to the logbot.

Presumably all the hosts in question can already connect to IRC (since the bots are doing it) so I don't think this introduces any additional networking concerns.

From a conversation in a meet, in order to keep the ability to log messages when running from your laptop or a cloud VM (and allow non-roots to run certain cookbooks), we can try to reuse the same logger, but configure the destination host (to be wm-bot in our case), and prepend the channel using the logging formatter the same way we prepend the project.

That would allow us to consolidate the logging into the same code, and allow non-roots to run cookbooks.

To run from a laptop though, we will need to add socks-proxy capabilities, something like:

if self.proxy:
    my_socket.send(f"CONNECT {self.host}:{self.port} HTTP/1.1\n\n".encode())

https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/wmcs-cookbooks/+/refs/heads/main/wmcs_libs/common.py#439

> we can try to reuse the same logger, but configure the destination host (to be wm-bot in our case), and prepend the channel using the logging formatter the same way we prepend the project.

This unfortunately will not work: wm-bot requires input like "#channel-name !log project-name message", but using the logging formatter to add a #channel-name prefix would result in "!log user@host #channel-name project-name message".
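
The ordering problem can be reproduced with a small stand-in. The sketch assumes a handler that always prepends "!log user@host" to the formatted record, as in the production format mentioned earlier; PrefixingHandler and the sample values are hypothetical and are not the real wmflib code.

```python
import logging


class PrefixingHandler(logging.Handler):
    """Stand-in handler that prepends '!log user@host' and records the result."""

    def __init__(self, user_host: str) -> None:
        super().__init__()
        self.user_host = user_host
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        # The formatter runs first; the handler's prefix is added afterwards.
        self.lines.append(f"!log {self.user_host} {self.format(record)}")


handler = PrefixingHandler("root@cloudcumin1001")
# Try to add the channel and project through the formatter.
handler.setFormatter(logging.Formatter("#wikimedia-cloud-feed admin %(message)s"))

logger = logging.getLogger("sal_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
logger.info("test message")

produced = handler.lines[0]
# What wm-bot expects instead, per the format described above.
expected_by_wmbot = "#wikimedia-cloud-feed !log admin test message"
print(produced)
```

The formatter can only change what comes after the handler's own prefix, so the channel name ends up in the middle of the line rather than at the start where wm-bot looks for it.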

If we want to allow users to run cookbooks not just from cloudcuminXXXX, but also from a laptop or a CloudVPS host, I can think of two possible solutions:

  1. Modify pywmflib to support both bots (logmsgbot and wm-bot), sending the correct string depending on which bot it is connecting to.
  2. Allow users to connect to logmsgbot from CloudVPS hosts (is there a network path from CloudVPS to cloudcumin?), and from laptops (using a proxy?)

> To run from a laptop though, we will need to add socks-proxy capabilities

I don't think we need a proxy to connect to wm-bot, I can connect from the public internet and send a message to IRC:

$ nc wm-bot.wm-bot.wmcloud.org 64835
#wikimedia-cloud-feed test message T325756

This sounds like an easy target for spammers, but that's another matter. :)

We would need a proxy for solution 2. above.

I'm marking this task as resolved, as the requirement was to have SAL logging working correctly from cloudcumin hosts, and that is now working fine.

I created two separate tasks for the separate requirements mentioned in the discussion above:

  • T343335 spicerack: sal_logger does not work when running from CloudVPS instances
  • T343336 spicerack: sal_logger does not work when running from a laptop

Change 935461 abandoned by FNegri:

[operations/puppet@production] cloudcumin: don't send logs to prod IRC

Reason:

Abandoned in favor of I1813b07f493c014b611df4da5fb398e0c76bd901

https://gerrit.wikimedia.org/r/935461