Log errors / exceptions to Slack (or Telegram)
Closed, Resolved | Public | 8 Estimated Story Points

Description

Currently, when exceptions or errors occur on Translatewiki, we log them to IRC. It would be useful to log them to Slack as well, since most of the team is present there. Logging to IRC will continue.

Following is the current IRC error notification structure:

  • We have a phplog2irc.service
  • This service uses the rakkauspipe.sh script, which runs the logfilter.php script and pipes its output to 127.0.0.1:8996
    • The logfilter.php file reads the error log file and outputs it
    • This output is then passed to 127.0.0.1:8996
  • On port 8996, an IRC relay service is running. See the relay.pl file.
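The last hop of the chain above (filtered log lines forwarded to the local relay) can be sketched as follows. This is only an illustration in Python, not the actual rakkauspipe.sh or relay.pl logic: it assumes the relay accepts newline-terminated text over plain TCP, which the description does not state, and `forward_lines` is a hypothetical name.

```python
import socket
from typing import Iterable

RELAY_HOST = "127.0.0.1"
RELAY_PORT = 8996  # port named in the task description


def forward_lines(lines: Iterable[str],
                  host: str = RELAY_HOST,
                  port: int = RELAY_PORT) -> int:
    """Send each non-empty line to the relay; return how many were sent.

    Assumption: the relay speaks plain TCP and treats every
    newline-terminated line as one message.
    """
    sent = 0
    with socket.create_connection((host, port), timeout=5) as conn:
        for line in lines:
            text = line.rstrip("\n")
            if not text:
                continue  # skip blank lines instead of relaying them
            conn.sendall(text.encode("utf-8") + b"\n")
            sent += 1
    return sent
```

A Slack relay could reuse the same producer side (the output of logfilter.php) and only swap the destination.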

One approach to adding Slack notifications:

  • We create a new service and reuse the logfilter.php file
  • Create a PHP file that sends these error logs to Slack

Rate limits are a concern if logs are posted in a 'spammy' manner. However, the existing system posts messages, if there are any, only once every 30 seconds. Slack allows 1 message per second on its incoming webhooks API.

Event Timeline

For Slack integration, we could use their web API to post external messages to a public or private channel.
Steps

  1. A Slack admin will have to create an App, perhaps named translatewiki-bot
  2. We'll need to select the 'Incoming Webhooks' feature and activate it for the App. From there, we should be able to post logs using a simple HTTP request to the webhook URL.
  3. The app will then request permission to access the Wikimedia Foundation workspace and subsequently ask which channel the webhook should be associated with (in this case, it would be #translatewiki)

Under OAuth & Permissions, the incoming-webhook scope ("Post messages to specific channels in Slack") seems to be the most appropriate for this use case. The webhook URL may be publicly visible within the code, which makes it a potential vector for vandalism.
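As a rough sketch of the "simple HTTP request" from step 2: a Slack incoming webhook accepts a JSON body with a `text` field sent via POST. The Python below is an illustration only, not the code in the patches; the function names and the `SLACK_WEBHOOK_URL` environment variable are hypothetical. Reading the URL from the environment is one way to keep it out of the public repository.

```python
import json
import os
import urllib.request


def build_webhook_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for an incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_slack(text: str, timeout: float = 10.0) -> int:
    """Send one message and return the HTTP status code.

    Assumption: the webhook URL is supplied through a (hypothetical)
    SLACK_WEBHOOK_URL environment variable rather than committed to code.
    """
    url = os.environ["SLACK_WEBHOOK_URL"]
    request = build_webhook_request(url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Splitting request construction from sending keeps the payload format checkable without network access.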

Rate limits are a concern if logs are posted in a 'spammy' manner. According to Slack:

If you exceed a rate limit when using any of our HTTP-based APIs (including Incoming Webhooks), Slack will return a HTTP 429 Too Many Requests error, and a Retry-After HTTP header containing the number of seconds until you can retry.

We'll definitely have to honor the Retry-After header defensively to avoid hitting the limits.
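One defensive way to use that header is sketched below in Python. It is an assumption-laden illustration rather than the deployed logic: `send` is a hypothetical callable that performs one HTTP request and returns the status code plus a header mapping keyed exactly as `Retry-After`, and the retry count is arbitrary.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]


def parse_retry_after(value: Optional[str], default: float = 1.0) -> float:
    """Return the wait in seconds, falling back to `default` if unusable."""
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        return default
    return max(seconds, 0.0)


def send_with_retry_after(send: Callable[[], Response],
                          max_attempts: int = 3,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `send` until it is not rate limited or attempts run out.

    Returns True only for a 2xx response. On HTTP 429 it waits for the
    number of seconds given in Retry-After before trying again.
    """
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return 200 <= status < 300
        if attempt < max_attempts - 1:
            # Do not sleep after the final attempt; just give up.
            sleep(parse_retry_after(headers.get("Retry-After")))
    return False
```

Passing `sleep` as a parameter makes the waiting behaviour easy to check without real delays.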

Another thing to consider is how to display the message in a readable way in the Slack channel.

The message volume should be low (and if it isn't, the motivation to make it low should be very high), so we could use the existing translatewiki channel. We already have a system in place that throttles messages to at most one message per 30 seconds.
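A throttle of that kind could look roughly like the Python sketch below. It is not the existing implementation: the class name is hypothetical, and merging the lines collected during the interval into one message is an assumption about how "at most one message per 30 seconds" is achieved.

```python
import time
from typing import Callable, List, Optional


class BatchingThrottle:
    """Collect log lines and release at most one message per interval."""

    def __init__(self, interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.clock = clock
        self.pending: List[str] = []
        self.last_flush: Optional[float] = None

    def add(self, line: str) -> None:
        """Queue a line for the next outgoing message."""
        self.pending.append(line)

    def flush_if_due(self) -> Optional[str]:
        """Return the batched message if one may be sent now, else None."""
        if not self.pending:
            return None
        now = self.clock()
        if self.last_flush is not None and now - self.last_flush < self.interval:
            return None  # too soon since the previous message
        message = "\n".join(self.pending)
        self.pending.clear()
        self.last_flush = now
        return message
```

At one message per 30 seconds, the sender stays far below the limit of 1 message per second mentioned in the description.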

Wangombe changed the task status from Open to In Progress. Feb 1 2023, 7:26 AM

Change 886032 had a related patch set uploaded (by Wangombe; author: Wangombe):

[translatewiki@master] Move irc logging files to new 'error-relay' folder

https://gerrit.wikimedia.org/r/886032

Change 886033 had a related patch set uploaded (by Wangombe; author: Wangombe):

[translatewiki@master] Add relay to log translatewiki errors to Slack

https://gerrit.wikimedia.org/r/886033

Wangombe changed the task status from In Progress to Stalled. Feb 20 2023, 12:30 PM

This ticket is currently awaiting security review. We shall resume after reviews are complete.

Hello @Wangombe, @Nikerabbit, @bcampbell et al -

I think we need to reset some expectations here. The Security-Team currently has no process or resources to review random, internal, ad-hoc tooling like this, as it falls outside the scope of our current Application Security Review SOP and would also likely require a privacy review. At best, this would be a very low priority review for our team given current prioritization and resourcing. This request also sets a precedent that any user of Wikimedia's Slack instance should be allowed to build whatever custom integrations they'd like, which I don't believe has been discussed or agreed upon in any way by the ostensible maintainers of Wikimedia's Slack instance and other relevant teams such as the Security-Team and WMF-Legal. That discussion would then need to be followed by the development of formal policy and processes around reviewing and using such tooling. I understand that such considerations likely haven't been well-communicated across the Foundation and Community in the past, but given current resourcing and the horrific amount of Technical-Debt which already exists, the Security-Team absolutely needs to discuss this further with other stakeholders before we can proceed in any way.

Nikerabbit renamed this task from Log errors / exceptions to Slack to Log errors / exceptions to Slack (or Telegram). Mar 20 2023, 2:06 PM
Nikerabbit changed the task status from Stalled to Open.

We are exploring the options of using an old-style webhook or falling back to logging to our Telegram sysadmin channel.

Change 886032 merged by jenkins-bot:

[translatewiki@master] Move irc logging files to new 'relays' folder

https://gerrit.wikimedia.org/r/886032

abi_ set the point value for this task to 8. Apr 25 2023, 2:49 PM

Change 886033 merged by jenkins-bot:

[translatewiki@master] Add relay to log translatewiki errors to Slack

https://gerrit.wikimedia.org/r/886033

Change 914179 had a related patch set uploaded (by Wangombe; author: Wangombe):

[translatewiki@master] Fix thread blocking code in slack-logger.php

https://gerrit.wikimedia.org/r/914179

Change 914179 merged by jenkins-bot:

[translatewiki@master] Fix thread blocking code in slack-logger.php

https://gerrit.wikimedia.org/r/914179

The initial version of this feature has been deployed.

A few issues noticed:

1. Change log username

Change the user log name to: Translatewiki Logger

2. Wrap exceptions in codeblocks

Currently, exception text is shown as is, but it could be put in code blocks instead.
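A minimal Python sketch of such wrapping follows; it is an illustration, not the merged change. Slack message text uses triple backticks for code blocks and expects `&`, `<` and `>` to be escaped, which matters here because stack traces contain `->`. The function names are hypothetical, and replacing any backtick run inside the text is an assumption made to keep the block from closing early.

```python
FENCE = "`" * 3  # triple backtick, built this way to keep the example readable


def escape_slack(text: str) -> str:
    """Escape the three characters Slack treats as control characters."""
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))


def as_code_block(text: str) -> str:
    """Wrap exception text in a Slack code block."""
    # Assumption: a fence inside the text would end the block early,
    # so it is replaced with a harmless look-alike.
    safe = escape_slack(text).replace(FENCE, "'''")
    return f"{FENCE}\n{safe}\n{FENCE}"
```

The result can be used directly as the `text` value of the webhook payload.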

3. Exceptions logged are not useful

NOTE: This is the expected behavior

Given the following exception:

[2023-05-04T17:11:46.908425+00:00] exception.ERROR: [3f36063b97c41c3a379ff4e4] /w/api.php?action=translationaids&format=json&title=MediaWiki%3AMultimaps-marker-incorrect-icon-anchor%2Ffr&uselang=fr   MediaWiki\Revision\RevisionAccessException: Main slot of revision not found in database. See T212428. {"exception":"[object] (MediaWiki\\Revision\\RevisionAccessException(code: 0): Main slot of revision not found in database. See T212428. at /srv/mediawiki/tags/2023-05-04_14:31:50/includes/Revision/RevisionStore.php:1522)
[stacktrace]
#0 /srv/mediawiki/tags/2023-05-04_14:31:50/includes/Revision/RevisionStore.php(1381): MediaWiki\\Revision\\RevisionStore->constructSlotRecords()
#1 /srv/mediawiki/tags/2023-05-04_14:31:50/includes/Revision/RevisionStore.php(1556): MediaWiki\\Revision\\RevisionStore->loadSlotRecords()
....
","exception_url":"/w/api.php?action=translationaids&format=json&title=MediaWiki%3AMultimaps-marker-incorrect-icon-anchor%2Ffr&uselang=fr","reqId":"3f36063b97c41c3a379ff4e4","caught_by":"entrypoint"} []

Currently, what's logged on Slack is:

","exception_url":"/w/api.php?action=translationaids&format=json&title=MediaWiki%3AMultimaps-marker-incorrect-icon-anchor%2Ffr&uselang=fr","reqId":"3f36063b97c41c3a379ff4e4","caught_by":"entrypoint"} []

What would be useful to log is:

[2023-05-04T17:11:46.908425+00:00] exception.ERROR: [3f36063b97c41c3a379ff4e4] /w/api.php?action=translationaids&format=json&title=MediaWiki%3AMultimaps-marker-incorrect-icon-anchor%2Ffr&uselang=fr   MediaWiki\Revision\RevisionAccessException: Main slot of revision not found in database. See T212428. {"exception":"[object] (MediaWiki\\Revision\\RevisionAccessException(code: 0): Main slot of revision not found in database. See T212428. at /srv/mediawiki/tags/2023-05-04_14:31:50/includes/Revision/RevisionStore.php:1522)
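One way to pick out the useful line is to keep only lines that open a new log record and drop the stack trace and trailing context lines. The Python sketch below is an assumption about how this could be done, not a description of logfilter.php: it treats a line as a record start when it begins with a bracketed ISO 8601 timestamp, as in the example above, and `record_headers` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# A record starts with a timestamp such as [2023-05-04T17:11:46.908425+00:00]
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def record_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the first line of each multi-line log record."""
    for line in lines:
        if RECORD_START.match(line):
            yield line.rstrip("\n")
```

Lines such as `[stacktrace]`, the `#0 ...` frames and the closing `","exception_url":...` line do not match the pattern and are skipped.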

Change 917288 had a related patch set uploaded (by Wangombe; author: Wangombe):

[translatewiki@master] Format exception message using code block

https://gerrit.wikimedia.org/r/917288

Change 917288 merged by jenkins-bot:

[translatewiki@master] Slack: Use code block for exception message and change username

https://gerrit.wikimedia.org/r/917288

Resolving. Exceptions are now available on the #translatewiki channel on Slack.