
Split long lines into multiple messages when they exceed the line-limit
Closed, Resolved · Public · Feature

Description

Bridgebot's messages, as transferred from Telegram to IRC, are sometimes cut off because they exceed some sort of length limit.
It would be good if these messages could instead be split into multiple messages, so that participants can read the entire message without confusion (or needing to open, log in to, or register on another platform).

e.g., a recent message was shown in IRC as:

Thank you for the interest. Neither BWW, nor BFO or any of the others foundational ontologies or philosophical principles are used to guide the quality of the representation of the data and knowledge, at present. I'm not aware of a documented design rationale for why not; whoever decided on that may want to chime in here. With respect to the template language, this is orthogonal to it, in that the natural language gene

whereas the full message in Telegram was:

Thank you for the interest. Neither BWW, nor BFO or any of the others foundational ontologies or philosophical principles are used to guide the quality of the representation of the data and knowledge, at present. I’m not aware of a documented design rationale for why not; whoever decided on that may want to chime in here. With respect to the template language, this is orthogonal to it, in that the natural language generation aspect assumes there’s suitably structured input and rather concerns itself with taking it from there to generate natural language sentences from that input. It considers the output specification and so if the input is messy, it would require more (pre)processing to get acceptable output (e.g., checking for which modelling pattern was used, which naming scheme etc, which reside in the functions).

Notes:

  • The 2nd post here is another example, and the penultimate post here is another.
  • Those 3 examples all seem to get cut off after 456 characters (with spaces) if I include the timestamps and usernames (i.e. [10:31:06] <wm-bb> <marijke00>).
  • The IRC protocol has a hard 512-character limit per message line (see the sketch below).
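
For context on the two numbers above: the 512-character limit (RFC 1459/2812) covers the whole protocol line the server relays, including the sender prefix, the PRIVMSG command, the target channel, and the trailing CRLF, not just the visible text. A rough sketch of such a line (the hostmask and channel name here are assumptions, so the exact overhead varies):

:wm-bb!~bridge@example.host PRIVMSG #wikidata :<visible message text>\r\n

With roughly 50 characters taken by that overhead, only around 460 characters remain for the visible text, which is in the same ballpark as the ~456-character cut-off observed above.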

Event Timeline

Per the upstream docs, IRC messages should be truncated at 400 chars by default and have a "<message clipped>" indicator appended when that happens. The linked log shows the indicator not being applied and 430 characters of message payload, so at the very least the docs and the implementation do not match.

While digging in the upstream code I noticed https://github.com/42wim/matterbridge/issues/1540 has already been filed about how the docs and the implementation don't actually match. It turns out that for "MessageLength" to make any difference on an IRC bridge, the "MessageSplit" setting also needs to be enabled.
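
For reference, a minimal sketch of the relevant IRC section of matterbridge.toml with both settings enabled (the section name and Nick below are placeholders for illustration; the actual account name in our config differs):

# [irc.libera] is a hypothetical section name for illustration
[irc.libera]
Nick="wm-bb"
# MessageLength only takes effect when MessageSplit is also enabled
MessageSplit=true
MessageLength=400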

Mentioned in SAL (#wikimedia-cloud) [2022-08-23T17:51:07Z] <wm-bot> <bd808> Added explicit IRC message splitting configuration (T315951)

bd808 claimed this task.
diff --git a/etc/matterbridge.toml b/etc/matterbridge.toml
index 79d899a..b5c930a 100644
--- a/etc/matterbridge.toml
+++ b/etc/matterbridge.toml
@@ -32,6 +32,10 @@ VerboseJoinPart=false
 NoSendJoinPart=true
 PingDelay="1m"

+# T315951: split long messages at 400 chars
+MessageSplit=true
+MessageLength=400
+
 [telegram.bridgebot]
 # See https://core.telegram.org/bots#6-botfather
 # and https://www.linkedin.com/pulse/telegram-bots-beginners-marco-frau

After application:

[18:01]  <    wm-bb> <bd808> (testing, ignore) This is a long message sent from the telegram side of the multi-service matterbridge configuration used in this channel. A message of more than four hundred characters is needed to test the newly applied configuration to explicitly split messages which are over four hundred characters long when they are emitted by the IRC bridge end point. We really do not want  <clipped message>
[18:01]  <    wm-bb> <bd808> to encourage this kind of long message, but we also would like to avoid information assemetry caused by some messages being fully visable only on one of many interfaces (like Telegram but not IRC). See https://phabricator.wikimedia.org/T315951 for more details on the use case being enabled and the investigation of the upstream software's configuration and code.
[18:02]  <    bd808> quiddity: ^ it seems to work with explicit config. :)