Follow-up to T288851: Make logging work for mediawiki in k8s.
I noticed that, since the increase in MW-on-k8s traffic, this dashboard has gone from generally having zero jsonTruncated entries to consistently having a trickle of such messages.
This makes message aggregation less useful by default, and it also means a lot of noise from non-error messages ends up here (since we don't know the channel or severity of truncated messages, we display them here out of caution).
https://logstash.wikimedia.org/app/dashboards#/view/mediawiki-errors
Upon expanding the entries there, they do indeed all appear to be from Kubernetes pods. The messages themselves are not problematic. The ones I saw were mostly informational diagnostic messages from the MediaWiki-Rdbms component (about TransactionProfiler violations, formerly known as "DBPerformance"). I believe these are disproportionately implicated because they are the most commonly hit diagnostic messages that include a stack trace.
Looking a bit closer, I wondered why we only see these get truncated from Kubernetes. Copying a few of them and checking their length, I found that most end up as invalid JSON, truncated at around 8000 characters.
Switching context to the older and rarer truncated messages from appservers, those typically contain an incomplete JSON string of about 32K characters.
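For reference, the length check I did amounts to something like the following (a minimal sketch; `classify` is a hypothetical helper I'm using here for illustration, not part of any pipeline):

```python
import json


def classify(line: str) -> str:
    """Report whether a copied log line is valid JSON, and how long it is."""
    try:
        json.loads(line)
        return "valid JSON ({} chars)".format(len(line))
    except json.JSONDecodeError:
        return "truncated/invalid JSON ({} chars)".format(len(line))


# A complete event parses; one cut off mid-string does not.
print(classify('{"channel": "DBPerformance", "message": "slow query"}'))
print(classify('{"channel": "DBPerformance", "message": "slow que'))
```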
I'm guessing that as part of T288851, the intake pipeline for mw-on-k8s didn't carry over some of the settings or thresholds somewhere along the way!
I further guess that these do not just affect DBPerformance warnings, but also (more importantly) fatal errors. This is problematic for deployment confidence because Scap and Prometheus alerts rely on Logstash reporting the exception count. But for truncated messages, the host and channel would not be set, as those reside in the truncated part of the JSON blob.
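To illustrate the failure mode (a made-up event and an assumed ~8000-character cut-off, not the actual pipeline behaviour): when a long stack trace pushes the host and channel keys past the truncation point, the blob no longer parses and those keys are unrecoverable.

```python
import json

# Hypothetical event: a long stack trace followed by routing fields.
event = json.dumps({
    "message": "Stack trace: " + "x" * 8000,
    "host": "mw1234.example",  # made-up hostname
    "channel": "exception",
})
truncated = event[:8000]  # assume an ~8000-char limit on the k8s path

try:
    json.loads(truncated)
    parsed = True
except json.JSONDecodeError:
    parsed = False

# The truncated blob fails to parse, and the keys Logstash would
# route on never made it into the first 8000 characters at all.
print(parsed)
print('"channel"' in truncated)
```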