Following up on the work we did in T366710: Switch k8s logs to their own kafka topics, and while investigating T384233: Unexpected utilization increase in udp_localhost-info kafka-logging topic, I noticed that logs we generate and ship from rsyslog *in* k8s itself (i.e. the daemonset) lack the same per-k8s-cluster template. We should investigate and make logs uniform there too.
I'd assume these are MediaWiki-generated logs, but I don't think there is an rsyslog daemonset. Do you have some examples?
You are correct, these are MediaWiki logs only, and it's not a daemonset but a sidecar AFAICS. Example log:
{ "timestamp": "2025-01-27T11:18:59+00:00", "logsource": "mw-web.codfw.main-6b6f6c8754-jccvg", "host": "mw-web.codfw.main-6b6f6c8754-jccvg", "program": "mediawiki", "severity": "info", "facility": "user", "kubernetes": { "host": "wikikube-worker2094.codfw.wmnet", "namespace_name": "mw-web", "pod_name": "mw-web.codfw.main-6b6f6c8754-jccvg", "labels": { "deployment": "mw-web", "release": "main" } }, "@timestamp": "2025-01-27T11:18:59.383150+00:00", "@version": 1, "message": "Central autologin attempt", "type": "mediawiki", "channel": "authevents", "level": "INFO", "monolog_level": 200, "wiki": "loginwiki", "mwversion": "1.44.0-wmf.13", "reqId": "c59137eb-201e-4d02-ad2a-93bdcee769aa", "url": "/wiki/Special:CentralAutoLogin/checkLoggedIn?returnUrlToken=xxx&type=redirect&useformat=desktop&useformat=desktop&usesul3=0&wikiid=kowiki", "http_method": "GET", "server": "login.wikimedia.org", "referrer": null, "phpversion": "7.4.33", "servergroup": "kube-mw-web", "normalized_message": "Central autologin attempt", "shard": "s3", "event": "centralautologin", "successful": false, "status": "Not centrally logged in", "logstash_formatter_key_conflict": [ "type" ], "c_type": "redirect", "extension": "CentralAuth", "accountType": "anon" }
In Puppet we render the topic as k8s- + profile::kubernetes::cluster_name, and Logstash consumes all topics starting with k8s-. Note that using cluster_name does lead to some duplication in the topic name, though that's not a big deal. At any rate, what I'd like to change is the following:
# charts/mediawiki/templates/rsyslog/configmap.yaml.tpl
template(name="udp_localhost_topic" type="string" string="udp_localhost-%syslogseverity-text:::lowercase%")
To something that makes sense from an operational POV and doesn't lead to an explosion in topic names. Reusing the existing topics would be ideal: k8s-{eqiad,codfw} for wikikube and k8s-staging-{eqiad,codfw} for staging (i.e. 4 topics). Another alternative would be something like k8s-mw-{prod,staging}-{eqiad,codfw}.
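A minimal Python sketch of the naming schemes above, comparing how many topics each produces and whether each name matches the k8s- prefix that Logstash consumes. The severity list is the standard set of syslog severity names. Treating wikikube as "prod" and the helper functions themselves are illustrative assumptions, not the actual Puppet or chart code.

```python
from itertools import product

# Standard syslog severity names, as rendered by %syslogseverity-text%.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
# Assumption: "prod" stands for wikikube, "staging" for the staging clusters.
ENVIRONMENTS = ["prod", "staging"]
DATACENTERS = ["eqiad", "codfw"]
LOGSTASH_PREFIX = "k8s-"  # Logstash consumes every topic starting with this


def current_topic(severity: str) -> str:
    """Current rsyslog template: one topic per severity, no cluster in the name."""
    return "udp_localhost-" + severity.lower()


def reuse_existing_topic(environment: str, datacenter: str) -> str:
    """Option 1: reuse k8s-{eqiad,codfw} and k8s-staging-{eqiad,codfw}."""
    if environment == "prod":
        return f"k8s-{datacenter}"
    return f"k8s-{environment}-{datacenter}"


def mw_topic(environment: str, datacenter: str) -> str:
    """Option 2: k8s-mw-{prod,staging}-{eqiad,codfw}."""
    return f"k8s-mw-{environment}-{datacenter}"


def all_topics(build) -> set:
    """Every topic a scheme produces across all environment/datacenter pairs."""
    return {build(env, dc) for env, dc in product(ENVIRONMENTS, DATACENTERS)}


if __name__ == "__main__":
    current = {current_topic(s) for s in SEVERITIES}
    for label, topics in [
        ("current", current),
        ("reuse existing", all_topics(reuse_existing_topic)),
        ("k8s-mw", all_topics(mw_topic)),
    ]:
        matches = all(t.startswith(LOGSTASH_PREFIX) for t in topics)
        print(f"{label}: {len(topics)} topics, all match k8s- prefix: {matches}")
```

Under the current template the topic name carries the severity but not the cluster, so logs from every cluster land in the same udp_localhost-* topics. Either per-cluster option keeps the total at four topics, all under the k8s- prefix.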
Something else to consider: not only the udp-localhost sources but also file sources, like the php-fpm error log /var/log/php-fpm/error.log, should be switched to per-cluster Kafka topics.
@JMeybohm re: the above, what .Values could I use in charts/mediawiki/templates/rsyslog/configmap.yaml.tpl to compose a name with prod/staging + codfw/eqiad, or something similar that makes sense for Kafka topic names?
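The composition only needs two inputs, an environment (prod or staging) and a datacenter. The Python sketch below models that with a plain dict standing in for .Values. The keys environment and datacenter are hypothetical, not existing chart values. It follows the k8s-mw-{prod,staging}-{eqiad,codfw} alternative, sends both the udp-localhost input and the php-fpm error log file input to the same topic, and checks the result against Kafka's topic-name rules (ASCII letters, digits, '.', '_' and '-', at most 249 characters).

```python
import re

# Kafka topic names: ASCII letters, digits, '.', '_' and '-', max 249 chars.
LEGAL_TOPIC = re.compile(r"[a-zA-Z0-9._-]{1,249}")

# Illustrative labels for the two kinds of log source in the pod.
SOURCES = ["udp_localhost", "/var/log/php-fpm/error.log"]


def topic_from_values(values: dict) -> str:
    """Build k8s-mw-<environment>-<datacenter> from a .Values-like dict.

    "environment" and "datacenter" are HYPOTHETICAL keys used for illustration.
    """
    topic = f"k8s-mw-{values['environment']}-{values['datacenter']}"
    if not LEGAL_TOPIC.fullmatch(topic):
        raise ValueError(f"not a valid Kafka topic name: {topic!r}")
    return topic


def route_sources(values: dict) -> dict:
    """Map every log source in the pod to the same per-cluster topic."""
    topic = topic_from_values(values)
    return {source: topic for source in SOURCES}


if __name__ == "__main__":
    print(route_sources({"environment": "staging", "datacenter": "codfw"}))
```

Because the name depends only on the two values, adding another log source does not add topics.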
Change #1127882 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):
[operations/deployment-charts@master] mediawiki: Change kafka topic for rsyslog
Change #1128793 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):
[operations/puppet@production] logstash: read k8s-mw topics as needed
Change #1128793 merged by Filippo Giunchedi:
[operations/puppet@production] logstash: read k8s-mw topics as needed
Change #1130615 had a related patch set uploaded (by Cwhite; author: Cwhite):
[operations/puppet@production] beta-logs: fix puppet failure on collector hosts
Change #1130615 merged by Cwhite:
[operations/puppet@production] beta-logs: fix puppet failure on collector hosts
Change #1127882 merged by jenkins-bot:
[operations/deployment-charts@master] mediawiki: Change kafka topic for rsyslog
Mentioned in SAL (#wikimedia-operations) [2025-03-25T10:37:32Z] <cgoubert@deploy1003> Started scap sync-world: 1127882: mediawiki: Change kafka topic for rsyslog - T384335
Mentioned in SAL (#wikimedia-operations) [2025-03-25T10:38:30Z] <cgoubert@deploy1003> cgoubert: 1127882: mediawiki: Change kafka topic for rsyslog - T384335 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
Mentioned in SAL (#wikimedia-operations) [2025-03-25T11:33:42Z] <cgoubert@deploy1003> Started scap sync-world: 1127882: mediawiki: Change kafka topic for rsyslog - T384335
Mentioned in SAL (#wikimedia-operations) [2025-03-25T11:38:26Z] <cgoubert@deploy1003> cgoubert: 1127882: mediawiki: Change kafka topic for rsyslog - T384335 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
Mentioned in SAL (#wikimedia-operations) [2025-03-25T11:48:22Z] <cgoubert@deploy1003> Finished scap sync-world: 1127882: mediawiki: Change kafka topic for rsyslog - T384335 (duration: 15m 00s)
Messages are flowing to the new topic and are correctly ingested by Logstash; resolving.