
Move rsyslog-generated mediawiki logs within k8s to their own kafka topics
Closed, Resolved; Public

Description

Following up on the work we did in T366710: Switch k8s logs to their own kafka topics, and while investigating T384233: Unexpected utilization increase in udp_localhost-info kafka-logging topic, I noticed that the logs we generate and ship from rsyslog *in* k8s itself (i.e. the daemonset) lack the same per-k8s-cluster template. We should investigate and make logs uniform there too.

Event Timeline

fgiunchedi added a subscriber: JMeybohm.

+ serviceops / k8s for visibility, e.g. @JMeybohm: please let me know what you think.

I'd assume these are mediawiki generated logs, but I don't think there is an rsyslog daemonset. Do you have some examples?

You are correct: these are mw logs only, and there is no daemonset but a sidecar, AFAICS. Example log:

{
  "timestamp": "2025-01-27T11:18:59+00:00",
  "logsource": "mw-web.codfw.main-6b6f6c8754-jccvg",
  "host": "mw-web.codfw.main-6b6f6c8754-jccvg",
  "program": "mediawiki",
  "severity": "info",
  "facility": "user",
  "kubernetes": {
    "host": "wikikube-worker2094.codfw.wmnet",
    "namespace_name": "mw-web",
    "pod_name": "mw-web.codfw.main-6b6f6c8754-jccvg",
    "labels": {
      "deployment": "mw-web",
      "release": "main"
    }
  },
  "@timestamp": "2025-01-27T11:18:59.383150+00:00",
  "@version": 1,
  "message": "Central autologin attempt",
  "type": "mediawiki",
  "channel": "authevents",
  "level": "INFO",
  "monolog_level": 200,
  "wiki": "loginwiki",
  "mwversion": "1.44.0-wmf.13",
  "reqId": "c59137eb-201e-4d02-ad2a-93bdcee769aa",
  "url": "/wiki/Special:CentralAutoLogin/checkLoggedIn?returnUrlToken=xxx&type=redirect&useformat=desktop&useformat=desktop&usesul3=0&wikiid=kowiki",
  "http_method": "GET",
  "server": "login.wikimedia.org",
  "referrer": null,
  "phpversion": "7.4.33",
  "servergroup": "kube-mw-web",
  "normalized_message": "Central autologin attempt",
  "shard": "s3",
  "event": "centralautologin",
  "successful": false,
  "status": "Not centrally logged in",
  "logstash_formatter_key_conflict": [
    "type"
  ],
  "c_type": "redirect",
  "extension": "CentralAuth",
  "accountType": "anon"
}

In puppet we render the topic as k8s- + profile::kubernetes::cluster_name, and logstash consumes all topics starting with k8s-. Note that using cluster_name does lead to some duplication in the topic name, though that's not a big deal. At any rate, what I'd like to change is the following:

# charts/mediawiki/templates/rsyslog/configmap.yaml.tpl
template(name="udp_localhost_topic" type="string" string="udp_localhost-%syslogseverity-text:::lowercase%")

To something that makes sense from an operational POV and doesn't lead to an explosion in topic names. Reusing the existing topics would be ideal: k8s-{eqiad,codfw} for wikikube and k8s-staging-{eqiad,codfw} for staging (i.e. 4 topics). An alternative would be something like k8s-mw-{prod,staging}-{eqiad,codfw}.
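To make the naming options concrete, here is a rough Python sketch of how the topic names expand. It is only an illustration, not the chart or puppet code: the function names are made up, the severity list is my assumption of what rsyslog's syslogseverity-text renders, and the prefix check merely models the "topics starting with k8s-" statement above rather than the actual logstash configuration.

```python
from itertools import product

# Assumed severity keywords as rendered by rsyslog's syslogseverity-text.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def current_topic(severity: str) -> str:
    """Mimic the existing template: udp_localhost-<lowercased severity>."""
    return f"udp_localhost-{severity.lower()}"


def proposed_topic(env: str, site: str) -> str:
    """Hypothetical helper for the k8s-mw-<env>-<site> alternative."""
    return f"k8s-mw-{env}-{site}"


def consumed_by_prefix(topic: str, prefix: str = "k8s-") -> bool:
    """Model a consumer that subscribes to every topic with the given prefix."""
    return topic.startswith(prefix)


# One topic per severity today, shared by every cluster.
current = [current_topic(s) for s in SEVERITIES]

# Four topics with the alternative naming: {prod,staging} x {eqiad,codfw}.
proposed = [
    proposed_topic(env, site)
    for env, site in product(["prod", "staging"], ["eqiad", "codfw"])
]

if __name__ == "__main__":
    for topic in current + proposed:
        print(f"{topic:28} matches k8s- prefix: {consumed_by_prefix(topic)}")
```

The point of the sketch is that the severity-based names say nothing about which cluster produced the message, while the proposed names stay at four topics and carry the k8s- prefix.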

fgiunchedi renamed this task from Move rsyslog-generated logs within k8s to their own kafka topics to Move rsyslog-generated mediawiki logs within k8s to their own kafka topics. Jan 27 2025, 11:40 AM

Something else to consider: not only udp-localhost sources but also file sources, like the php-fpm error log (/var/log/php-fpm/error.log), should be switched to per-cluster kafka topics.

@JMeybohm re: the above, what .Values could I use in charts/mediawiki/templates/rsyslog/configmap.yaml.tpl to compose a name with prod/staging + codfw/eqiad, or something similar that makes sense for kafka topic names?

Change #1127882 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mediawiki: Change kafka topic for rsyslog

https://gerrit.wikimedia.org/r/1127882

Change #1128793 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/puppet@production] logstash: read k8s-mw topics as needed

https://gerrit.wikimedia.org/r/1128793

Change #1128793 merged by Filippo Giunchedi:

[operations/puppet@production] logstash: read k8s-mw topics as needed

https://gerrit.wikimedia.org/r/1128793

Change #1130615 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/puppet@production] beta-logs: fix puppet failure on collector hosts

https://gerrit.wikimedia.org/r/1130615

Change #1130615 merged by Cwhite:

[operations/puppet@production] beta-logs: fix puppet failure on collector hosts

https://gerrit.wikimedia.org/r/1130615

Change #1127882 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: Change kafka topic for rsyslog

https://gerrit.wikimedia.org/r/1127882

Mentioned in SAL (#wikimedia-operations) [2025-03-25T10:37:32Z] <cgoubert@deploy1003> Started scap sync-world: 1127882: mediawiki: Change kafka topic for rsyslog - T384335

Mentioned in SAL (#wikimedia-operations) [2025-03-25T10:38:30Z] <cgoubert@deploy1003> cgoubert: 1127882: mediawiki: Change kafka topic for rsyslog - T384335 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-25T11:33:42Z] <cgoubert@deploy1003> Started scap sync-world: 1127882: mediawiki: Change kafka topic for rsyslog - T384335

Mentioned in SAL (#wikimedia-operations) [2025-03-25T11:38:26Z] <cgoubert@deploy1003> cgoubert: 1127882: mediawiki: Change kafka topic for rsyslog - T384335 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-25T11:48:22Z] <cgoubert@deploy1003> Finished scap sync-world: 1127882: mediawiki: Change kafka topic for rsyslog - T384335 (duration: 15m 00s)

Clement_Goubert claimed this task.
Clement_Goubert subscribed.

Messages are flowing to the new topic and are correctly ingested by logstash. Resolving.