Page MenuHomePhabricator

glogger produces invalid JSON when given input with non-printable characters
Open, LowPublicBUG REPORT

Description

k8s MW access logs contain some bad messages that look like an encoding problem, pasting an example below.

AFAICT, they appear to be non-printable characters -- the few I looked at are invalid JSON due to things like \x9c \x13 \x06 \x01

This isn't really a big problem, as it doesn't happen too often and the URLs are fishy to start with, it just means we lose an occasional access log message, undercount accesslog metrics by a tiny bit and have spurious log messages in the Benthos instance processing these.

Examples:

{
  "host": {
    "name": "parse2020.codfw.wmnet"
  },
  "orchestrator": {
    "type": "kubernetes",
    "namespace": "mw-web",
    "resource": {
      "name": "mw-web.codfw.main-b86f459f5-knthn"
    }
  },
  "labels": {
    "deployment": "mw-web",
    "release": "main"
  },
  "msg": "{\"client.ip\": \"X.X.X.X\", \"ecs.version\": \"1.11.0\", \"event.category\": [\"network\", \"web\"], \"event.dataset\": \"apache.access\", \"event.duration\": 241020, \"event.kind\": \"event\", \"event.outcome\": \"unknown\", \"event.type\": [\"access\", \"connection\"], \"http.request.headers.accept_language\": \"-\", \"http.request.headers.x_analytics\": \"-\", \"http.request.headers.x_client_ip\": \"-\", \"http.request.headers.x_forwarded_for\": \"X.X.X.X\", \"http.request.headers.x_request_id\": \"99e29d43-b7d1-4e36-895f-008874fb7cbc\", \"http.request.method\": \"GET\", \"http.request.referrer\": \"-\", \"http.response.bytes\": 41615, \"http.response.headers.content_type\": \"text/html\", \"http.response.headers.user\": \"-\", \"http.response.status_code\": 404, \"labels.handler\": \"proxy:unix:/run/shared/fpm-www.sock|fcgi://localhost\", \"labels.httpd_server_name\": \"wikipedia\", \"server.ip\": \"127.0.0.1\", \"service.type\": \"apache2\", \"source.ip\": \"127.0.0.1\", \"timestamp\": \"2024-06-11T10:21:37\", \"url.domain\": \"en.wikipedia.org\", \"url.full\": \"http://en.wikipedia.org/wiki/O\\xc2\\x9clepiać\", \"url.path\": \"/wiki/O\\xc2\\x9clepiać\", \"url.query\": \"\", \"user.name\": \"-\", \"user_agent.original\": \"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36\"}"
}
{
  "host": {
    "name": "mw1378.eqiad.wmnet"
  },
  "orchestrator": {
    "type": "kubernetes",
    "namespace": "mw-web",
    "resource": {
      "name": "mw-web.eqiad.main-667b54476d-jcpvw"
    }
  },
  "labels": {
    "deployment": "mw-web",
    "release": "main"
  },
  "msg": "{\"client.ip\": \"X.X.X.X\", \"ecs.version\": \"1.11.0\", \"event.category\": [\"network\", \"web\"], \"event.dataset\": \"apache.access\", \"event.duration\": 196106, \"event.kind\": \"event\", \"event.outcome\": \"unknown\", \"event.type\": [\"access\", \"connection\"], \"http.request.headers.accept_language\": \"-\", \"http.request.headers.x_analytics\": \"-\", \"http.request.headers.x_client_ip\": \"-\", \"http.request.headers.x_forwarded_for\": \"X.X.X.X\", \"http.request.headers.x_request_id\": \"98d04b0b-bb82-434f-bbac-1bd8b1e88b4c\", \"http.request.method\": \"GET\", \"http.request.referrer\": \"-\", \"http.response.bytes\": 23863, \"http.response.headers.content_type\": \"text/html\", \"http.response.headers.user\": \"-\", \"http.response.status_code\": 404, \"labels.handler\": \"proxy:unix:/run/shared/fpm-www.sock|fcgi://localhost\", \"labels.httpd_server_name\": \"wikipedia\", \"server.ip\": \"127.0.0.1\", \"service.type\": \"apache2\", \"source.ip\": \"127.0.0.1\", \"timestamp\": \"2024-06-11T10:22:05\", \"url.domain\": \"uk.wikipedia.org\", \"url.full\": \"http://uk.wikipedia.org/wiki/\\x13\\xd0\\xbe\\xd0\\xbb\\xd0\\xbe\\xd0\\xb2\\xd0\\xbd\\xd0\\xb0_сторінка\", \"url.path\": \"/wiki/\\x13\\xd0\\xbe\\xd0\\xbb\\xd0\\xbe\\xd0\\xb2\\xd0\\xbd\\xd0\\xb0_сторінка\", \"url.query\": \"\", \"user.name\": \"-\", \"user_agent.original\": \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\"}"
}
{
  "host": {
    "name": "kubernetes2025.codfw.wmnet"
  },
  "orchestrator": {
    "type": "kubernetes",
    "namespace": "mw-web",
    "resource": {
      "name": "mw-web.codfw.main-b86f459f5-mxdnv"
    }
  },
  "labels": {
    "deployment": "mw-web",
    "release": "main"
  },
  "msg": "{\"client.ip\": \"X.X.X.X\", \"ecs.version\": \"1.11.0\", \"event.category\": [\"network\", \"web\"], \"event.dataset\": \"apache.access\", \"event.duration\": 77728, \"event.kind\": \"event\", \"event.outcome\": \"unknown\", \"event.type\": [\"access\", \"connection\"], \"http.request.headers.accept_language\": \"en-us,en;q=0.5\", \"http.request.headers.x_analytics\": \"-\", \"http.request.headers.x_client_ip\": \"-\", \"http.request.headers.x_forwarded_for\": \"X.X.X.X\", \"http.request.headers.x_request_id\": \"b014fa2f-cb9d-4548-aefe-7a585c162fe8\", \"http.request.method\": \"GET\", \"http.request.referrer\": \"http://en.wikipedia.org/wiki/%06_%01_g\", \"http.response.bytes\": 27119, \"http.response.headers.content_type\": \"text/html\", \"http.response.headers.user\": \"-\", \"http.response.status_code\": 404, \"labels.handler\": \"proxy:unix:/run/shared/fpm-www.sock|fcgi://localhost\", \"labels.httpd_server_name\": \"wikipedia\", \"server.ip\": \"127.0.0.1\", \"service.type\": \"apache2\", \"source.ip\": \"127.0.0.1\", \"timestamp\": \"2024-06-11T10:22:11\", \"url.domain\": \"en.wikipedia.org\", \"url.full\": \"http://en.wikipedia.org/wiki/\\x06_\\x01_g\", \"url.path\": \"/wiki/\\x06_\\x01_g\", \"url.query\": \"\", \"user.name\": \"-\", \"user_agent.original\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36\"}"
}

Found using kafkacat -C -b kafka-logging1001.eqiad.wmnet:9092 -t mediawiki.httpd.accesslog -o end 2>/dev/null | jq '. | select( .["event.duration"] | . == null)'.

Details

TitleReferenceAuthorSource BranchDest Branch
Correctly escape invalid utf-8repos/sre/glogger!4oblivianfix_single_encodemain
Customize query in GitLab