k8s MW access logs contain some bad messages that look like an encoding problem, pasting an example below.
AFAICT, they appear to be non-printable characters -- the few I looked at are invalid JSON due to things like \x9c \x13 \x06 \x01
This isn't really a big problem, as it doesn't happen too often and the URLs are fishy to start with, it just means we lose an occasional access log message, undercount accesslog metrics by a tiny bit and have spurious log messages in the Benthos instance processing these.
Examples:
{ "host": { "name": "parse2020.codfw.wmnet" }, "orchestrator": { "type": "kubernetes", "namespace": "mw-web", "resource": { "name": "mw-web.codfw.main-b86f459f5-knthn" } }, "labels": { "deployment": "mw-web", "release": "main" }, "msg": "{\"client.ip\": \"X.X.X.X\", \"ecs.version\": \"1.11.0\", \"event.category\": [\"network\", \"web\"], \"event.dataset\": \"apache.access\", \"event.duration\": 241020, \"event.kind\": \"event\", \"event.outcome\": \"unknown\", \"event.type\": [\"access\", \"connection\"], \"http.request.headers.accept_language\": \"-\", \"http.request.headers.x_analytics\": \"-\", \"http.request.headers.x_client_ip\": \"-\", \"http.request.headers.x_forwarded_for\": \"X.X.X.X\", \"http.request.headers.x_request_id\": \"99e29d43-b7d1-4e36-895f-008874fb7cbc\", \"http.request.method\": \"GET\", \"http.request.referrer\": \"-\", \"http.response.bytes\": 41615, \"http.response.headers.content_type\": \"text/html\", \"http.response.headers.user\": \"-\", \"http.response.status_code\": 404, \"labels.handler\": \"proxy:unix:/run/shared/fpm-www.sock|fcgi://localhost\", \"labels.httpd_server_name\": \"wikipedia\", \"server.ip\": \"127.0.0.1\", \"service.type\": \"apache2\", \"source.ip\": \"127.0.0.1\", \"timestamp\": \"2024-06-11T10:21:37\", \"url.domain\": \"en.wikipedia.org\", \"url.full\": \"http://en.wikipedia.org/wiki/O\\xc2\\x9clepiać\", \"url.path\": \"/wiki/O\\xc2\\x9clepiać\", \"url.query\": \"\", \"user.name\": \"-\", \"user_agent.original\": \"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36\"}" } { "host": { "name": "mw1378.eqiad.wmnet" }, "orchestrator": { "type": "kubernetes", "namespace": "mw-web", "resource": { "name": "mw-web.eqiad.main-667b54476d-jcpvw" } }, "labels": { "deployment": "mw-web", "release": "main" }, "msg": "{\"client.ip\": \"X.X.X.X\", \"ecs.version\": \"1.11.0\", \"event.category\": [\"network\", \"web\"], \"event.dataset\": \"apache.access\", \"event.duration\": 196106, \"event.kind\": \"event\", \"event.outcome\": \"unknown\", \"event.type\": [\"access\", \"connection\"], \"http.request.headers.accept_language\": \"-\", \"http.request.headers.x_analytics\": \"-\", \"http.request.headers.x_client_ip\": \"-\", \"http.request.headers.x_forwarded_for\": \"X.X.X.X\", \"http.request.headers.x_request_id\": \"98d04b0b-bb82-434f-bbac-1bd8b1e88b4c\", \"http.request.method\": \"GET\", \"http.request.referrer\": \"-\", \"http.response.bytes\": 23863, \"http.response.headers.content_type\": \"text/html\", \"http.response.headers.user\": \"-\", \"http.response.status_code\": 404, \"labels.handler\": \"proxy:unix:/run/shared/fpm-www.sock|fcgi://localhost\", \"labels.httpd_server_name\": \"wikipedia\", \"server.ip\": \"127.0.0.1\", \"service.type\": \"apache2\", \"source.ip\": \"127.0.0.1\", \"timestamp\": \"2024-06-11T10:22:05\", \"url.domain\": \"uk.wikipedia.org\", \"url.full\": \"http://uk.wikipedia.org/wiki/\\x13\\xd0\\xbe\\xd0\\xbb\\xd0\\xbe\\xd0\\xb2\\xd0\\xbd\\xd0\\xb0_сторінка\", \"url.path\": \"/wiki/\\x13\\xd0\\xbe\\xd0\\xbb\\xd0\\xbe\\xd0\\xb2\\xd0\\xbd\\xd0\\xb0_сторінка\", \"url.query\": \"\", \"user.name\": \"-\", \"user_agent.original\": \"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)\"}" } { "host": { "name": "kubernetes2025.codfw.wmnet" }, "orchestrator": { "type": "kubernetes", "namespace": "mw-web", "resource": { "name": "mw-web.codfw.main-b86f459f5-mxdnv" } }, "labels": { "deployment": "mw-web", "release": "main" }, "msg": "{\"client.ip\": \"X.X.X.X\", \"ecs.version\": \"1.11.0\", \"event.category\": [\"network\", \"web\"], \"event.dataset\": \"apache.access\", \"event.duration\": 77728, \"event.kind\": \"event\", \"event.outcome\": \"unknown\", \"event.type\": [\"access\", \"connection\"], \"http.request.headers.accept_language\": \"en-us,en;q=0.5\", \"http.request.headers.x_analytics\": \"-\", \"http.request.headers.x_client_ip\": \"-\", \"http.request.headers.x_forwarded_for\": \"X.X.X.X\", \"http.request.headers.x_request_id\": \"b014fa2f-cb9d-4548-aefe-7a585c162fe8\", \"http.request.method\": \"GET\", \"http.request.referrer\": \"http://en.wikipedia.org/wiki/%06_%01_g\", \"http.response.bytes\": 27119, \"http.response.headers.content_type\": \"text/html\", \"http.response.headers.user\": \"-\", \"http.response.status_code\": 404, \"labels.handler\": \"proxy:unix:/run/shared/fpm-www.sock|fcgi://localhost\", \"labels.httpd_server_name\": \"wikipedia\", \"server.ip\": \"127.0.0.1\", \"service.type\": \"apache2\", \"source.ip\": \"127.0.0.1\", \"timestamp\": \"2024-06-11T10:22:11\", \"url.domain\": \"en.wikipedia.org\", \"url.full\": \"http://en.wikipedia.org/wiki/\\x06_\\x01_g\", \"url.path\": \"/wiki/\\x06_\\x01_g\", \"url.query\": \"\", \"user.name\": \"-\", \"user_agent.original\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36\"}" }
Found using kafkacat -C -b kafka-logging1001.eqiad.wmnet:9092 -t mediawiki.httpd.accesslog -o end 2>/dev/null | jq '. | select( .["event.duration"] | . == null)'.