Checking the messages that Benthos sent to the DLQ in ulsfo, we discovered parsing errors caused by an unexpectedly missing field in the HAProxy log: the request path.
The errors emitted by Benthos looked like this:
{
"error": "processors failed: no pattern matches found",
"hostname": "localhost",
"message": "260764 <REDACTED>:30365 8281383 [15/May/2024:13:56:54.703] tls~ tls/backend_server 0/0/0/0/0 200 20099 - - ---- 7140/7131/55/55/0 0/0 {www.wikipedia.org|||||*/*|vers=TLSv1.3;keyx=UNKNOWN;auth=ECDSA;ciph=AES-256-GCM-SHA384;prot=h1;sess=new} {hit-front|text/html|https=1;client_port=30365;nocookies=1|cp4037 hit, cp4037 hit/546860|ATS/9.1.4} GET ",
"priority": 134,
"severity": 6,
"timestamp": "2024-05-15T13:56:54Z"
}
The missing part after the GET verb is what makes the Benthos processor fail. These errors are infrequent compared to the host's request volume, and the requests always target https://www.wikipedia.org, without a UA or other significant headers. They seem to originate from public cloud networks (mainly AWS).
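The failure mode can be reproduced with a minimal sketch. The regular expression below is hypothetical and is not the pattern actually used by the Benthos processor; it only assumes that the tail of the log line is expected to be an HTTP method followed by a non-empty path. The shortened log lines are also illustrative.

```python
import re

# Hypothetical pattern for the tail of a log line: method, space, non-empty path.
# NOT the real Benthos pattern; an assumption made for illustration only.
STRICT_TAIL = re.compile(r"(?P<method>[A-Z]+) (?P<path>\S+)$")


def parse_tail(line):
    """Return the method and path found at the end of the line, or None."""
    match = STRICT_TAIL.search(line)
    return match.groupdict() if match else None


# Shortened, illustrative log tails (the real lines carry many more fields).
normal_line = "cp4037 hit/546860|ATS/9.1.4} GET /wiki/Main_Page"
truncated_line = "cp4037 hit/546860|ATS/9.1.4} GET "

print(parse_tail(normal_line))     # method and path are extracted
print(parse_tail(truncated_line))  # None: nothing follows the verb
```

Any pattern that requires at least one character after the verb behaves the same way on the truncated line, which is consistent with the "no pattern matches found" error above.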
Analyzing the same request as seen by Varnish (with varnishlog), we noticed that ReqURL was set to https://www.wikipedia.org. While this is sanitized in Varnish (https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/varnish/templates/wikimedia-frontend.vcl.erb#272), HAProxy cannot log it with our current log-format, which uses %HPO (HTTP path only, without host or query string).
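To see why a "path only" field ends up empty for such a request, here is a small sketch using Python's standard URL parser. It only approximates the "path without host or query string" semantics and is not HAProxy's implementation; the helper name `path_only` is made up for this example.

```python
from urllib.parse import urlsplit


def path_only(request_target):
    """Return just the path component, dropping scheme, host and query string."""
    return urlsplit(request_target).path


# Usual origin-form target: the path is kept, the query string is dropped.
print(repr(path_only("/wiki/Main_Page?action=raw")))

# Absolute URL with a path: scheme and host are dropped.
print(repr(path_only("https://www.wikipedia.org/wiki/Main_Page")))

# Absolute URL without any path, as seen in ReqURL: nothing is left.
print(repr(path_only("https://www.wikipedia.org")))
```

With the target set to https://www.wikipedia.org there is no path component at all, which matches the log line that ends right after the GET verb.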
So, to recap, the problems here are:
- Benthos fails to parse these requests because HAProxy logs them without a request path
- These requests are already logged by Varnish, so the new pipeline must capture them as well
Using another HAProxy log-format variable (such as %HP) seems the most feasible way to avoid this, but we should carefully check that it doesn't change the format of the current DE ingestion pipeline.