HAProxy sometimes does not apply host normalization
Open, Needs Triage, Public

Description

Background

Prior to T392880 and T351117, Varnish applied hostname normalization. This was presumably effective, and its effect was presumably observable in the varnishkafka output:

  • Kafkacat: webrequest_frontend_text
  • Hadoop: wmf.webrequest and wmf.pageview_actor datasets
  • Turnilo: webrequest_sampled_live

Problem

This is no longer the case: non-normalized (mixed-case) hostnames now show up in these outputs. This came up during T405429, where I found that some numbers didn't add up.

Turnilo: https://w.wiki/Fkf7

Screenshot 2025-10-20 at 13.50.56.png (862×768 px, 97 KB)

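Hadoop:

-- Count pageviews per host where the host differs from its lowercase form,
-- i.e. where hostname normalization was not applied.
SELECT COUNT(*) AS _count, uri_host
FROM wmf.pageview_actor
WHERE year = 2025 AND month = 10 AND day = 19
  AND uri_host != LOWER(uri_host)
GROUP BY uri_host
ORDER BY _count DESC;

Result (uri_host column only):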
Commons.wikimedia.org
EN.m.wikipedia.org
commons.Wikimedia.org
EN.Wiktionary.org
En.Wiktionary.org
…
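
To illustrate what the rows above should collapse to, here is a minimal Python sketch. It assumes that normalization means at least lowercasing the hostname, which is the only property the query tests for; the actual Varnish or HAProxy rules may do more. The helper name `normalize_host` is hypothetical, and the list contains only the hostnames shown in the result above, without their counts.

```python
from collections import Counter


def normalize_host(host: str) -> str:
    """Lowercase a hostname (assumed minimal normalization rule)."""
    return host.lower()


# Mixed-case hostnames copied from the query result above.
observed = [
    "Commons.wikimedia.org",
    "EN.m.wikipedia.org",
    "commons.Wikimedia.org",
    "EN.Wiktionary.org",
    "En.Wiktionary.org",
]

# Number of observed spellings that map to each normalized hostname.
canonical = Counter(normalize_host(h) for h in observed)

for host, variants in sorted(canonical.items()):
    print(f"{host}: {variants} observed spelling(s)")
```

Without normalization, each spelling is grouped as a separate uri_host, so per-host totals are split across several rows. That would be consistent with numbers not adding up.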