User Details
- User Since: Feb 12 2018, 9:51 AM (427 w, 2 d)
- Availability: Available
- IRC Nick: vgutierrez
- LDAP User: Vgutierrez
- MediaWiki User: VGutiérrez (WMF)
Fri, Apr 10
meanwhile I've restored the component and uploaded haproxy 2.8.20 there
Closing the task as this has been investigated
Wed, Apr 8
I've locally replicated an SSL handshake failure using haproxy with log-format-sd %{+E}o\ [haproxykafka@0\ %[capture.req.hdr(0),json(ascii)]|%HPO|%HQ|%rt]
Yes, sequence numbers are generated by haproxy itself, even when the connection ends in an SSL handshake error. With haproxy 3.0 the log format is ignored for that kind of error, so the sequence number never reaches haproxykafka.
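Because haproxy consumes a sequence number for every connection but drops the log line on SSL handshake errors, a consumer should see gaps. A minimal sketch of detecting them, assuming each haproxykafka payload is the content of the structured-data element produced by the log-format-sd above (without the surrounding "[haproxykafka@0 " and "]"), that the trailing %rt field (haproxy's request counter) is the sequence number, and that all payloads come from a single haproxy process whose counter is not reset. The function names are hypothetical.

```python
def parse_sequence(payload: str) -> int:
    """Extract the trailing %rt field from a haproxykafka payload."""
    # %rt is the last field and never contains '|', so split once from the right;
    # the captured header, %HPO and %HQ may themselves contain '|'.
    return int(payload.rsplit("|", 1)[1])


def find_sequence_gaps(payloads):
    """Return inclusive (start, end) ranges of sequence numbers never received."""
    seqs = sorted({parse_sequence(p) for p in payloads})
    gaps = []
    for prev, cur in zip(seqs, seqs[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# Usage: requests 43 and 44 never showed up in Kafka.
payloads = ['"example.org"|/a|?x=1|41', '"example.org"|/b||42', '"example.org"|/c||45']
print(find_sequence_gaps(payloads))  # [(43, 44)]
```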
Tue, Apr 7
It looks like the root cause is the commit "MEDIUM: log/session: handle embryonic session log within sess_log()", a change introduced in HAProxy 3.1 as part of the work on the log profiles feature.
Mon, Apr 6
Mon, Mar 30
Thu, Mar 26
Wed, Mar 25
this is a side effect of moving the upload healthchecks from healthcheck.wm.org to upload.wm.org in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1164466
For instance, a recent decommissioning of codfw cp nodes (T419753) left the legacy ats-be service unavailable and caused the (depooled!) nodes to increment varnish_sli_bad because, from their point of view, there were no servers left in the site.
There is no CI check for the validity of the Lua scripts loaded in trafficserver; some have functional tests, but a syntax error should never have passed CI, even in the absence of tests.
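A minimal sketch of such a check, assuming the scripts live under one directory and that a `luac` binary matching the Lua version used by trafficserver is available on the CI image; `luac -p` only parses its input and reports syntax errors without writing bytecode. The directory layout and the `lua_syntax_errors`/`main` names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def lua_syntax_errors(root, luac="luac", run=subprocess.run):
    """Return {path: error message} for every *.lua file under root that fails to parse."""
    errors = {}
    for path in sorted(Path(root).rglob("*.lua")):
        # `luac -p` parses the file without emitting bytecode
        result = run([luac, "-p", str(path)], capture_output=True, text=True)
        if result.returncode != 0:
            errors[str(path)] = result.stderr.strip()
    return errors


def main(argv):
    """CI entry point: exit status 0 when every script parses, 1 otherwise."""
    failures = lua_syntax_errors(argv[0])
    for message in failures.values():
        print(message, file=sys.stderr)
    return 1 if failures else 0
```

A CI job would call `sys.exit(main(sys.argv[1:]))` with the scripts directory; the `run` parameter only exists so the check can be exercised without `luac` installed.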
Mar 18 2026
this is a by-product of switching to Katran rather than a feature that we can deploy independently at the moment, so in eqiad and codfw it is currently blocked until we can switch every service to IPIP encapsulation.
Mar 12 2026
Mar 11 2026
Mar 9 2026
with the patch applied, acme-chief was able to issue the certificate a few hours after we reported the issue to GTS (no active reply from them, though)
I've tried to patch our client to skip already validated challenges, but I'm running into another issue. This is the request flow performed by acme-chief:
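The idea of skipping already validated challenges can be sketched against the RFC 8555 authorization and challenge objects (each carrying a `status`, and challenges a `type` and `url`). This is a minimal illustration, not acme-chief's actual code; the function name and the `dns-01` default are assumptions.

```python
from typing import Dict, List

# RFC 8555 challenge statuses that need no further response from the client.
ALREADY_HANDLED = {"processing", "valid"}


def challenges_to_answer(authorizations: List[Dict], challenge_type: str = "dns-01") -> List[Dict]:
    """Return the challenges that still need a response, skipping validated ones."""
    todo = []
    for authz in authorizations:
        # An authorization that is already valid (e.g. reused from an earlier
        # order) needs no challenge responses at all.
        if authz.get("status") == "valid":
            continue
        for challenge in authz.get("challenges", []):
            if challenge.get("type") != challenge_type:
                continue
            if challenge.get("status") in ALREADY_HANDLED:
                continue
            todo.append(challenge)
    return todo
```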
Mar 8 2026
Mar 6 2026
Mar 5 2026
Mar 4 2026
this seems to be tracked as T417278
Mar 2 2026
Feb 27 2026
could I suggest using a dedicated prefix for API/REST gateway headers? We already use X-WMF internally as a prefix on other layers that are unrelated to the API/REST gateways.
Feb 26 2026
change has been merged, and it should be live by now
Feb 25 2026
change has been merged; please allow Puppet to propagate it, which could take up to 30 minutes
SSH has been verified out-of-band
got mcollins' approval via Slack; now we need Data-Engineering approval (that's @Milimetric / @Ottomata)
waiting for mcollins' approval; I've pinged them on Slack because I haven't been able to find their Phabricator user so far
Feb 24 2026
Feb 19 2026
the blast radius is big. I'm wondering whether k8s nodes run workloads not exposed to the Internet where a bigger MTU (I'm thinking of jumbo frames here) could be beneficial, or even required, for performance.
Feb 18 2026
fixed by:
- Retrying if we can't fetch the MAC address
- Reporting the configured MAC address
- Refusing to start if we can't fetch the MAC address after 10 attempts
- Triggering an ARP resolution if the MAC address isn't on the kernel neighbors table
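The four fixes above can be sketched together as a retry loop. This is a hypothetical Python illustration, not the actual implementation: it assumes the MAC address is read from `/proc/net/arp`, that ARP resolution is triggered by sending an empty UDP datagram to the (on-link) target so the kernel has to resolve the neighbour, and a one-second delay between attempts. All function names are made up for the example.

```python
import socket
import time
from typing import Callable, Optional

MAX_ATTEMPTS = 10  # refuse to start after this many failed lookups


def lookup_proc_arp(ip: str, path: str = "/proc/net/arp") -> Optional[str]:
    """Return the MAC address for `ip` from the kernel neighbour table, if complete."""
    with open(path) as table:
        next(table)  # skip the header line
        for line in table:
            # Columns: IP address, HW type, Flags, HW address, Mask, Device
            fields = line.split()
            # Flags 0x0 marks an incomplete entry (no MAC resolved yet)
            if fields[0] == ip and fields[2] != "0x0":
                return fields[3]
    return None


def trigger_arp(ip: str, port: int = 9) -> None:
    """Force neighbour resolution by sending an empty UDP datagram to `ip`."""
    # Port 9 (discard) is an arbitrary choice; only the ARP side effect matters.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(b"", (ip, port))


def resolve_mac(ip: str,
                lookup: Callable[[str], Optional[str]] = lookup_proc_arp,
                arp: Callable[[str], None] = trigger_arp,
                delay: float = 1.0,
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Fetch the MAC address for `ip`, retrying and triggering ARP between attempts."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        mac = lookup(ip)
        if mac:
            # Report the MAC address that is going to be configured
            print(f"using MAC address {mac} for {ip} (attempt {attempt})")
            return mac
        # Not in the neighbour table yet: ask the kernel to resolve it and retry
        arp(ip)
        sleep(delay)
    # Refuse to start rather than run with an unknown MAC address
    raise RuntimeError(f"unable to fetch the MAC address of {ip} after {MAX_ATTEMPTS} attempts")
```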
Feb 17 2026
Feb 16 2026
I managed to trigger this while capturing the traffic between ATS and gerrit2003. In my run it failed fetching https://gerrit.wikimedia.org/r/mediawiki/extensions/RelatedArticles; this is the content of the offending request:
Feb 12 2026
HAProxy 3.0 bumps Lua to 5.4; as a consequence, HAProxy fails to start because lua5.4-maxminddb isn't available:
Feb 12 15:51:33 cp4052 haproxy[2379768]: [NOTICE] (2379768) : haproxy version is 3.0.15-1~bpo11+1
Feb 12 15:51:33 cp4052 haproxy[2379768]: [NOTICE] (2379768) : path to executable is /usr/sbin/haproxy
Feb 12 15:51:33 cp4052 haproxy[2379768]: [ALERT] (2379768) : config : parsing [/etc/haproxy/haproxy.cfg:18] : Lua runtime error: /etc/haproxy/lua/maxmind-lookup.lua:3: module 'maxminddb' not found:
Feb 12 15:51:33 cp4052 haproxy[2379768]: no field package.preload['maxminddb']
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file '/etc/haproxy/lua/private/maxminddb.lua'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file '/usr/local/share/lua/5.4/maxminddb.lua'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file '/usr/local/share/lua/5.4/maxminddb/init.lua'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file '/usr/local/lib/lua/5.4/maxminddb.lua'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file '/usr/local/lib/lua/5.4/maxminddb/init.lua'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file '/usr/share/lua/5.4/maxminddb.lua'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file '/usr/share/lua/5.4/maxminddb/init.lua'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file './maxminddb.lua'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file './maxminddb/init.lua'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file '/usr/local/lib/lua/5.4/maxminddb.so'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file '/usr/lib/x86_64-linux-gnu/lua/5.4/maxminddb.so'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file '/usr/lib/lua/5.4/maxminddb.so'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file '/usr/local/lib/lua/5.4/loadall.so'
Feb 12 15:51:33 cp4052 haproxy[2379768]: no file './maxminddb.so'
Feb 12 15:51:33 cp4052 haproxy[2379768]: [ALERT] (2379768) : config : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg
Feb 12 15:51:33 cp4052 haproxy[2379768]: [ALERT] (2379768) : config : parsing [/etc/haproxy/conf.d/tls.cfg:180] : error detected in proxy 'tls' while parsing 'http-request set-var(req.provenance,ifnotexists,ifnotempty)' rule : unknown fetch method 'lua.fetch_isp'.
Feb 12 15:51:33 cp4052 haproxy[2379768]: [ALERT] (2379768) : config : Error(s) found in configuration file : /etc/haproxy/conf.d/tls.cfg
Feb 12 15:51:33 cp4052 systemd[1]: haproxy.service: Control process exited, code=exited, status=1/FAILURE
Feb 12 15:51:33 cp4052 systemd[1]: Reload failed for HAProxy Load Balancer.
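To catch this before a reload, one could check whether `maxminddb` is resolvable from the Lua 5.4 search paths. A minimal sketch, assuming Lua's `?`-template semantics for package.path/package.cpath and using the system-wide templates printed in the log above (it ignores the HAProxy-specific /etc/haproxy/lua/private path and the relative ./ entries); `find_lua_module` is a hypothetical helper, not part of HAProxy or Lua.

```python
from pathlib import Path
from typing import Iterable, Optional

# System-wide Lua 5.4 search templates taken from the failed reload log.
LUA54_TEMPLATES = [
    "/usr/local/share/lua/5.4/?.lua",
    "/usr/local/share/lua/5.4/?/init.lua",
    "/usr/local/lib/lua/5.4/?.lua",
    "/usr/local/lib/lua/5.4/?/init.lua",
    "/usr/share/lua/5.4/?.lua",
    "/usr/share/lua/5.4/?/init.lua",
    "/usr/local/lib/lua/5.4/?.so",
    "/usr/lib/x86_64-linux-gnu/lua/5.4/?.so",
    "/usr/lib/lua/5.4/?.so",
]


def find_lua_module(name: str, templates: Iterable[str] = LUA54_TEMPLATES) -> Optional[str]:
    """Mimic Lua's package.searchpath(): substitute the module name into each template."""
    # Lua replaces '.' in module names with the directory separator
    filename = name.replace(".", "/")
    for template in templates:
        candidate = template.replace("?", filename)
        if Path(candidate).is_file():
            return candidate
    return None
```

A pre-upgrade check could then refuse to proceed when `find_lua_module("maxminddb")` returns None.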
Feb 5 2026
Feb 3 2026
You just need to expose caniuse.com browser data to hiddenparma.
Jan 23 2026
Jan 21 2026
It's not clear why net.ipv4.conf.default.rp_filter would need to change to 0 for IPIP, and we would like to avoid that on k8s nodes to prevent IP spoofing from inside containers (although that would require CAP_NET_ADMIN). If we could keep net.ipv4.conf.default.rp_filter=1, the dynamically created calico interfaces would still inherit that setting, while net.ipv4.conf.all.rp_filter=0 would allow the ipip interfaces to have rp_filter=0 set as well.
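The argument relies on how the kernel combines these knobs: per the kernel's ip-sysctl documentation, source validation on an interface uses the maximum of conf.all.rp_filter and conf.<interface>.rp_filter, and a newly created interface starts from conf.default. A minimal sketch of that arithmetic; the interface names (`cali1234abcd`, `ipip0`) are hypothetical, and the explicit per-interface 0 on the IPIP device is an assumption about how the proposal would be applied.

```python
from typing import Dict, Iterable


def effective_rp_filter(conf_all: int, conf_default: int,
                        interfaces: Iterable[str],
                        overrides: Dict[str, int]) -> Dict[str, int]:
    """Effective rp_filter per interface: max(conf.all, conf.<iface>)."""
    result = {}
    for iface in interfaces:
        # New interfaces inherit conf.default unless explicitly overridden
        per_iface = overrides.get(iface, conf_default)
        result[iface] = max(conf_all, per_iface)
    return result


# Proposed k8s node settings: all=0, default=1, explicit 0 on the IPIP device.
print(effective_rp_filter(0, 1, ["cali1234abcd", "ipip0"], {"ipip0": 0}))
# {'cali1234abcd': 1, 'ipip0': 0}
```

With conf.all left at 1, the per-interface 0 would have no effect, which is why only conf.all needs to drop to 0.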
Jan 19 2026
oh I see... you're using https://gerrit.wikimedia.org/ in the healthcheck URL instead of https://healthcheck.wikimedia.org/varnish-fe; this needs to be fixed
hmm, why are these healthchecks from liberica hitting the backend server in eqiad instead of staying on the cp nodes?
Jan 15 2026
Jan 14 2026
the headers described on https://wikitech.wikimedia.org/wiki/CDN/Backend_api and x-ja3n/x-ja4h should already be reaching MediaWiki
Jan 13 2026
Jan 12 2026
Dec 4 2025
the assessment is OK and the link can be removed safely
