User Details
- User Since: Feb 12 2018, 9:51 AM (331 w, 1 d)
- Availability: Available
- IRC Nick: vgutierrez
- LDAP User: Vgutierrez
- MediaWiki User: VGutiérrez (WMF)
Today
we already leverage preconnect in some cases, but as an HTML <link> tag rather than an HTTP header:
$ curl -v -s https://en.wikipedia.org/wiki/Main_Page 2>&1 | grep -i preconnect
<link rel="preconnect" href="//upload.wikimedia.org">
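The same hint can also be delivered as an HTTP `Link` response header (RFC 8288 web linking), which is what the comment above contrasts with the in-page tag. A minimal sketch, assuming a helper of our own invention (`link_tag_to_header`) that rewrites the `<link>` tag from the curl output into the equivalent header value:

```python
import re

def link_tag_to_header(tag: str) -> str:
    """Turn an HTML <link> tag into the equivalent HTTP Link
    response header value, e.g. Link: <//host>; rel=preconnect."""
    href = re.search(r'href="([^"]+)"', tag).group(1)
    rel = re.search(r'rel="([^"]+)"', tag).group(1)
    return f"<{href}>; rel={rel}"

tag = '<link rel="preconnect" href="//upload.wikimedia.org">'
print(link_tag_to_header(tag))
# -> <//upload.wikimedia.org>; rel=preconnect
```

The header form reaches the browser before HTML parsing starts, which is the usual motivation for preferring it over the tag.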
Yesterday
Don't be too aggressive with this one; we might need to roll back at some point. Let's wait a few weeks at the very least.
Wed, Jun 12
Tue, Jun 11
ferm based MSS clamping is live on ncredir cluster
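MSS clamping rewrites the MSS option in TCP SYN packets so that segments never exceed what the path MTU can carry. A minimal sketch of the arithmetic only (not the actual ferm rules or tcp-mss-clamper code), assuming the standard header overheads of 40 bytes for IPv4+TCP and 60 bytes for IPv6+TCP:

```python
def clamp_mss(advertised_mss: int, mtu: int, ipv6: bool = False) -> int:
    """Clamp a SYN's advertised MSS so segments fit the path MTU.
    Overhead: 20-byte IPv4 header + 20-byte TCP header (40 total),
    or 40-byte IPv6 header + 20-byte TCP header (60 total)."""
    overhead = 60 if ipv6 else 40
    return min(advertised_mss, mtu - overhead)

print(clamp_mss(1460, 1400))  # -> 1360, clamped to fit a 1400-byte path MTU
print(clamp_mss(1200, 1500))  # -> 1200, already small enough, left untouched
```

In ferm/iptables terms the equivalent is a TCPMSS target on SYN packets; the sketch above only shows the value it would write.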
Mon, Jun 10
Thu, Jun 6
Wed, Jun 5
Mon, Jun 3
Thu, May 23
tcp-mss-clamper is already being used to perform MSS clamping on the ncredir and CDN upload clusters
Wed, May 22
Tue, May 21
acme-chief 0.37 deployed shipping https://gitlab.wikimedia.org/repos/sre/acme-chief/-/merge_requests/8
The scope of the task should be rejecting invalid HTTP requests (and not only HTTP/1.0 ones) on HAProxy rather than Varnish, as soon as we have analytics moved to HAProxy.
Personally I'm not sold on the idea of decreasing the key size, @BBlack what are your thoughts?
Mon, May 20
Sun, May 19
ChaCha20 is faster than AES when both run without hardware acceleration; if AES-NI is present, AES is faster. Clients also take this into account when ordering the cipher suites they send to the server as part of the ClientHello.
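The negotiation described above can be sketched in a few lines: the server honors the client's ClientHello ordering, so a client without AES hardware acceleration that lists ChaCha20 first ends up with ChaCha20, and one with AES-NI ends up with AES-GCM. This is an illustrative toy (real servers such as OpenSSL with "prioritize ChaCha" apply more policy than this), with made-up variable names:

```python
def pick_cipher(client_order: list[str], server_supported: set[str]) -> str:
    """Pick the first cipher in the client's ClientHello ordering
    that the server also supports."""
    for cipher in client_order:
        if cipher in server_supported:
            return cipher
    raise ValueError("no shared cipher suite")

server = {"TLS_AES_256_GCM_SHA384", "TLS_CHACHA20_POLY1305_SHA256"}

# Mobile client without AES hardware acceleration lists ChaCha20 first:
print(pick_cipher(["TLS_CHACHA20_POLY1305_SHA256", "TLS_AES_256_GCM_SHA384"], server))
# Desktop client with AES-NI lists AES-GCM first:
print(pick_cipher(["TLS_AES_256_GCM_SHA384", "TLS_CHACHA20_POL305_SHA256".replace("POL305", "POLY1305"), "TLS_CHACHA20_POLY1305_SHA256"][0:1] + ["TLS_CHACHA20_POLY1305_SHA256"], server))
```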
May 16 2024
Removed raw sockets usage. We are now fetching MSS data via getsockopt().
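Reading the MSS via getsockopt() means querying the kernel's `TCP_MAXSEG` option on an ordinary TCP socket instead of sniffing SYN packets with a raw socket. A minimal sketch of that call (not the actual tcp-mss-clamper code); on Linux an unconnected socket reports the protocol default until a handshake negotiates a real value:

```python
import socket

# Query the kernel's current MSS for a TCP socket via getsockopt(),
# no raw sockets or packet parsing needed.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mss = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
s.close()
print(mss)
```

This also removes the need for elevated capabilities (CAP_NET_RAW) that raw sockets require.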
May 13 2024
we had a big spike of 503s on eqiad/drmrs/esams yesterday during EU morning: https://grafana.wikimedia.org/goto/J4YqQuYIR?orgId=1:
file_metadata is already there and supports individual files. The only limitation is that it currently expects the parameters links=manage&source_permissions=ignore, which map to the file resource attributes of the same names (https://www.puppet.com/docs/puppet/7/types/file.html#file-attribute-source_permissions && https://www.puppet.com/docs/puppet/7/types/file.html#file-attribute-links).
May 10 2024
your request is missing some required parameters for the file_metadata endpoint:
$ curl -H "Accept: application/json" --cert /var/lib/puppet/ssl/certs/mx-out1001.wikimedia.org.pem --key /var/lib/puppet/ssl/private_keys/mx-out1001.wikimedia.org.pem --cacert /var/lib/puppet/ssl/certs/ca.pem "https://acmechief2002.codfw.wmnet:8140/puppet/v3/file_metadata/acmedata/mx-out/live/ec-prime256v1.crt?links=manage&source_permissions=ignore" | jq
{
  "checksum": {
    "type": "md5",
    "value": "{md5}b86fb140f227639d70ad971db461c82c"
  },
  "destination": null,
  "group": 498,
  "links": "manage",
  "mode": 420,
  "owner": 498,
  "path": "/etc/acmecerts/live/ec-prime256v1.crt",
  "relative_path": null,
  "type": "file"
}
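The fix boils down to appending the two required query parameters to the file_metadata URL. A small sketch of building that URL programmatically, reusing the host and path from the curl example above:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Build the file_metadata request URL with the two parameters the
# endpoint currently requires.
base = ("https://acmechief2002.codfw.wmnet:8140"
        "/puppet/v3/file_metadata/acmedata/mx-out/live/ec-prime256v1.crt")
params = {"links": "manage", "source_permissions": "ignore"}
url = f"{base}?{urlencode(params)}"
print(url)

# Round-trip check: both required parameters survive in the query string.
print(parse_qs(urlsplit(url).query))
```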
May 8 2024
May 7 2024
https://gerrit.wikimedia.org/r/1028818 removed the prometheus jobs, alert should go away as soon as puppet runs on the prometheus hosts.
that's right, this is a leftover from the migration from mtail to benthos on ncredir. We will take care of it ASAP.
May 6 2024
May 3 2024
Apr 30 2024
Apr 19 2024
running another test, this time with 3x10k requests, it looks like the culprit is the socket_server UDP input that drops packets:
processor_latency_ns_count{label="syslog_format",path="root.input.processors.0"} 29963

vgutierrez@ncredir2001:/var/log/nginx$ cat /proc/net/udp
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
1330: 0100007F:04C5 00000000:0000 07 00000000:00000000 00:00000000 00000000  18837        0 64597138 2 0000000000000000 37
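The per-socket drop counter is the last column of each /proc/net/udp row, which is how the 37 dropped datagrams above were spotted. A small sketch that extracts it (the sample row is embedded as a string literal so the snippet is self-contained):

```python
# Parse the `drops` column (last field) of /proc/net/udp rows.
sample = (
    "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when "
    "retrnsmt   uid  timeout inode ref pointer drops\n"
    "1330: 0100007F:04C5 00000000:0000 07 00000000:00000000 00:00000000 "
    "00000000 18837 0 64597138 2 0000000000000000 37\n"
)

def udp_drops(proc_net_udp: str) -> dict:
    """Map local address:port (hex, as the kernel prints it) to the
    per-socket receive drop counter."""
    drops = {}
    for line in proc_net_udp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        drops[fields[1]] = int(fields[-1])
    return drops

print(udp_drops(sample))
# -> {'0100007F:04C5': 37}
```

0100007F:04C5 is 127.0.0.1:1221 in the kernel's little-endian hex notation, i.e. the local syslog listener.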
Testing benthos on ncredir2001 shows some concerning results (TL;DR it looks like benthos drops some messages and metrics aren't as accurate as expected).
Apr 17 2024
Apr 15 2024
Apr 11 2024
Apr 9 2024
Apr 8 2024
Feb 12 2024
Feb 7 2024
sadly, Varnish is not able to distinguish a client that goes away earlier than expected (due to poor Internet access), triggering a backend fetch error, from an actual backend fetch error where the client connection is healthy but Varnish is unable to reach the backend server.
Feb 6 2024
@BTullis it's origin related:
I can reproduce via text@drmrs, I'll take a look ASAP :)
Feb 4 2024
Jan 31 2024
Fix already released on HAProxy 2.9: https://www.mail-archive.com/haproxy@formilux.org/msg44547.html
Jan 30 2024
IIRC that was done to smooth the reimage process and first puppet run on various roles using fifo-log-demux.
Jan 23 2024
Jan 22 2024
as suggested by Willy Tarreau on https://github.com/haproxy/haproxy/issues/2403#issuecomment-1900111538 this issue could be easier to debug on HAProxy 2.8