root@cp4028:~# fifo-log-tailer -socket this-does-not-exist-at-all.socket
2020/11/30 16:42:38 Unable to read from socket: dial unix this-does-not-exist-at-all.socket: connect: no such file or directory
2020/11/30 16:42:39 Unable to read from socket: dial unix this-does-not-exist-at-all.socket: connect: no such file or directory
[...]
2020/11/30 16:42:48 Could not connect to this-does-not-exist-at-all.socket after 10 attempts. Exiting.
Varnish 6.0.7 is behaving well in terms of functionality on cp4032 (T268736).
Unit ordering at boot time is now correct:
Fri, Nov 27
This happened again last night at 2020-11-27T00:08; we had alerts on cp1089, cp1077, cp1087, cp1083 and cp1075 in eqiad, cp2029 (codfw), cp3062 and cp3064 (esams), and cp5009 (eqsin):
Thu, Nov 26
Timing out: 5 months have passed since this issue was reported and, to the best of my knowledge, it was an isolated case. Obviously, feel free to reopen if it happens again.
@Gilles: FYI during the next few weeks we'll be upgrading to this latest bugfix release. The list of changes (see task description) does not seem to suggest anything that could have an obvious performance impact, but you never know. I am going to upgrade one single node first, see how it behaves for a while and then proceed with the rest.
Wed, Nov 25
I've tried building 6.0.7 on my workstation to double-check the changes between 6.0.6 and 6.0.7 with debdiff. When running the tests, ./bin/varnishtest/tests/m00035.vtc fails with a segmentation fault:
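For anyone else trying to reproduce: a single test case can be re-run in isolation with the in-tree varnishtest binary (verbose output helps to see where it crashes; varnishd from the same build tree may need to be on PATH):

  ./bin/varnishtest/varnishtest -v bin/varnishtest/tests/m00035.vtc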
Tue, Nov 24
Mon, Nov 23
Fri, Nov 6
Thu, Nov 5
@Krinkle: anything left TBD here?
Oct 30 2020
I couldn't find @DNdubane_WMF's signature on L3; task description updated accordingly.
@AnneT: please let us know if everything is working as expected!
Oct 29 2020
@calbon: please let me know if you now have access and we can close this. Thanks!
Oct 28 2020
I've pinged @calbon on Google Chat asking him to confirm the public key; I'll take care of the puppet change once I hear from him.
Oct 27 2020
With T266567 out of the way, we can now try different Varnish 6 versions, at least as long as they're VRT-compatible.
Done in libvmod-netmapper 1.9-1, closing.
Given that the number of changes between 5.1.3 and 6.0.6 is considerable, I was thinking of following this "bisect-like" approach: package Varnish 6.0.2, try it out on a node, and see if there's any difference. If 6.0.2 performs better, then the regression happened between 6.0.3 and 6.0.6; otherwise it happened in 6.0.2 or earlier.
Oct 26 2020
Oct 23 2020
Oct 22 2020
Oct 21 2020
I found that there's a significant difference in the n_objecthead counter between v5 and v6:
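For the record, the counter can be sampled with varnishstat on both hosts (add -n <instance> where more than one Varnish instance is running):

  varnishstat -1 -f MAIN.n_objecthead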
Instead of using hfp vs hfm, I think we might want to distinguish between requests that definitely cannot be cached at the ats-be layer (e.g. those with req.http.Authorization) and those that could potentially result in a backend hit, like large_objects_cutoff. The former should honor pass_random in vcl_pass, while the latter should always chash no matter what pass_random says?
In case it makes things easier/cleaner: instead of modifying the configuration, you could set the CAP_NET_BIND_SERVICE capability.
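For example (the binary path below is a placeholder), either on the binary itself or, if the service runs under systemd, in the unit file:

  # grant the binary the right to bind privileged ports (<1024)
  setcap 'cap_net_bind_service=+ep' /usr/local/bin/some-daemon
  # or, equivalently, in the systemd unit:
  # AmbientCapabilities=CAP_NET_BIND_SERVICE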
Oct 20 2020
The following now returns nothing:
I've noticed that on nodes with Varnish 6 the worst time_firstbyte values reported by ats-tls are very often around 26 seconds, and they're due to etherpad. Can you try this once again, but excluding Host: etherpad.wikimedia.org?
It's possible that the extra time comes from something Varnish doesn't measure. It's unclear to me whether the last Timestamp in a Varnish response includes the time it took to actually ship the bytes to the client (ats-tls in this case?) and have them acknowledged.
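For reference, the timestamps Varnish does record for a given request can be listed with varnishlog, which makes it easier to see which phases are (and aren't) covered:

  varnishlog -g request -i Timestamp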
Oct 19 2020
Oct 16 2020
All varnishkafka instances restarted with 6.0.6-1wm2, CPU usage looks like this now:
We haven't seen this happen again since setting transient storage limits. Closing.
Oct 15 2020
I've upgraded cp3050 to 6.0.6-1wm2 and restarted varnishkafka-webrequest.service at 14:12 to pick up the new library. Varnishkafka's CPU usage went down immediately as expected. I've then reloaded the service at 14:21: on systems affected by this bug that would have resulted in CPU usage going back up. On cp3050 CPU usage stayed the same.
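In other words, the check boils down to the following (unit name as on cp3050; expected behavior per the observations above):

  # restarting picks up the new library: CPU usage drops and stays down
  systemctl restart varnishkafka-webrequest.service
  # on affected builds a reload made CPU usage climb back up; with 6.0.6-1wm2 it stays flat
  systemctl reload varnishkafka-webrequest.service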
Oct 14 2020
I've opened https://github.com/varnishcache/varnish-cache/issues/3436 for 6.5/master, https://github.com/varnishcache/varnish-cache/issues/3437 for 6.0.6 (LTS), and proposed https://github.com/varnishcache/varnish-cache/pull/3438 as a fix for the latter.
Oct 13 2020
The function VUT_Main is the main loop of VUT programs. The while loop boils down to:
Oct 12 2020
Oddly, cp3054 has Content-Length set in the headers I get back, but cp3052 does not?