Page MenuHomePhabricator

metric discrepancies between ATS 9.x and ATS 8.x
Closed, ResolvedPublic

Description

After updating to ATS 9.1.2 we have detected an increase on self-reported metrics by ATS. On both 8.x and 9.x instances TTFB is gathered with %<{TS_MILESTONE_UA_BEGIN_WRITE-TS_MILESTONE_SM_START}msdms>. As it can be seen in this dashboard:

image.png (1×2 px, 347 KB)

TTFB as reported by ATS has increased ~25% while ATS reported by the varnish instances that feed traffic to ATS remained basically the same.

We've also detected some differences on reported plugin time, we log total plugin time on both 8.x and 9.x instances with %<{TS_MILESTONE_PLUGIN_TOTAL-TS_MILESTONE_SM_START}msdms>. ATS 8.x instances almost always report a total plugin time of 0ms:

vgutierrez@cp6015:~$ curl -s http://127.0.0.1:3904/metrics |egrep "total_plugin_time_(sum|count)" |grep appservers-rw 
trafficserver_backend_total_plugin_time_sum{backend="appservers-rw.discovery.wmnet",prog="atsbackend.mtail"} 0.33
trafficserver_backend_total_plugin_time_count{backend="appservers-rw.discovery.wmnet",prog="atsbackend.mtail"} 8.036859e+06

While ATS 9.x instances report >0ms more often (bucket measures time in seconds not ms):

trafficserver_backend_total_plugin_time_sum{backend="appservers-rw.discovery.wmnet",prog="atsbackend.mtail"} 1486.0159999757298
trafficserver_backend_total_plugin_time_count{backend="appservers-rw.discovery.wmnet",prog="atsbackend.mtail"} 3.439146e+06

As expected, the described behavior hasn't changed after updating to 9.1.3

Issue has been reported to upstream in: https://github.com/apache/trafficserver/issues/9028

Event Timeline

Vgutierrez triaged this task as Medium priority.Sep 6 2022, 1:40 PM

This could have been solved by T316938

Vgutierrez claimed this task.