
Performance impact evaluation of enabling nginx-lua and nginx-lua-prometheus on tlsproxy
Closed, Resolved, Public

Description

We have added lua support to tlsproxy and left it disabled by default. Based on nginx-lua, we can now expose nginx metrics to prometheus, such as request latency and response status.

The performance impact of those features needs to be evaluated before enabling them in production.
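
For reference, the general shape of an nginx-lua-prometheus setup is sketched below. This is purely illustrative and not our actual tlsproxy configuration: the metric names, shared dictionary size and exporter port are made up for the example.

# Illustrative sketch only, not the tlsproxy configuration
lua_shared_dict prometheus_metrics 10M;

init_worker_by_lua_block {
    prometheus = require("prometheus").init("prometheus_metrics")
    -- hypothetical metric names
    http_requests = prometheus:counter("nginx_http_requests_total",
        "Number of HTTP requests", {"status"})
    http_latency = prometheus:histogram("nginx_http_request_duration_seconds",
        "HTTP request latency", {"host"})
}

log_by_lua_block {
    -- record response status and request latency for every request
    http_requests:inc(1, {ngx.var.status})
    http_latency:observe(tonumber(ngx.var.request_time), {ngx.var.host})
}

server {
    listen 9145;  # hypothetical exporter port
    location /metrics {
        content_by_lua_block { prometheus:collect() }
    }
}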

Event Timeline

ema triaged this task as Medium priority. Mar 22 2017, 11:41 AM

Methodology

We have benchmarked nginx performance on pinkunicorn (cp1008). An additional nginx location block has been added to ensure that the benchmark actually tested the performance of nginx itself rather than other parts of the stack:

# /etc/nginx/sites-enabled/unified
location = /lua_bench {
    return 200;
}

hey has been used as an HTTP load generator with the following options:

# -cpus: number of CPUs to use, -c: concurrent requests, -n: total number of requests
hey -cpus 8 -c 200 -n 50000 https://localhost/lua_bench

Four different nginx configurations have been tested:

no-lua: nginx lua module disabled
lua-no-prometheus: nginx lua module enabled, prometheus metrics disabled
prometheus-no-latency: nginx lua module and prometheus metrics enabled, status code only
prometheus-full: nginx lua module and prometheus metrics enabled, status code and latency

For each configuration we have performed 200 runs.
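
As a rough sketch (not the exact commands used), the runs can be driven with a small shell loop, assuming hey's summary output includes a Requests/sec line; the configuration name and output file below are made up:

# sketch: collect requests/sec over 200 runs of a single configuration
config=no-lua
for i in $(seq 1 200); do
    hey -cpus 8 -c 200 -n 50000 https://localhost/lua_bench \
        | awk '/Requests\/sec/ { print $2 }' >> "rps-${config}.txt"
done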

Results

Number of requests per second served for each configuration.

| Configuration | Min | Median | Max | Standard Deviation |
| --- | --- | --- | --- | --- |
| no-lua | 21986.22 | 29382.30 | 32182.19 | 1091.46 |
| lua-no-prometheus | 21305.12 | 30003.09 | 33827.59 | 2186.90 |
| prometheus-no-latency | 19461.72 | 25853.65 | 27188.88 | 1106.30 |
| prometheus-full | 15642.42 | 17847.48 | 18563.96 | 468.22 |

Conclusions

The nginx-lua module can be enabled immediately, as it introduces no measurable slowdown in nginx. Prometheus metrics support, as it stands, adds a significant performance penalty and should not be enabled. However, it is encouraging that reducing the scope to response status only (thus excluding latency measurements) also reduces the penalty. We should tune the lua script and see whether the slowdown can be brought down to acceptable levels.

nginx-lua-prometheus uses a dictionary in shared memory that gets updated by the nginx worker processes.

I've tried two additional approaches to see whether contention could be a factor in the performance decrease: (i) reduce the lua code to a minimum while still using shared memory; (ii) keep stats on a per-worker basis and only update the shared data structure twice a second through ngx.timer.at.

i) simple approach with shared dictionary

lua_shared_dict global_count 5M;

log_by_lua_block {
    -- Increment the shared counter for this response status; create it on first use
    local global_count = ngx.shared.global_count
    local newval, err = global_count:incr(ngx.var.status, 1)
    if not newval and err == "not found" then
        global_count:set(ngx.var.status, 1)
    end
}
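
Note that newer lua-nginx-module releases also accept an initial value as a third argument to incr, which would make the separate set fallback unnecessary; whether that is usable depends on the nginx-lua version we ship.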

ii) local updates

lua_shared_dict global_count 5M;

init_worker_by_lua_block {
    -- Bootstrap the periodic flush of per-worker counters into the shared dict
    local worker_status_codes = require("prometheus")
    worker_status_codes.update_global_count()
}

log_by_lua_block {
    -- Increment per-worker stats.
    local worker_status_codes = require("prometheus")
    worker_status_codes.update_local_count(ngx.var.status, 1)
}

-- /etc/nginx/lua/prometheus.lua
-- See https://github.com/openresty/lua-nginx-module/#data-sharing-within-an-nginx-worker
local _M = {}

local data = {}

function _M.update_local_count(status, value)
    if data[status] then
        data[status] = data[status] + value
    else
        data[status] = value
    end
end

function _M.update_global_count()
    local global_count = ngx.shared.global_count

    -- data contains per-worker status codes
    for key, value in pairs(data) do
        -- incr and set are operations on global_count which is shared among nginx worker processes
        local newval, err = global_count:incr(key, value)
        if not newval and err == "not found" then
            global_count:set(key, value)
        end
    end

    -- reset local counter
    data = {}

    -- schedule next global counter update in 0.5s
    ngx.timer.at(0.5, _M.update_global_count)
end

return _M
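
For completeness, the aggregated per-status counters in the shared dictionary could then be exposed for prometheus to scrape with a small content handler; the location path and metric name below are hypothetical, just to sketch the idea:

# sketch only: path and metric name are hypothetical
location /lua_status_metrics {
    content_by_lua_block {
        local global_count = ngx.shared.global_count
        -- get_keys(0) returns all keys currently stored in the dictionary
        for _, status in ipairs(global_count:get_keys(0)) do
            ngx.say('nginx_responses_total{status="', status, '"} ',
                    global_count:get(status))
        end
    }
}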

Performance-wise, there doesn't seem to be any significant advantage in following the more complicated approach:

| Configuration | Min (req/s) | Median (req/s) | Max (req/s) | Standard Deviation |
| --- | --- | --- | --- | --- |
| shared-dict | 21009.09 | 28735.77 | 31436.54 | 1389.97 |
| local-updates | 21965.24 | 28329.30 | 31815.98 | 1139.40 |

Change 345123 had a related patch set uploaded (by Ema):
[operations/puppet@production] tlsproxy: simplify prometheus metrics gathering

https://gerrit.wikimedia.org/r/345123

Change 345123 merged by Ema:
[operations/puppet@production] tlsproxy: simplify prometheus metrics gathering

https://gerrit.wikimedia.org/r/345123

@ema, it seems like the task as described has been completed (awesome work and great presentation btw!). Is there anything left to be done or shall we resolve this task?

ema claimed this task.

> @ema, it seems like the task as described has been completed (awesome work and great presentation btw!). Is there anything left to be done or shall we resolve this task?

Thanks! Yeah, closing given that the performance impact has been evaluated. We still need to enable the feature in prod, but that's unrelated to this task.