Improve field mapping for nginx logstash
Closed, ResolvedPublic


Right now all fields sent from the nginx channel to logstash are mapped as text. This is bad for fields that are actually numeric, because it prevents aggregation on them.

I think it makes sense to set up config for the following:

  • request_time: float
  • response_size: long
  • upstream_time: float

We can define the additional mappings in the logstash template.
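As a rough illustration of what such mappings could look like, here is a minimal Python sketch that builds a "properties" fragment for the three fields listed above. The helper name `build_properties` is hypothetical, and where exactly this fragment sits inside the real logstash template depends on the Elasticsearch version in use, so treat it as an assumption rather than the actual patch.

```python
import json

# Field types proposed in this task.
NUMERIC_FIELDS = {
    "request_time": "float",
    "response_size": "long",
    "upstream_time": "float",
}


def build_properties(field_types):
    """Return an Elasticsearch-style "properties" mapping fragment.

    Hypothetical helper: the surrounding template structure is not shown
    in this task and is intentionally left out.
    """
    return {name: {"type": es_type} for name, es_type in field_types.items()}


properties = build_properties(NUMERIC_FIELDS)

# Print the fragment as JSON so it can be compared against the template.
print(json.dumps({"properties": properties}, indent=2, sort_keys=True))
```

Keeping the field list in one place makes it easy to drop response_size later if it turns out to be a problem.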

Event Timeline

My checks show that request_time and upstream_time are only used by nginx, but response_size is also used by webrequest. Although it has the same meaning there, I wonder whether changing the type would disrupt anything. response_size is less important to me than the other two, so if it proves to be a problem, we could omit it.

There might be other opinions, but I think hard-coding specific fields to specific types in the logstash config is reasonable as long as it's documented. The primary problem we run into is that, without coordination, different applications can send different types to the same field. While documenting types for all fields and expecting applications to conform to them is not going to happen, doing it for a limited set of useful fields seems acceptable to me.
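To make the type-conflict problem concrete, here is a small Python sketch that scans a batch of events and reports fields seen with more than one value type. The function name `find_type_conflicts` and the sample events are hypothetical, made up for illustration and not taken from real logs.

```python
from collections import defaultdict


def find_type_conflicts(events):
    """Return fields that appear with more than one Python value type.

    The result maps each conflicting field name to the set of type names
    observed for it across all events.
    """
    seen = defaultdict(set)
    for event in events:
        for field, value in event.items():
            seen[field].add(type(value).__name__)
    return {field: types for field, types in seen.items() if len(types) > 1}


# Hypothetical sample events, not real log data.
sample_events = [
    {"channel": "nginx", "response_size": 5120, "request_time": 0.042},
    {"channel": "webrequest", "response_size": "5120"},
]

conflicts = find_type_conflicts(sample_events)
print(conflicts)
```

A check like this could be run against a sample of events before pinning a field's type, to see whether any application would be affected.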

Change 386317 had a related patch set uploaded (by Smalyshev; owner: Smalyshev):
[operations/puppet@production] Add types to some fields used by nginx

Change 386317 merged by Gehel:
[operations/puppet@production] Add types to some fields used by nginx

Change merged and deployed on all logstash servers. If I understand correctly, we need to wait for new index creation to check if this worked correctly.

Mentioned in SAL (#wikimedia-operations) [2017-10-25T14:51:30Z] <ebernhardson> update logstash template on logstash elsaticsearch cluster for T178530

Mentioned in SAL (#wikimedia-operations) [2017-10-25T14:52:33Z] <gehel> applying new template to elasticsearch / logstash - T178530

debt claimed this task.