Use gzip for logstash
Closed, Declined · Public

Assigned To

Authored By

	EBernhardson
	Mar 27 2017, 9:26 PM

Description

While looking into the upgrade of logstash to 5.x, I noticed a couple of errors caused by malformed GELF logging requests. This is explicitly *not* a problem with the 5.x upgrade: our 1.5.x install in production is logging the same errors. I only noticed them because I was looking over logs while preparing the upgrade.

These are basically UDP messages, formatted as JSON, received on port 12201. One example message:

{"@timestamp":"2017-03-27T21:11:24","type":"ores","logger_name":"uwsgi","host":"deployment-sca03","level":"ERROR","message":"[pid: 31379] 10.68.21.68 (-) {32 vars in 521 bytes} [Mon Mar 27 21:11:24 2017] GET /scores/enwiki/goodfaith/?model_info=test_stats&format=json => generated 2060 bytes in 7 msecs (HTTP/1.1 200) 6 headers in 209 bytes (1 switches on core 0) user agent \"MediaWiki/1.29.0-alpha\""}

The problem here is that logstash can only accept compressed input over GELF; plaintext is not supported. I'm no uwsgi expert, so I can't provide exact details on how to fix it, but for the logs to be accepted by logstash, saved into elasticsearch, and viewed in kibana, the uwsgi config in /etc/uwsgi/apps-available/ores.ini will need to be updated to compress the data sent out over the socket connection.
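To illustrate what "compressing the data sent over the socket" means on the sender side, here is a minimal Python sketch: the JSON record is gzip-compressed and written as a single UDP datagram. It assumes the GELF input listens on UDP port 12201 as described above; the helpers `encode_compressed` and `send_record` and the host `logstash.example` are hypothetical, and this is not the uwsgi configuration change itself.

```python
import gzip
import json
import socket


def encode_compressed(record: dict) -> bytes:
    """Serialize a log record to JSON and gzip it (hypothetical helper)."""
    return gzip.compress(json.dumps(record).encode("utf-8"))


def send_record(record: dict, host: str, port: int = 12201) -> None:
    """Send one gzip-compressed record as a single UDP datagram."""
    payload = encode_compressed(record)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    record = {"type": "ores", "logger_name": "uwsgi", "level": "ERROR",
              "message": "example message"}
    payload = encode_compressed(record)
    # Every gzip stream starts with the magic bytes 0x1f 0x8b.
    print(payload[:2].hex())
    # send_record(record, "logstash.example")  # hypothetical host name
```

Whether the GELF input additionally requires GELF-specific fields is not covered by this sketch.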

Details

	Subject	Repo	Branch	Lines +/-
	service: use gzip for logging in uwsgi	operations/puppet	production	+2 -1


Related Objects

Mentioned In: T161908: ELK 5.x deployment plan

Event Timeline

EBernhardson created this task. Mar 27 2017, 9:26 PM

Restricted Application added a project: Machine-Learning-Team. Mar 27 2017, 9:26 PM

Restricted Application added a subscriber: Aklapper.

In production this error has been logged 704,874 times between 2017-03-27T06:25:20 and 2017-03-27T21:28:06.41, or just under 800 times per minute (about 13 per second). The full cluster logs ~15k messages per second, so handling these additional messages is reasonable. However, these messages have never previously been available via centralized logging, so if we don't actually need them, it might be worth simply turning them off.
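The rate quoted above can be checked from the two timestamps. A quick Python sketch; the count and the approximate ~15k messages/second cluster figure are taken from the comment, and the trailing ".41" is written as ".410" so that `datetime.fromisoformat` accepts it on older Python versions.

```python
from datetime import datetime

start = datetime.fromisoformat("2017-03-27T06:25:20")
end = datetime.fromisoformat("2017-03-27T21:28:06.410")
count = 704_874
cluster_rate = 15_000  # messages per second, approximate cluster-wide figure

elapsed = (end - start).total_seconds()  # about 15 h 3 min, in seconds
per_second = count / elapsed
per_minute = per_second * 60
share = per_second / cluster_rate  # fraction of total cluster volume

print(f"{per_minute:.0f}/min, {per_second:.1f}/s, {share:.2%} of cluster volume")
```

At roughly 13 messages per second, these errors amount to well under 0.1% of the cluster's logging volume.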

EBernhardson added a project: Wikimedia-Logstash. Mar 27 2017, 9:32 PM

EBernhardson mentioned this in T161908: ELK 5.x deployment plan. Mar 31 2017, 4:25 PM

Halfak assigned this task to Ladsgroup. Apr 13 2017, 3:14 PM

Restricted Application added a project: User-Ladsgroup. Apr 13 2017, 3:14 PM

Halfak triaged this task as High priority. Apr 13 2017, 3:15 PM

Halfak moved this task from Unsorted to Maintenance/cleanup on the Machine-Learning-Team board.

Mentioned in SAL (#wikimedia-releng) [2017-04-14T00:45:31Z] <Amir1> cherry-picking 348184/1 (T161563)

The uwsgi logging config is set up to send JSON datagrams to port 11514. It shouldn't be hitting the GELF input at all.
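When it is unclear which input a stray datagram was meant for, its leading bytes give a strong hint. A minimal Python sketch, assuming the magic numbers from the GELF payload specification (chunked GELF starts with 0x1e 0x0f, gzip with 0x1f 0x8b) and that zlib streams use the default 32 KiB window (first byte 0x78); `classify_payload` is a hypothetical helper, not part of logstash.

```python
import gzip
import json
import zlib


def classify_payload(data: bytes) -> str:
    """Guess how a UDP payload is encoded from its leading bytes."""
    if data[:2] == b"\x1e\x0f":
        return "gelf-chunked"
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:1] == b"\x78":  # zlib header with the default 32 KiB window
        return "zlib"
    try:
        json.loads(data.decode("utf-8"))
        return "plain-json"
    except (UnicodeDecodeError, json.JSONDecodeError):
        return "unknown"


if __name__ == "__main__":
    raw = json.dumps({"type": "ores", "level": "ERROR"}).encode("utf-8")
    for payload in (raw, gzip.compress(raw), zlib.compress(raw)):
        print(classify_payload(payload))
```

A "plain-json" result on port 12201 would match the errors seen on the GELF input, while the same datagram would be expected on the JSON input at 11514.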

The port issue has been resolved. I made https://gerrit.wikimedia.org/r/#/c/348184/1 to send the logs gzip-compressed (even if it's not needed, let's have it for faster I/O).
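Whether gzip actually saves bytes per datagram depends on the message. gzip adds a fixed 18 bytes of header and trailer, so very short records can grow while long, repetitive ones shrink. A minimal Python sketch for checking this; `compare_sizes` is a hypothetical helper and the sample records are made up for illustration.

```python
import gzip
import json
from typing import Tuple


def compare_sizes(record: dict, level: int = 9) -> Tuple[int, int]:
    """Return (plain, gzipped) byte sizes of a JSON-encoded log record."""
    raw = json.dumps(record).encode("utf-8")
    return len(raw), len(gzip.compress(raw, compresslevel=level))


if __name__ == "__main__":
    short = {"level": "ERROR"}
    long_msg = {"level": "ERROR", "message": "GET /scores/enwiki/goodfaith/ " * 20}
    for rec in (short, long_msg):
        plain, packed = compare_sizes(rec)
        print(f"{plain} bytes plain -> {packed} bytes gzipped")
```

Running it on a sample of real uwsgi log lines would show whether the I/O saving is worth the extra CPU on both ends.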

Ladsgroup edited projects, added Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team. Apr 14 2017, 1:30 AM

Ladsgroup renamed this task from "ORES logs not being saved to logstash" to "Use gzip for logstash". Apr 14 2017, 1:38 AM

Ladsgroup lowered the priority of this task from High to Medium.

Ladsgroup moved this task from Incoming to Blocked on others on the User-Ladsgroup board.

Ladsgroup moved this task from Parked to Review on the Machine-Learning-Team (Active Tasks) board.

Okay. My conclusion is that there are two things:

1. The beta cluster was using a badly outdated puppet config, which produced the errors Erik mentioned. Updating the puppetmaster resolved it.
2. Logs should be gzipped, but that isn't necessary for logging to work: we still see the logs even though they are not gzipped: https://logstash.wikimedia.org/app/kibana#/dashboard/ORES

Mentioned in SAL (#wikimedia-releng) [2017-04-14T08:03:42Z] <hashar> beta: resetting puppetmaster to last good tag snapshot-20170414T0030 A cherry pick for T161563 end up dropping three patches which broke other parts of the infrastructure

Mentioned in SAL (#wikimedia-releng) [2017-04-14T08:17:16Z] <hashar> beta: cherry picking again 348184/4 'service: use gzip for logging in uwsgi' for T161563

Mentioned in SAL (#wikimedia-releng) [2017-04-25T06:46:58Z] <Amir1> uncherry-pick f6ce64e99a and 225b8d4e82 (T161563)

When un-cherry-picked, it works like a charm. I'm cherry-picking it again to see what happens.

Mentioned in SAL (#wikimedia-releng) [2017-04-27T07:26:54Z] <Amir1> cherry-picking 348184/4 (T161563)

Change 348184 abandoned by Ladsgroup:
service: use gzip for logging in uwsgi

Reason:
It breaks beta cluster. Let's not do it.

https://gerrit.wikimedia.org/r/348184

Ladsgroup closed this task as Declined. Jun 4 2017, 12:47 AM

awight moved this task from Review to Completed on the Machine-Learning-Team (Active Tasks) board. Jul 3 2017, 5:48 PM

fgiunchedi added a project: observability. Aug 19 2019, 2:32 PM
