Correctly collect logs from php-fpm pools
Open, NormalPublic

Description

We decided to go the same way as we went with HHVM, thus relaying the logs to syslog to then deliver wherever desired.

To this aim, we added error_log = syslog and syslog.ident = php7.2-fpm to the configuration, and the corresponding rsyslog rules.

We found out, however, that whatever gets logged from the fpm pools ends up having an identifier of ool $pool_name.

This is an unfortunate bug that has been reported multiple times. As a consequence, the broken programname containing a space is invalid as far as the syslog format is concerned, and rsyslog correctly refuses to parse it. So the programname is always "ool" as far as rsyslog is concerned

There are several ways to fix this, but none is exactly attactive:

  • Just add rules to catch "ool" as a programname. The log line will still be badly broken, and will be badly parsed once imported in logstash[1]. Also, we will have to repeat all the processing rules twice.
  • Add an openlog() function in the auto_prepend_file. That would set the correct ident for all requests. We might need to closelog() too, though.
  • Add catch_workers_output = yes to the pool configuration, with some clever setting of error_log to send errors to stderr might do what we want (let journald collect the log lines from stderr, mark them with the correct programname (the unit name), and relay them to syslog. Sadly, catch_workers_output is known to cause a severe performance degradation at high concurrency, so it's not an option.
  • Patch php-fpm so that when it sets the programname in syslog it does something sensible like setting it to php-pool-$pool_name
  • Configure php-fpm to send the logs to a specialized syslog facility and filter for it. This will reduce duplication of rules but still possibly have the broken parsing issue.

As it stands, I would prefer to use the openlog() approach, which seems the cleanest way to handle this.

[1] the message registered in logstash becomes ool www[12345]: JOE TEST instead of JOE TEST

Joe created this task.Wed, Dec 5, 8:59 AM
Joe triaged this task as Normal priority.
Restricted Application removed a project: Patch-For-Review. · View Herald TranscriptWed, Dec 5, 8:59 AM

Mentioned in SAL (#wikimedia-operations) [2018-12-05T09:00:10Z] <_joe_> disabed puppet on mw1261, used for logging tests for T211184

Joe claimed this task.Wed, Dec 5, 9:02 AM

I took a quick look at this as well and indeed openlog() seems the simplest way. Also because altering programname in rsyslog isn't allowed, thus to fix this on the rsyslog side we'd have to use a different template for example, not really worth it IMO. Plus the bug is supposedly fixed in php 7.3 anyways.

FTR I tried strace on rsyslog upon receiving a test message from error_log:

[pid 32306] <... poll resumed> )        = 1 ([{fd=3, revents=POLLIN}])
[pid 32306] recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="<29>Dec  5 14:00:25 ool www[27112]: JOE TEST PLEASE DISREGARD", iov_len=8096}], msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */}, {cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, cmsg_data={pid=27112, uid=33, gid=33}}], msg_controllen=64, msg_flags=0}, MSG_DONTWAIT) = 61
[pid 32306] futex(0x55993f3d9f1c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x55993f3d9f18, FUTEX_OP_SET<<28|0<<12|FUTEX_OP_CMP_GT<<24|0x1) = 1
[pid 32309] <... futex resumed> )       = 0
[pid 32306] poll([{fd=3, events=POLLIN}], 1, -1 <unfinished ...>
[pid 32309] futex(0x55993f3bd8d0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 32309] write(15, "Dec  5 14:00:25 mw1261 ool www[27112]: JOE TEST PLEASE DISREGARD\n", 65) = 65
[pid 32309] recvfrom(8, 0x7f90a97b27a7, 1, MSG_PEEK|MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 32309] sendmsg(8, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\27\3\3\0n\0\0\0\0\0\0\0.\301\"\3\206\20'\26O\31)f\201_\221\221\333r\304Uq\6\277\347L\r\34\307\260b\324Oe-\250\205#\333\305~G\336\355)P`N\307L\252 Y\246\2244\334\2^\244\355:\263\236\367\36\213\333{\372\207B\205\214\220\6\313\205\225p\22L\261\331\344\22`\336\251*bC<\33\36\346\270Y\217\26\234\376\31\231", iov_len=115}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 115
[pid 32309] recvfrom(9, 0x7f90a97b27a7, 1, MSG_PEEK|MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 32309] sendmsg(9, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\27\3\3\0n\0\0\0\0\0\0\0.\311\256)\34K\364\33\223T\302\220jhY4\\t\304\37413\34?gI\371F\211\230x\201\3351\203<\340\3{\204$\313\3174P\1\326lb\0016\371B~%r,T\362Z\241\232\316u-\257\316\305ga\264\216\372n\233\210\n\232\3304\373\335\365qN\tu\341a\36\1\310\366\230\331\307\202\204\360\302\304\317x", iov_len=115}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 115
[pid 32309] sendto(13, "ool Dec  5 14:00:25 mw1261:  www[27112]: JOE TEST PLEASE DISREGARD\n", 67, 0, {sa_family=AF_INET, sin_port=htons(8420), sin_addr=inet_addr("10.64.32.175")}, 16) = 67
[pid 32309] sendto(14, "<29>Dec  5 14:00:25 mw1261 ool www[27112]: JOE TEST PLEASE DISREGARD", 68, 0, {sa_family=AF_INET, sin_port=htons(10514), sin_addr=inet_addr("10.2.2.36")}, 16) = 68
[pid 32309] futex(0x55993f3d9f1c, FUTEX_WAIT_PRIVATE, 329, NULL

journald sees the messages too of course but is able to associate them with their originating unit:

# journalctl -u php7.2-fpm.service -n 3
-- Logs begin at Sat 2018-12-01 16:45:57 UTC, end at Wed 2018-12-05 14:05:54 UTC. --
Dec 05 13:59:45 mw1261 php-fpm7.2[27110]: ool www[27110]: JOE TEST PLEASE DISREGARD
Dec 05 14:00:16 mw1261 php-fpm7.2[27107]: ool www[27107]: JOE TEST PLEASE DISREGARD
Dec 05 14:00:25 mw1261 php-fpm7.2[27112]: ool www[27112]: JOE TEST PLEASE DISREGARD

Change 478021 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/mediawiki-config@master] Hotfix for logging in php-fpm

https://gerrit.wikimedia.org/r/478021

Change 478021 merged by jenkins-bot:
[operations/mediawiki-config@master] Hotfix for logging in php-fpm

https://gerrit.wikimedia.org/r/478021