Description
Yesterday while investigating Cassandra/RESTBase issues in the test cluster I noticed RESTBase sends all logs to Logstash.
I'd expect at the very least errors or abnormal conditions to be logged locally (either to files or syslog) for auditing/debugging purposes.
Details
| Status | Assigned | Task |
|---|---|---|
| Resolved | mobrovac | T112648 enable restbase syslog/file logging |
| Resolved | Pchelolo | T113713 Allow logs to be sent to rsyslog |
| Resolved | fgiunchedi | T113714 Separate /var on restbase |
| Resolved | fgiunchedi | T121575 Expand SSD space in Cassandra cluster |
| | | Unknown Object (Task) |
| | | Unknown Object (Task) |
| | | Unknown Object (Task) |
| Resolved | fgiunchedi | T127333 install SSDs in restbase2001-restbase2006 |
| | | Unknown Object (Task) |
| Resolved | RobH | T125842 normalize eqiad restbase cluster - replace restbase1001-1006 |
| | | Unknown Object (Task) |
| | | Unknown Object (Task) |
| Resolved | RobH | T126626 3x additional SSD for restbase hp hardware |
| | | Unknown Object (Task) |
| Resolved | Cmjohnson | T128107 install restbase1010-restbase1015 |
| Resolved | fgiunchedi | T127951 expand raid0 in restbase200[1-6] |
Event Timeline
We have had this discussion a few times now. The basic issue of local logging being blocking (and thus running the risk of taking out a service when the disk fills up) has not been resolved.
We can always enable local logging temporarily for debugging, but I don't think that we should risk taking out important services like RESTBase in normal production use.
Example logging stanza for (temporary) local logging:
```yaml
logging:
  name: restbase
  streams:
    # level can be trace, info, warn, error
    - level: info
      path: /tmp/debug.log
```
For more options, see https://github.com/trentm/node-bunyan#stream-type-file.
Edit: Now documented at https://wikitech.wikimedia.org/wiki/RESTBase#Debugging as well.
Is this a matter of principle (i.e. disk logging in general), or some specific limitation in our tool chain? Safe, non-blocking logging seems like it ought to be a solvable problem.
It is solvable, for example by logging to syslog using udp with https://github.com/mcavage/node-bunyan-syslog. The vast majority of write-to-file methods however aren't safe against disks filling up. They will either block outright, or lead to OOM when async writes never complete.
Previous incident: https://wikitech.wikimedia.org/wiki/Incident_documentation/20140211-Parsoid
Yup, same thing I had in mind. rsyslog is automatically set up on our boxes to listen for UDP packets, so this should be good enough.
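For concreteness, a minimal sketch of what such a non-blocking setup could look like with bunyan plus mcavage/node-bunyan-syslog, sending to the local rsyslog over UDP; the facility, host and port values below are placeholders for illustration, not the production configuration:

```js
// Sketch only: bunyan logging to rsyslog over UDP via node-bunyan-syslog.
// Facility/host/port are assumptions, not production settings.
var bunyan = require('bunyan');
var bsyslog = require('bunyan-syslog');

var log = bunyan.createLogger({
    name: 'restbase',
    streams: [{
        level: 'info',
        type: 'raw',                        // bunyan-syslog consumes raw record objects
        stream: bsyslog.createBunyanStream({
            type: 'udp',                    // fire-and-forget; never blocks on a full disk
            facility: bsyslog.local0,
            host: '127.0.0.1',
            port: 514
        })
    }]
});

log.info('hello via syslog/udp');
```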
> The vast majority of write-to-file methods however aren't safe against disks filling up.
This is the part I'm worried about. Specifically, rsyslog writes its logs in the /var partition, which happens to be the same one used by Cassandra.
Hm, so what I'm saying here is that we might have a larger problem on our hands. Sure, RESTBase logs would contribute to that, but we already have all these other logs going to the same place which might fill up the /var partition.
Summing up yesterday's ops/services sync-up:
- we'll take the multi-instance opportunity to separate /var for logs/data [ops]
- we'll investigate syslog, both local and remote for restbase [services]
does that sound accurate?
Tracked in T113714: Separate /var on restbase.
> - we'll investigate syslog, both local and remote for restbase [services]
I think we have to proceed on this. Right now, without local logging, if something breaks with Logstash (see T158602) we lose all the logs completely, which is really not good. For example, in the last 24 hours there were some spikes in 500 errors in RESTBase which I would really like to look at, but Logstash logging happens to be broken at the same time, so we've lost some potentially important info.
It seems like all the previous concerns raised here have been resolved, and we can safely add async logging through syslog-udp, maybe with a more conservative log level, e.g. error/fatal only.
Or we might even consider normal file logging with aggressive log rotation, since it's enabled in all the other services and we haven't seen any issues with it.
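As a rough illustration of the rotation idea: bunyan itself ships a rotating-file stream type, though in production rotation would more likely be handled by logrotate. The path, level and retention values below are placeholders:

```js
// Sketch of bunyan's built-in rotating-file stream; all values are placeholders.
var bunyan = require('bunyan');

var log = bunyan.createLogger({
    name: 'restbase',
    streams: [{
        type: 'rotating-file',
        path: '/var/log/restbase/main.log',
        level: 'error',   // conservative level, as suggested above
        period: '1d',     // rotate daily
        count: 3          // keep three rotated files
    }]
});

log.error('example error record');
```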
So, we've discussed this in the team meeting and decided to move forward.
The final question is whether to use syslog-over-udp or normal file logging. We didn't have any problems with file logging in other services, so I'm not quite sure why we should go with syslog for RESTBase specifically.
@Pchelolo, logging directly to a file is synchronous, which is bad for performance & can cause outages. See the earlier discussion for an example of such an outage.
UDP avoids this, so I'd propose to use that if at all possible.
Em, writing to a file is async in node by default and bunyan file stream uses fs.createWriteStream which is async, isn't it?
As I understand the incident was caused by Parsoid using logging to stderr/stdout that were redirected to a file, which are indeed blocking. File logging is non-blocking.
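A small sketch of the behaviour being argued here, assuming nothing beyond Node's standard fs module: fs.createWriteStream queues writes and returns immediately, signalling back-pressure only through write()'s return value (the path is just an example).

```js
// Demonstrates that write() on an fs write stream does not block the caller.
var fs = require('fs');

var out = fs.createWriteStream('/tmp/debug.log', { flags: 'a' });

// write() buffers the data and returns at once; the actual disk I/O happens
// asynchronously in the background.
var belowHighWaterMark = out.write(JSON.stringify({ level: 'info', msg: 'hello' }) + '\n');

// A `false` return value signals back-pressure: the internal buffer is full
// and the caller should wait for the 'drain' event before writing more.
console.log('buffer below highWaterMark?', belowHighWaterMark);
```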
@Pchelolo, in the disk-full situation, writing asynchronously to files might still cause memory to build up & the service to run out of memory. It would be good to rule that out, either by testing, or by using UDP reporting.
I've done some testing on what happens when the partition we log to fills up. I created a VM with a 100 MB partition and started trace-logging to it. When the partition fills up, an unhandled exception is raised and the worker is killed. This makes the master process attempt to log that a worker has died, and then the master is killed as well.
Of course, a filled-up partition is a catastrophic event, but I think the logging code should be able to survive it: at the very least not die instantly, and definitely not kill the master. It is possible to catch the error; however, if we simply ignore it, we get a huge memory leak from queued FS write requests, so we not only need to catch it, but also destroy the file-writing stream in the event of a full disk.
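A rough sketch of the mitigation described above (not the actual service-runner patch): handle the stream's 'error' event and tear the stream down so further log writes are dropped instead of piling up in memory. The path and record shape are placeholders.

```js
var fs = require('fs');

var out = fs.createWriteStream('/var/log/restbase/main.log', { flags: 'a' });

out.on('error', function (err) {
    // Without this handler the 'error' event (e.g. ENOSPC on a full disk)
    // becomes an uncaught exception and kills the worker; the master then
    // dies the same way while trying to log the worker's death.
    console.error('disabling file logging:', err.code);
    out.destroy();   // stop queueing writes that can never complete
    out = null;
});

function logLine(record) {
    // Once the stream is gone, silently drop records instead of leaking memory.
    if (out) {
        out.write(JSON.stringify(record) + '\n');
    }
}

logLine({ level: 'error', msg: 'example record' });
```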
@Pchelolo awesome! Thank you for testing. Could you also test the scenario where logs are sent to syslog via UDP and the partition then fills up again, so that we can be informed of the outcomes, make a decision and proceed accordingly?
Sure thing, I didn't create a whole VM just for a single test.
Anyway, I'm preparing a fix for the file logging issue. I think that should be fixed regardless: most services do use file logging, and we don't want the whole SCB to crash if one service fills up the partition.
\o/ This is splendid! Thanks a ton!
> most services do use file logging, and we don't want the whole SCB to crash if one service fills up the partition.
Yup, precisely why I want us to explore the possibilities. One RB host going down because of a filled-up partition has much less of an impact than an SCB host going down in the same way.
So, I've tested the syslog-over-udp solution too, and as we expected, the service doesn't even notice the filled-up partition.
Also, I've created a PR that fixes the file logging issue: https://github.com/wikimedia/service-runner/pull/156
So, the question is whether we want syslog-over-udp with some custom rules to filter the RESTBase logs out to a specific location, or whether the fixed and improved file logging is good enough?
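For context on why the UDP route shrugs off a full disk, here is a bare-bones sketch using Node's dgram module; the port and address stand in for wherever rsyslog listens and the record shape is only illustrative:

```js
var dgram = require('dgram');

var socket = dgram.createSocket('udp4');
var record = Buffer.from(JSON.stringify({ name: 'restbase', level: 50, msg: 'example error' }));

// send() hands the datagram to the kernel and moves on; whether rsyslog can
// actually persist it to a (possibly full) disk is invisible to the sender.
socket.send(record, 0, record.length, 514, '127.0.0.1', function (err) {
    if (err) {
        console.error('local send error:', err);
    }
    socket.close();
});
```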
I flagged the possibility of other write errors (apart from out-of-space) on the PR. I really think we should err on the side of caution, and rule out running out of memory from failed logging in all reasonable cases.
Change 339501 had a related patch set uploaded (by Ppchelko):
[operations/puppet] Enable local logging for RESTBase
Change 339501 merged by Filippo Giunchedi:
[operations/puppet] Enable local logging for RESTBase
Something is wrong there. RB is not even creating the file, despite the fact that the directory permissions are correct:
```
mobrovac@restbase1009:~$ ls -alhF /var/log/restbase/
total 12K
drwxr-xr-x  2 restbase root 4.0K Mar  7 18:03 ./
drwxr-xr-x 19 root     root 4.0K Mar  7 06:25 ../
-rw-r--r--  1 restbase root 1.1K Mar  7 18:38 syslog.log
```
The file creation does not seem to be the problem here; I manually touched main.log as the restbase user and restarted the service, but it is still not writing to it:
```
root@restbase1009:~# ls -alhF /var/log/restbase/
total 12K
drwxr-xr-x  2 restbase root     4.0K Mar  7 18:53 ./
drwxr-xr-x 19 root     root     4.0K Mar  7 06:25 ../
-rw-r--r--  1 restbase restbase    0 Mar  7 18:53 main.log
-rw-r--r--  1 restbase root     2.3K Mar  7 18:53 syslog.log
```
Ok, it turns out the problem is that firejail doesn't have /var/log/restbase whitelisted. RB is actually logging stuff:
```
root@restbase1009:~# firejail --list
22774:restbase:/usr/bin/firejail --blacklist=/root --blacklist=/home --caps --seccomp /usr/bin/nodejs restbase/server.js -c /etc/restbase/config.yaml
root@restbase1009:~# firejail --join=22774
Switching to pid 22775, the first child process inside the sandbox
Child process initialized
root@restbase1009:~# ls -alhF /var/log/restbase
total 72K
drwxr-xr-x  2 restbase root      60 Mar  7 18:53 ./
drwxr-xr-x 19 root     root     420 Mar  7 18:53 ../
-rw-r--r--  1 restbase restbase 72K Mar  7 20:13 main.log
```
Ok, after losing half a day on this, I realised that using /var/log is not going to fly with firejail. It explicitly [mounts an empty /var/log](https://github.com/netblue30/firejail/blob/0.9.44.6/src/firejail/fs_var.c#L110-L148) and never exposes it.
Hence, we are left with these options:
1. use /srv/log
2. use an alternative location that is not on /srv
3. no local logging
The last one doesn't seem like an option, really. I could go for (1) as long as we have proper log rotation and removal in place. Option (2) may sound appealing, but randomly scattering the log files over the disk does not sound future-proof to me at all.
Change 342103 had a related patch set uploaded (by Mobrovac):
[operations/puppet] RESTBase: Send the logs locally to stdout/syslog
Change 342103 abandoned by Mobrovac:
RESTBase: Send the logs locally to stdout/syslog
Reason:
Rendered obsolete by I88a7bb4a0bb686cf78979dfca7a7da0a50c82c73