
librenms.syslog table size
Closed, ResolvedPublic

Description

The librenms database has a syslog table which is now well over 200GB:

-rw-rw---- 1 mysql mysql  254G Oct 20 07:24 syslog.ibd

Does it really make sense to store syslog data in a relational database? If this is a hard requirement for any reason, we really need to clean this table up periodically; this size is way too much.

Event Timeline


@andrea.denisse This is something I mentioned to you some time ago and promised to raise it with your team.

Haha yeah indeed!

In theory we should only keep 90 days of logs: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/profile/manifests/librenms.pp#50
https://docs.librenms.org/Extensions/Syslog/#syslog-clean-up
So either the cleanup is not running/working, or 90 days of logs really is 250+ GB, which would be surprising. In that case we can reduce it in multiple ways.

This is the oldest row:

root@db1164.eqiad.wmnet[librenms]> select timestamp from syslog order by timestamp asc limit 1;
+---------------------+
| timestamp           |
+---------------------+
| 2023-07-22 08:06:10 |
+---------------------+
1 row in set (0.007 sec)
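
To tell the two possibilities apart (cleanup not running versus 90 days genuinely being that large), the age of the oldest row can be compared against the configured retention. A minimal sketch, assuming only the `syslog` table and `timestamp` column shown above:

```sql
-- How many days of logs does the table currently hold?
-- A value close to the configured retention (90 days here) suggests the
-- cleanup job is running; a much larger value suggests it is not.
SELECT MIN(timestamp)                    AS oldest_row,
       DATEDIFF(NOW(), MIN(timestamp))   AS days_retained
FROM syslog;
```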

Would it be possible to have it on filesystem/Kibana only? I don't mind backing it up for persistence, but storing it in the DB carries extra cost that filesystem/backups would not.

We already have it in Kibana, but the LibreNMS UI is quite convenient and we send more verbose logs for alerting there. The solution is probably to reduce the retention period. Would 15 days work DB-wise?
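
A rough back-of-the-envelope estimate for that question, using only figures from this task: the oldest row (2023-07-22) and the file listing (Oct 20) are 90 days apart, which suggests the cleanup was in fact working and that 90 days of logs occupied about 254G. The sketch below assumes the oldest-row query was run around the date of the file listing, that log volume per day is constant, and that the `.ibd` size (which also includes indexes and free space) scales linearly with retention:

```sql
-- Estimate on-disk size at 15 days of retention from the observed 90 days.
SELECT DATEDIFF('2023-10-20', '2023-07-22')                       AS days_retained,
       ROUND(254 / DATEDIFF('2023-10-20', '2023-07-22'), 1)      AS gb_per_day,
       ROUND(254 / DATEDIFF('2023-10-20', '2023-07-22') * 15, 1) AS gb_at_15_days;
```

Under those assumptions this works out to roughly 2.8 GB per day, or about 42 GB at 15 days of retention.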

FYI the cleanup logs are in /var/log/librenms/daily.log

Change 968999 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/puppet@production] Reduce LibreNMS syslog retention to 15 days

https://gerrit.wikimedia.org/r/968999

Change 968999 merged by Ayounsi:

[operations/puppet@production] Reduce LibreNMS syslog retention to 15 days

https://gerrit.wikimedia.org/r/968999

Marostegui closed this task as Resolved. Oct 26 2023, 9:19 AM
Marostegui claimed this task.
Marostegui added a project: DBA.

Table truncated:

root@db1164:/srv/sqldata/librenms# ls -lh syslog.ibd
-rw-rw---- 1 mysql mysql 9.0M Oct 26 09:18 syslog.ibd

Let's see how it grows after the merged patch

I think we are going to need to tweak this a bit more:

-rw-rw---- 1 mysql mysql   61G Nov 14 12:44 syslog.ibd

61GB is quite large for what this is. I guess there's no other way to send this info to a more appropriate storage system?

FWIW, the rows are almost all like this:

+-----------+----------+----------+-------+------------+---------------------+---------+-----------------------------------------------------------------------+-----------+
| device_id | facility | priority | level | tag        | timestamp           | program | msg                                                                   | seq       |
+-----------+----------+----------+-------+------------+---------------------+---------+-----------------------------------------------------------------------+-----------+
|       278 | user     | debug    | debug | ifinfo[876 | 2023-11-14 23:59:59 | IFINFO  | PVIDB: Attribute 'global.ipv6_stats_rate_supported' not present in Db | 222948242 |
|       278 | user     | debug    | debug | ifinfo[876 | 2023-11-14 23:59:59 | IFINFO  | PVIDB: Attribute 'global.ipv6_stats_not_supported' not present in Db  | 222948241 |
|       278 | user     | debug    | debug | ifinfo[876 | 2023-11-14 23:59:59 | IFINFO  | PVIDB: Attribute 'global.bridge_iff_mtu_supported' not present in Db  | 222948240 |
|       278 | user     | debug    | debug | ifinfo[876 | 2023-11-14 23:59:59 | IFINFO  | PVIDB: Attribute 'global.ipv6_stats_not_supported' not present in Db  | 222948238 |
|       278 | user     | debug    | debug | ifinfo[876 | 2023-11-14 23:59:59 | IFINFO  | PVIDB: Attribute 'global.ipv6_stats_not_supported' not present in Db  | 222948235 |
+-----------+----------+----------+-------+------------+---------------------+---------+-----------------------------------------------------------------------+-----------+
5 rows in set (0.001 sec)

(from different devices). Maybe fix the missing attributes, or stop logging debug messages to the DB?
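
To quantify which sources dominate the table rather than eyeballing a sample, the rows can be grouped by program and priority. A minimal sketch, assuming the column names shown in the output above; the one-day window and the limit of 10 are arbitrary choices, and on a table this size the query may be slow, so it is probably best run on a replica:

```sql
-- Top log producers over the last day, by program and priority.
SELECT program,
       priority,
       COUNT(*) AS row_count
FROM syslog
WHERE timestamp >= NOW() - INTERVAL 1 DAY
GROUP BY program, priority
ORDER BY row_count DESC
LIMIT 10;
```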

Change 974472 had a related patch set uploaded (by Ayounsi; author: Ayounsi):

[operations/homer/public@master] Don't send debug logs to LibreNMS

https://gerrit.wikimedia.org/r/974472

Thanks @Ladsgroup, yeah, some devices got way too verbose at sending debug logs, and we don't use debug-level logs for alerting, so the above patch should make it significantly quieter.

https://supportportal.juniper.net/s/article/Syslog-message-ifinfo-PVIDB-Attribute-xxxxx-not-present-in-Db?language=en_US

Change 974472 merged by Ayounsi:

[operations/homer/public@master] Don't send debug logs to LibreNMS

https://gerrit.wikimedia.org/r/974472

Thanks for the patch! I hope it'll make a dent, I'll monitor it.

While I was monitoring it, I tried this:

root@db1217.eqiad.wmnet[librenms]> select * from syslog order by timestamp desc limit 5;
+-----------+----------+----------+-------+------------+---------------------+---------+-------------------------------------------------------------------+-----------+
| device_id | facility | priority | level | tag        | timestamp           | program | msg                                                               | seq       |
+-----------+----------+----------+-------+------------+---------------------+---------+-------------------------------------------------------------------+-----------+
|       184 | daemon   | debug    | debug | mib2d[2317 | 2023-11-15 23:59:59 | MIB2D   | No event subscription for ipv6-nd6-change                         | 235092387 |
|       184 | daemon   | debug    | debug | mib2d[2317 | 2023-11-15 23:59:59 | MIB2D   | getEvents: Event ipv6-nd6-change not subscribed                   | 235092386 |
|       184 | daemon   | debug    | debug | mib2d[2317 | 2023-11-15 23:59:59 | MIB2D   | No event subscription for nd6-change                              | 235092385 |
|       184 | daemon   | debug    | debug | mib2d[2317 | 2023-11-15 23:59:59 | MIB2D   | getEvents: Event nd6-change not subscribed                        | 235092384 |
|       278 | local7   | info     | info  | mgd[11877] | 2023-11-15 23:59:59 | MGD     | UI_CHILD_STATUS: Cleanup child '/sbin/ifinfo', PID 3160, status 0 | 235092383 |
+-----------+----------+----------+-------+------------+---------------------+---------+-------------------------------------------------------------------+-----------+
5 rows in set (0.001 sec)

The thing is that the timestamp is in the future. Maybe it's eqsin's TZ? Not a big deal, just something weird to flag. Maybe it just logs everything with an end-of-day timestamp *shrugs*
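
One way to narrow down the future-timestamp oddity is to compare the newest stored timestamp with the server's own clock and time zone, and to count how many rows are ahead of it. A sketch only, assuming the `timestamp` column is stored in the server's session time zone; it does not by itself tell whether the sending device or the ingestion path is responsible:

```sql
-- Compare the newest row against the server clock.
SELECT NOW()               AS server_local_time,
       UTC_TIMESTAMP()     AS server_utc_time,
       @@session.time_zone AS session_time_zone,
       MAX(timestamp)      AS newest_row
FROM syslog;

-- Count rows whose timestamp is ahead of the server clock.
SELECT COUNT(*) AS rows_in_future
FROM syslog
WHERE timestamp > NOW();
```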

For future reference, the current number of rows is:

root@db1217.eqiad.wmnet[librenms]> select count(*) from syslog;
+-----------+
| count(*)  |
+-----------+
| 187460821 |
+-----------+
1 row in set (44.371 sec)
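
Since an exact `count(*)` takes a while on a table this size, a cheaper alternative for ongoing monitoring is the table metadata in `information_schema`. A sketch, assuming the schema is named `librenms` as in the prompt above; note that for InnoDB `table_rows` is only an estimate, and the lengths reflect allocated pages rather than the exact `.ibd` file size:

```sql
-- Approximate row count and size without scanning the table.
SELECT table_rows                                AS approx_rows,
       ROUND(data_length  / POW(1024, 3), 1)     AS data_gib,
       ROUND(index_length / POW(1024, 3), 1)     AS index_gib
FROM information_schema.tables
WHERE table_schema = 'librenms'
  AND table_name   = 'syslog';
```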

Per my chat with Arzhel in irc, table truncated!

root@db1119.eqiad.wmnet[librenms]> truncate table syslog;
Query OK, 0 rows affected (0.008 sec)

root@db1119.eqiad.wmnet[librenms]> select count(*) from syslog;
+----------+
| count(*) |
+----------+
|     1463 |
+----------+
1 row in set (0.003 sec)

let's see how it goes!