As seen in T188085, some pybal messages are being omitted on the systemd journal on certain conditions. For example:
$ sudo journalctl -u pybal --since=yesterday | grep bgp Feb 22 09:04:17 lvs1006 pybal[15389]: [bgp] INFO: State is now: IDLE Feb 22 09:06:28 lvs1006 pybal[24481]: [bgp] INFO: State is now: IDLE Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: (Re)connect to 208.80.154.197 Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: State is now: CONNECT Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: State is now: CONNECT Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: State is now: OPENSENT Feb 22 09:13:12 lvs1006 pybal[26025]: [bgp] INFO: State is now: IDLE
VS
vgutierrez@lvs1006:~$ zgrep bgp /var/log/pybal.log.15.gz Feb 22 09:06:28 lvs1006 pybal[24481]: [bgp] INFO: State is now: IDLE Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: (Re)connect to 208.80.154.197 Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: State is now: CONNECT Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: State is now: CONNECT Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: State is now: OPENSENT Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: OPEN: version: 4 ASN: 14907 hold time: 30 id: 3494943429 Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: Hold time: 30 Keepalive time: 10 Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: State is now: OPENCONFIRM Feb 22 09:06:30 lvs1006 pybal[26025]: [bgp] INFO: State is now: ESTABLISHED
As @MoritzMuehlenhoff pointed out, this behaviour can be related to systemd journal rate limiting. PyBal doesn't configure rate limiting at all and lvs1006 is running with the default values:
[Journal] #Storage=auto #Compress=yes #Seal=yes #SplitMode=uid #SyncIntervalSec=5m #RateLimitInterval=30s #RateLimitBurst=1000
And from 09:06:28 to 09:06:32 (6 seconds) 854 messages were logged.
We need to set a proper rate limiting policy for pybal to avoid this kind of issue.