Page MenuHomePhabricator

Occasional rsync race while syncing /var/lib/grafana
Closed, ResolvedPublic

Description

Noticed this today with a flapping alert, sometimes rsync will attempt to transfer the sqlite journal from grafana hosts. However the file can disappear and thus the sync fails:

Oct 22 08:10:00 grafana2001 systemd[1]: Started Transfer data periodically between hosts.
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG5[ui]: stunnel 5.50 on x86_64-pc-linux-gnu platform
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG5[ui]: stunnel 5.50 on x86_64-pc-linux-gnu platform
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG5[ui]: Compiled with OpenSSL 1.1.1b  26 Feb 2019
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG5[ui]: Compiled with OpenSSL 1.1.1b  26 Feb 2019
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG5[ui]: Running  with OpenSSL 1.1.1d  10 Sep 2019
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG5[ui]: Running  with OpenSSL 1.1.1d  10 Sep 2019
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG5[ui]: Threading:PTHREAD Sockets:POLL,IPv6,SYSTEMD TLS:ENGINE,FIPS,OCSP,PSK,SNI Auth:LIBWRAP
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG5[ui]: Threading:PTHREAD Sockets:POLL,IPv6,SYSTEMD TLS:ENGINE,FIPS,OCSP,PSK,SNI Auth:LIBWRAP
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG5[ui]: Reading configuration from file /tmp/sync-ssl-wrapper.stunnel.conf.MiVUZUd5
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG5[ui]: Reading configuration from file /tmp/sync-ssl-wrapper.stunnel.conf.MiVUZUd5
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG5[ui]: UTF-8 byte order mark not detected
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG5[ui]: UTF-8 byte order mark not detected
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG5[ui]: FIPS mode disabled
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG4[ui]: Service [stunnel] uses "verifyChain" without subject checks
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG4[ui]: Use "checkHost" or "checkIP" to restrict trusted certificates
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG5[ui]: Configuration successful
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG5[0]: Service [stunnel] accepted connection from unnamed socket
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG5[ui]: FIPS mode disabled
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG4[ui]: Service [stunnel] uses "verifyChain" without subject checks
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG4[ui]: Use "checkHost" or "checkIP" to restrict trusted certificates
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG5[ui]: Configuration successful
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG5[0]: Service [stunnel] accepted connection from unnamed socket
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG5[0]: s_connect: connected 2620:0:861:101:10:64:0:119:1873
Oct 22 08:10:01 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:01 LOG5[0]: Service [stunnel] connected remote server from 2620:0:860:101:10:192:0:160:54052
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG5[0]: s_connect: connected 2620:0:861:101:10:64:0:119:1873
Oct 22 08:10:01 grafana2001 stunnel[29025]: LOG5[0]: Service [stunnel] connected remote server from 2620:0:860:101:10:192:0:160:54052
Oct 22 08:10:02 grafana2001 stunnel[29025]: LOG5[0]: Certificate accepted at depth=0: CN=grafana1002.eqiad.wmnet
Oct 22 08:10:02 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:02 LOG5[0]: Certificate accepted at depth=0: CN=grafana1002.eqiad.wmnet
Oct 22 08:10:03 grafana2001 sync-var-lib-grafana[29017]: file has vanished: "/grafana.db-journal" (in var-lib-grafana)
Oct 22 08:10:03 grafana2001 stunnel[29025]: LOG5[0]: Connection closed: 1093 byte(s) sent to TLS, 852266 byte(s) sent to socket
Oct 22 08:10:03 grafana2001 sync-var-lib-grafana[29017]: 2021.10.22 08:10:03 LOG5[0]: Connection closed: 1093 byte(s) sent to TLS, 852266 byte(s) sent to socket
Oct 22 08:10:03 grafana2001 sync-var-lib-grafana[29017]: rsync warning: some files vanished before they could be transferred (code 24) at main.c(1677) [generator=3.1.3]
Oct 22 08:10:03 grafana2001 systemd[1]: rsync-var-lib-grafana.service: Main process exited, code=exited, status=24/n/a
Oct 22 08:10:03 grafana2001 systemd[1]: rsync-var-lib-grafana.service: Failed with result 'exit-code'.

I think a possible fix would is to be able to tweak the file selection (i.e. to exclude the journal in this case). What do you think @Dzahn ?

Event Timeline

Change 733083 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] rsync::quickdatacopy: add option to exclude some files

https://gerrit.wikimedia.org/r/733083

@fgiunchedi Yea, I think we can add a parameter to just pass through to rsync's --exclude parameter and then use that to ignore the file. Patch uploaded :)

Dzahn triaged this task as Medium priority.Oct 22 2021, 10:09 PM

Change 733083 merged by Dzahn:

[operations/puppet@production] rsync::quickdatacopy: add option to exclude some files

https://gerrit.wikimedia.org/r/733083

Change 734408 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] grafana::production: exclude grafana.db-journal from rsync

https://gerrit.wikimedia.org/r/734408

Change 734408 merged by Filippo Giunchedi:

[operations/puppet@production] grafana::production: exclude grafana.db-journal from rsync

https://gerrit.wikimedia.org/r/734408

fgiunchedi claimed this task.

This is complete thanks to the help from @Dzahn !