Closing this task since it didn't get picked up after three years.
Thu, Jul 2
The faulkner database has been removed everywhere except frdb1003 (the fr analytics db) and the archived backup.
Wed, Jul 1
We've improved kafkatee monitoring significantly since this task was created. We now catch cases where kafkatee fails to connect to a broker, stops logging its stats, or fails to write content to its log files. We're also collecting detailed metrics in fundraising prometheus/grafana. Those could be used to alert on error counters or similar once we actually see the hypothesized type of data loss and observe how it shows up in the gathered metrics.
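If we do end up alerting on error counters, the usual approach is to look at how much a counter increased over a time window rather than at its raw value, because counters reset when the process restarts. Below is a minimal Python sketch that assumes samples of a hypothetical kafkatee error counter arrive as (timestamp, value) pairs. In practice this would more likely be a PromQL `increase()` expression in a Prometheus alerting rule than custom code.

```python
from typing import Sequence, Tuple

# One sample of a monotonic counter: (unix timestamp in seconds, counter value).
# The counter itself (e.g. a kafkatee error count) is a hypothetical example.
Sample = Tuple[float, float]


def counter_increase(samples: Sequence[Sample]) -> float:
    """Total increase of a counter across consecutive samples.

    A drop in value is treated as a counter reset (e.g. a process restart),
    so the post-reset value is counted as new increase. This is similar in
    spirit to PromQL's increase(), but without its extrapolation.
    """
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def should_alert(samples: Sequence[Sample], window_seconds: float, threshold: float) -> bool:
    """Alert if the counter grew by more than `threshold` in the trailing window."""
    if not samples:
        return False
    end = samples[-1][0]
    recent = [s for s in samples if s[0] >= end - window_seconds]
    return counter_increase(recent) > threshold
```

The threshold and window are placeholders. Choosing them sensibly depends on first seeing what real data loss looks like in the gathered metrics.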
Mon, Jun 29
Refreshed the CA cert from puppetmaster1001:/srv/private/modules/secret/secrets/certificates/kafka_fundraising_client, which fixed the problem.
Fri, Jun 26
Thu, Jun 25
Wed, Jun 24
@Ejegg how significant is it that civi1001:/var/spool/audit and civi2001:/var/spool/audit are not in sync? Right now, synchronizing them before/after audit job runs is entirely manual, and there's no mechanism or procedure for it.
IIRC, audit processing (or maybe just orphan-slaying?) also uses logs pushed over from the central logger by archive_sync on frlog*. That push hadn't been set up for civi2001 yet; this is now fixed.
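Until there's a real sync mechanism, it would help to at least detect when the two audit spools have drifted. The following minimal Python sketch compares two directory trees by per-file SHA-256 digest. It assumes both copies are readable from one place, for example after copying civi2001's spool into a scratch directory. Across hosts, you could instead run `digest_tree` on each side and compare the resulting maps. rsync's `--dry-run --checksum` options would give a similar report without custom code.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def digest_tree(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256 hex digest."""
    base = Path(root)
    digests: Dict[str, str] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1 MiB chunks so large audit files don't load fully into memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(base).as_posix()] = h.hexdigest()
    return digests


def compare_trees(a: str, b: str) -> Tuple[List[str], List[str], List[str]]:
    """Return (only in a, only in b, present in both but with different content)."""
    da, db = digest_tree(a), digest_tree(b)
    only_a = sorted(da.keys() - db.keys())
    only_b = sorted(db.keys() - da.keys())
    differ = sorted(k for k in da.keys() & db.keys() if da[k] != db[k])
    return only_a, only_b, differ
```

Running this before and after an audit job would show whether the job's inputs or outputs exist on only one host.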
Tue, Jun 23
@Ejegg OK, we're set to use the payments-staging frdeploy project for this. We just need to point it at the right branch and make the config changes for 1.35.