We've set up a new kafkatee collector host at codfw, frban2001, which will replace the aged alnitak once we're confident everything is working properly on Debian Buster etc. In the meantime it's useful for comparison, to help rule out losses in the kafka->kafkatee->logfile segment of the pipeline. I ran all of yesterday's new-pipeline landingpages JSON logs through a simple Perl JSON decoder and observed that alnitak and frban2001 collected exactly the same data for 12/11. The script I used is at mintaka:/tmp/json-log-filter, and frban2001's log store is mounted at mintaka:/mnt/banner_logs_new. The beacon-impressions logs are harder to compare since they're sampled at 1:10, so the two hosts are expected to collect different messages. To get around that, we set frban1001 to collect beacon-impressions at 1:1 for a day, and at the end we'll compare what was collected to what's in Hive.
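For anyone who wants to repeat the landingpages comparison without the Perl script, here is a rough Python sketch of the same idea. It is not the script at mintaka:/tmp/json-log-filter. It assumes each log line is a single JSON object (an assumption about the log format), and the function names `canonical_records` and `compare_logs` are made up for illustration.

```python
import json
from collections import Counter


def canonical_records(lines):
    """Decode each non-empty line as JSON and count canonical forms.

    Assumes one JSON object per line (an assumption, not confirmed here).
    Returns (Counter of canonical records, number of undecodable lines).
    """
    counts = Counter()
    undecodable = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            undecodable += 1
            continue
        # Sorting keys makes the comparison independent of key order.
        counts[json.dumps(record, sort_keys=True)] += 1
    return counts, undecodable


def compare_logs(lines_a, lines_b):
    """Compare two logs as multisets of decoded records."""
    a, bad_a = canonical_records(lines_a)
    b, bad_b = canonical_records(lines_b)
    return {
        "only_in_a": sum((a - b).values()),  # records host A has that B lacks
        "only_in_b": sum((b - a).values()),  # records host B has that A lacks
        "undecodable_a": bad_a,
        "undecodable_b": bad_b,
    }
```

Pass it two open file handles (or any iterables of lines), one per host's log for the same day. All four counts being zero would correspond to the "exactly the same data" result above. Because records are compared as multisets, line order within the files does not matter.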
|Open||None||T183978 [Epic] Fundraising kafkatee changes|
|Open||None||T242022 Verify no losses in kafka->kafkatee->logfile data pipeline segment|
Please note that as of this morning I have switched us to frban2001 and moved alnitak 'off to the side' pending decommissioning. The 1:1 logs we collected as described above are now accessible via mintaka:/srv/archive/banner_logs/2019-testing.