```
$ kafkacat -C -b kafka1012.eqiad.wmnet:9092 -o beginning -t eventlogging_ChangesListHighlights > clhighlights
$ (for ts in $(jq '.timestamp' < clhighlights); do date +%Y%m%d -d "@$ts"; done) | sort | uniq -c
      1 20170705
     21 20170706
     10 20170707
      8 20170708
      7 20170709
     48 20170710
     33 20170711
     24 20170712
```
```
mysql:research@s3-analytics-slave [log]> select count(*), substring(timestamp, 1, 8) as day from ChangesListHighlights_16484288 where timestamp >'20170705000000' group by day;
+----------+----------+
| count(*) | day      |
+----------+----------+
|       15 | 20170705 |
|       21 | 20170706 |
|       10 | 20170707 |
|        8 | 20170708 |
|        7 | 20170709 |
|       48 | 20170710 |
|       24 | 20170711 |
+----------+----------+
7 rows in set (0.02 sec)
```
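The per-day gap between the two outputs can be computed mechanically. The sketch below hard-codes the counts copied from the `uniq -c` and MySQL results; `missing_per_day` is a hypothetical helper, not part of any EventLogging tooling. A day with no MySQL row is treated as 0, because `GROUP BY` emits no row for a day with no events. Only days where Kafka has *more* events are reported, since the first day is expected to run the other way.

```python
# Per-day event counts, copied from the kafkacat/uniq -c and MySQL outputs.
kafka_counts = {
    "20170705": 1, "20170706": 21, "20170707": 10, "20170708": 8,
    "20170709": 7, "20170710": 48, "20170711": 33, "20170712": 24,
}
mysql_counts = {
    "20170705": 15, "20170706": 21, "20170707": 10, "20170708": 8,
    "20170709": 7, "20170710": 48, "20170711": 24,
}


def missing_per_day(kafka, mysql):
    """Return {day: kafka_count - mysql_count} for days where Kafka has more events.

    Days absent from the MySQL result count as 0 events.
    """
    return {
        day: n - mysql.get(day, 0)
        for day, n in sorted(kafka.items())
        if n > mysql.get(day, 0)
    }


if __name__ == "__main__":
    print(missing_per_day(kafka_counts, mysql_counts))
```

With the numbers above this reports 9 missing events on 2017-07-11 (33 vs. 24) and 24 on 2017-07-12 (no MySQL row at all).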
The discrepancy on the first day is expected: it is just an effect of Kafka's 7-day retention cutoff. The other days line up exactly, except for the 11th (33 events in Kafka vs. 24 in MySQL, i.e. 9 missing) and the 12th (all 24 events missing). Looking more closely, Kafka and MySQL appear to contain the same events up until 2017-07-11 19:53:17 UTC; after that, MySQL started dropping almost all of them. Only three events made it through after that point, at 22:04:27, 22:04:29 and 22:16:02 (all on the 11th).
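Pinning down the last point at which both stores agree amounts to walking the Kafka events in timestamp order and checking each against MySQL. The sketch below assumes each event carries a unique identifier present in both the Kafka JSON and the MySQL table (the EventLogging capsule's `uuid` field is assumed here), and that Kafka timestamps are Unix seconds, as the `date -d "@$ts"` call above suggests. `find_divergence` and the sample data are hypothetical and illustrative only, not the real ChangesListHighlights events.

```python
from datetime import datetime, timezone


def utc(ts):
    """Format a Unix timestamp (seconds) as a UTC date-time string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def find_divergence(kafka_events, mysql_uuids):
    """Find where MySQL stops matching Kafka.

    kafka_events: iterable of (unix_timestamp, uuid) pairs read from Kafka.
    mysql_uuids: set of uuids present in the MySQL table (assumed column).
    Returns (last_in_sync, survivors): the timestamp of the last event before
    the first one missing from MySQL (None if the very first is missing), and
    the timestamps of later events that nevertheless reached MySQL.
    """
    events = sorted(kafka_events)
    # Index of the first Kafka event that MySQL does not have.
    first_missing = next(
        (i for i, (_, uuid) in enumerate(events) if uuid not in mysql_uuids),
        len(events),
    )
    last_in_sync = events[first_missing - 1][0] if first_missing > 0 else None
    # Events after the divergence point that still made it into MySQL.
    survivors = [ts for ts, uuid in events[first_missing:] if uuid in mysql_uuids]
    return last_in_sync, survivors


if __name__ == "__main__":
    # Illustrative data only.
    kafka = [(1000, "a"), (2000, "b"), (3000, "c"), (4000, "d")]
    mysql = {"a", "b", "d"}
    last_ok, survivors = find_divergence(kafka, mysql)
    print("in sync until", utc(last_ok))
    print("made it through afterwards:", [utc(ts) for ts in survivors])
```

Sorting by timestamp first means a single missing event is enough to mark the divergence point, while the `survivors` list captures stragglers like the three late events on the 11th.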