As per T192839, sampled impression and landing page data won't be ingested directly from the Kafka topic into the database. Instead, we'll write files from the stream and read those, as in the legacy system.
However, the format of the new files differs substantially from the old one, and the legacy Python scripts that process data in the old format are crufty. So, instead of writing new code to re-create the legacy format and feed it to those scripts, we'll rework the legacy scripts to read the new format.
This should make the system more maintainable and stable, so it's within scope for this switchover.
We may wish to make some minor changes to the database schema, but we should ensure that the queries currently in use continue to work.
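
One way to check that existing queries survive a schema change is to run them against the revised schema and collect any that fail. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database as a stand-in for the real database, and the table, view, and column names (bannerimpressions, bannerimpressions_v2, hits, impression_count) and the queries themselves are hypothetical, not taken from the actual system. It also assumes a compatibility view is an acceptable way to keep an old name available after a rename.

```python
import sqlite3


def check_legacy_queries(conn, queries):
    """Run each named query; return {name: error message} for those that fail."""
    failures = {}
    for name, sql in queries.items():
        try:
            conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            failures[name] = str(exc)
    return failures


def build_demo_db():
    """Build a hypothetical revised schema plus a compatibility view."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical new table: the old `hits` column is now `impression_count`.
    conn.execute(
        "CREATE TABLE bannerimpressions_v2 "
        "(banner TEXT, ts TEXT, impression_count INTEGER)"
    )
    # The view keeps the old table and column names available to old queries.
    conn.execute(
        "CREATE VIEW bannerimpressions AS "
        "SELECT banner, ts, impression_count AS hits FROM bannerimpressions_v2"
    )
    conn.execute(
        "INSERT INTO bannerimpressions_v2 VALUES ('B1', '2018-01-01', 10)"
    )
    return conn


# Hypothetical queries: the first relies only on names the view preserves,
# the second refers to a column that no longer exists anywhere.
LEGACY_QUERIES = {
    "totals_by_banner":
        "SELECT banner, SUM(hits) FROM bannerimpressions GROUP BY banner",
    "uses_dropped_column":
        "SELECT dropped_column FROM bannerimpressions",
}

if __name__ == "__main__":
    for name, error in check_legacy_queries(build_demo_db(), LEGACY_QUERIES).items():
        print(f"BROKEN {name}: {error}")
```

A check like this only confirms that the queries still execute. Whether they still return the same results would need a separate comparison against known data.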