
Fix sqoop script to use timestamp limits in `--boundary-query` queries
Closed, Resolved, Public

Description

When time boundaries are provided to the sqoop script, we should also apply them to the `--boundary-query` parameter:
```
--boundary-query "SELECT MIN(cuc_id), MAX(cuc_id) FROM cu_changes WHERE cuc_timestamp >= '20210401000000' and cuc_timestamp < '20210501000000'"
```
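Otherwise, the boundary query computes MIN(cuc_id) and MAX(cuc_id) over the whole table instead of over the requested time window. Sqoop divides that [MIN, MAX] range evenly among the mappers, and because `cuc_id` increases roughly with `cuc_timestamp`, the rows actually being imported fall into only a narrow slice of it. A few mappers end up with all the rows while the rest get none, so the data ends up skewed.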
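A minimal Python sketch of the idea, for illustration only. It is not the sqoop script's code. `mw_timestamp`, `boundary_query`, `split_ranges` and `rows_per_mapper` are hypothetical helpers. `split_ranges` only roughly approximates how sqoop splits the [MIN, MAX] range of the split column across mappers, and the id data in the demo is made up.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def mw_timestamp(dt: datetime) -> str:
    # MediaWiki-style 14-digit timestamp (YYYYMMDDHHMMSS), the format of cuc_timestamp.
    return dt.strftime("%Y%m%d%H%M%S")


def boundary_query(table: str, id_col: str, ts_col: str,
                   start: Optional[datetime] = None,
                   end: Optional[datetime] = None) -> str:
    # Apply the same [start, end) time limits to the boundary query as to the import.
    # Table and column names are assumed to be trusted (not user input).
    conditions = []
    if start is not None:
        conditions.append(f"{ts_col} >= '{mw_timestamp(start)}'")
    if end is not None:
        conditions.append(f"{ts_col} < '{mw_timestamp(end)}'")
    where = (" WHERE " + " and ".join(conditions)) if conditions else ""
    return f"SELECT MIN({id_col}), MAX({id_col}) FROM {table}{where}"


def split_ranges(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    # Rough approximation of sqoop's integer splitting: cut [lo, hi] into
    # num_mappers contiguous half-open ranges [a, b) of about equal width.
    width = (hi - lo + 1) / num_mappers
    bounds = [lo + round(i * width) for i in range(num_mappers)] + [hi + 1]
    return list(zip(bounds[:-1], bounds[1:]))


def rows_per_mapper(ids: Iterable[int], ranges: List[Tuple[int, int]]) -> List[int]:
    # Count how many of the imported ids fall into each mapper's range.
    ids = list(ids)
    return [sum(1 for i in ids if a <= i < b) for a, b in ranges]


if __name__ == "__main__":
    print(boundary_query("cu_changes", "cuc_id", "cuc_timestamp",
                         datetime(2021, 4, 1), datetime(2021, 5, 1)))
    # Toy data: the table holds ids 1..1000, the month's rows are ids 901..1000.
    window_ids = range(901, 1001)
    print(rows_per_mapper(window_ids, split_ranges(1, 1000, 4)))    # -> [0, 0, 0, 100]
    print(rows_per_mapper(window_ids, split_ranges(901, 1000, 4)))  # -> [25, 25, 25, 25]
```

With whole-table boundaries, one mapper receives every row in the window. With the time-filtered boundary query, the same rows are spread evenly across all four mappers.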