Create an hourly Airflow DAG to transfer pageview partitions from WMF HDFS to WME S3.
Credentials via AWS IAM Roles Anywhere (no static AWS keys)
Streaming transfer via hdfs dfs -cat + boto3.upload_fileobj
S3 writes use AES256 server-side encryption
Egress via the url-downloader.eqiad.wikimedia.org:8080 proxy
Deterministic S3 keys, so retries safely overwrite earlier output
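A minimal sketch of how the streaming transfer could fit together. It assumes a hypothetical Hive-style key layout (`s3_key`), assumes the proxy is reached as `http://url-downloader.eqiad.wikimedia.org:8080`, and assumes IAM Roles Anywhere credentials reach boto3 through its default credential chain (for example a profile whose `credential_process` runs the Roles Anywhere signing helper). None of the function names below come from the actual image.

```python
import subprocess
from typing import Any, Sequence

# Assumed proxy URL; the ticket only gives host:port.
PROXY = "http://url-downloader.eqiad.wikimedia.org:8080"


def make_s3_client(proxy: str = PROXY) -> Any:
    # Imported lazily so the rest of this sketch runs without boto3 installed.
    import boto3
    from botocore.config import Config

    # No access keys are passed: boto3 resolves credentials from its default
    # chain (assumed here to be an IAM Roles Anywhere credential_process profile).
    return boto3.client("s3", config=Config(proxies={"https": proxy}))


def s3_key(prefix: str, year: int, month: int, day: int, hour: int, partition: str) -> str:
    # Hypothetical layout. The key depends only on the DAG interval and the
    # partition, so a retried run writes to the same key and overwrites it.
    return f"{prefix}/year={year}/month={month:02d}/day={day:02d}/hour={hour:02d}/{partition}"


def hdfs_cat_command(hdfs_path: str) -> list[str]:
    # `hdfs dfs -cat` writes the file contents to stdout.
    return ["hdfs", "dfs", "-cat", hdfs_path]


def stream_to_s3(cmd: Sequence[str], s3_client: Any, bucket: str, key: str) -> None:
    """Pipe a command's stdout straight into S3 without staging it on local disk."""
    proc = subprocess.Popen(list(cmd), stdout=subprocess.PIPE)
    assert proc.stdout is not None
    try:
        # upload_fileobj reads the pipe in chunks (multipart for large inputs);
        # "AES256" requests SSE-S3 server-side encryption.
        s3_client.upload_fileobj(
            proc.stdout,
            bucket,
            key,
            ExtraArgs={"ServerSideEncryption": "AES256"},
        )
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    # If the reader failed mid-stream, S3 may now hold a truncated object;
    # failing the task lets the retry overwrite the same deterministic key.
    if returncode != 0:
        raise RuntimeError(f"{cmd[0]} exited with status {returncode}")
```

In the DAG task this would be called roughly as `stream_to_s3(hdfs_cat_command(path), make_s3_client(), bucket, s3_key(...))`. Checking the reader's exit status after the upload matters because a pipe that closes early looks like a normal end of file to `upload_fileobj`.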
Acceptance Criteria
DAG runs successfully for each hourly interval
All 4 partitions transferred and validated (Content-Length > 0)
Retries produce correct output
No static AWS credentials used
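The validation criterion could be checked with one `head_object` call per key, which returns object metadata (including `ContentLength`) without downloading the body. This is a sketch only: the function name and the way keys are passed in are assumptions, and the count of 4 is taken from the criteria above.

```python
from typing import Any, Sequence

EXPECTED_PARTITIONS = 4  # from the acceptance criteria


def validate_partitions(s3_client: Any, bucket: str, keys: Sequence[str]) -> None:
    """Raise if the hour's upload is incomplete or any object is empty."""
    if len(keys) != EXPECTED_PARTITIONS:
        raise ValueError(f"expected {EXPECTED_PARTITIONS} partitions, got {len(keys)}")
    empty = []
    for key in keys:
        # HEAD returns metadata only. With a real boto3 client, a missing key
        # raises botocore's ClientError (404), which fails the task by itself.
        size = s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]
        if size <= 0:
            empty.append(key)
    if empty:
        raise ValueError(f"zero-length objects in s3://{bucket}: {empty}")
```

Because the keys are deterministic, running this check after a retry inspects the same objects the retry overwrote, so a passing run reflects the latest attempt.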
Image
docker-registry.wikimedia.org/repos/wme/pageviews-hdfs-transfer:v0.1.1