We import full datasets into Hadoop regularly, and it doesn't make sense to keep many snapshots:
- /wmf/data/raw/mediawiki/[tables|project_namespace_map]/*/snapshot=YYYY-MM
- /wmf/data/wmf/mediawiki/[user_history|page_history|history|metrics]/snapshot=YYYY-MM
We should maintain 2 snapshots: one with the current month's imported data and one with the previous month's. Older ones should be removed, but only after the current month's import has succeeded.
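
A minimal sketch of the retention rule described above, in Python. The function name `snapshots_to_delete` and its parameters are hypothetical, and it assumes the newest snapshot label present is the current month's import. It only decides which YYYY-MM labels are removable; it does not touch Hadoop.

```python
from typing import List


def snapshots_to_delete(existing: List[str],
                        current_import_succeeded: bool,
                        keep: int = 2) -> List[str]:
    """Return the snapshot labels (YYYY-MM) that can be removed.

    Hypothetical helper: assumes the newest label in `existing`
    is the current month's import.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Never remove anything until the current month's import has succeeded.
    if not current_import_succeeded:
        return []
    # YYYY-MM labels sort chronologically as plain strings.
    ordered = sorted(set(existing))
    # Everything except the newest `keep` snapshots is removable.
    return ordered[:-keep]
```

For example, with snapshots 2016-12, 2017-01, 2017-02 and 2017-03 present and a successful 2017-03 import, this would return 2016-12 and 2017-01, leaving the current and previous month in place. The same list could then drive the removal of the matching snapshot=YYYY-MM directories under both base paths.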