- make a list of wikis we're processing
- for each wiki, sqoop data for each important table
- do a test run and see whether it needs to be throttled at all (it took about 4 hours to sqoop enwiki)
- code to be run ad-hoc, not on a cron quite yet
Pointing includes testing; this will likely take 2 weeks.
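
A rough sketch of how the ad-hoc run could be structured, looping over wikis and tables and building one `sqoop import` per pair. The wiki list, table list, host, username, password file, and target directory are all placeholders made up for illustration, and using `--num-mappers` as the throttle is an assumption, not something decided in this task. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Hypothetical values for illustration only; none of these come from the task.
WIKIS = ["enwiki", "dewiki"]
TABLES = ["revision", "page"]
JDBC_HOST = "db.example.org"
TARGET_ROOT = "/tmp/sqoop_test"


def build_sqoop_command(wiki, table, num_mappers=1):
    """Return the argv list for importing one table of one wiki."""
    return [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://{JDBC_HOST}/{wiki}",
        "--username", "example_user",                       # placeholder
        "--password-file", "/user/example/mysql-password.txt",  # placeholder
        "--table", table,
        "--target-dir", f"{TARGET_ROOT}/{wiki}/{table}",
        # Fewer mappers means fewer parallel connections to the database,
        # which is one possible (assumed) way to throttle the import.
        "--num-mappers", str(num_mappers),
    ]


def sqoop_all(wikis, tables, num_mappers=1, dry_run=True):
    """Build one import per (wiki, table) and run them one after another."""
    commands = [
        build_sqoop_command(wiki, table, num_mappers)
        for wiki in wikis
        for table in tables
    ]
    for cmd in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # Stop at the first failing import.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `sqoop_all(WIKIS, TABLES)` by hand prints the commands for review, and passing `dry_run=False` executes them, which fits running this ad-hoc rather than from cron.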