Let's work out the details (now updated with answers):
- How many dump runs do we keep?
  - 3 for now
- Do we need to stop keeping so many of some other type of dump?
  - Not at the moment
- What credentials do we need to retrieve files from AWS?
  - Fixed text strings, which have been provided to us and will be stored in the private puppet repo
- We won't be rsyncing, because we only want one of the many daily runs that OKAPI will have available; does this mean a custom script?
  - Yes, and it's done-ish (see the sketch just after this list)
- Do we want to just proxy for the given files instead? This could incur AWS costs, and would mean being clever about only serving requests for certain runs.
  - No, we don't.
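Since we only want one of the daily runs rather than a full mirror, the custom script boils down to "list the objects under one run's prefix and download them". Below is a minimal sketch of that approach, assuming the files are exposed via S3, that the fixed credential strings are an access key / secret key pair read from a file managed by the private puppet repo, and that there is one key prefix per run; the bucket name, credential file path, key layout, and destination directory are all placeholders, not the real values.

```python
#!/usr/bin/env python3
"""Sketch of a downloader that grabs one specific Enterprise dump run.

Bucket name, key layout, paths, and credential handling are assumptions
for illustration; the real script may differ.
"""
import os
import boto3

CREDS_FILE = "/etc/dumps/enterprise-creds"    # hypothetical path; fixed strings from the private puppet repo
BUCKET = "example-enterprise-dumps"           # assumed bucket name
DEST_BASE = "/srv/dumps/enterprise_html/run"  # assumed destination directory


def read_creds(path):
    """Read access_key and secret_key from a simple key=value file."""
    creds = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition("=")
            creds[key] = value
    return creds


def download_run(run_date):
    """Download every file for one run (e.g. '20211025') into DEST_BASE/run_date."""
    creds = read_creds(CREDS_FILE)
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["access_key"],
        aws_secret_access_key=creds["secret_key"],
    )
    dest_dir = os.path.join(DEST_BASE, run_date)
    os.makedirs(dest_dir, exist_ok=True)  # TODO: let puppet create this instead
    prefix = f"{run_date}/"  # assumed key layout: one prefix per daily run
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            filename = os.path.basename(obj["Key"])
            if filename:  # skip "directory" placeholder keys
                s3.download_file(BUCKET, obj["Key"], os.path.join(dest_dir, filename))


if __name__ == "__main__":
    download_run("20211025")
```

Running something like this from a systemd timer (see the TODOs below) would just mean deriving the run date from the current date instead of hard-coding it; the tmp-file and old-run cleanup items are deliberately left out of this sketch.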
What about the future? There will be other datasets; what will we do about space for those? That will be discussed in a future task if needed.
TODOs remaining:
- Add cleanup of tmp files to downloader script
- Puppetize the enterprise_html/run and tmp dirs so the script does not need to create them
- Add downloader script to puppet
- Add Enterprise credentials to puppet
- Run downloader script via systemd timer instead of manually -- IN PROGRESS
- Rsync the Enterprise dumps from the web server to the NFS server (i.e. from one labstore box to the other), via a systemd timer
- Add cleanup of older downloads so we keep only a specified number of runs (a sketch of this pruning logic is at the end of this comment)
I'll add other things if they come up; hopefully they won't.
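For the pruning TODO (keep only a specified number of runs, 3 for now), the logic is just "sort the per-run directories, keep the newest N, remove the rest". A minimal sketch, assuming one date-named directory per run under the same enterprise_html/run path used above; the paths are illustrative and the real script may organize things differently.

```python
#!/usr/bin/env python3
"""Sketch of pruning older Enterprise dump runs, keeping only the newest N."""
import os
import shutil

RUN_BASE = "/srv/dumps/enterprise_html/run"  # assumed layout: one dir per run, named by date
KEEP = 3                                     # we keep 3 runs for now


def prune_old_runs(base=RUN_BASE, keep=KEEP):
    """Delete all but the newest `keep` run directories under `base`."""
    runs = sorted(
        d for d in os.listdir(base)
        if os.path.isdir(os.path.join(base, d))
    )
    # Date-named directories (YYYYMMDD) sort chronologically, oldest first.
    old_runs = runs[:-keep] if keep > 0 else runs
    for old in old_runs:
        shutil.rmtree(os.path.join(base, old))


if __name__ == "__main__":
    prune_old_runs()
```

Whether this lives inside the downloader script or runs from its own systemd timer is a separate choice.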