Let's work out the details (now updated with answers):
- How many dump runs do we keep?
-- 3 for now
- Do we need to stop keeping so many of some other type of dump?
-- Not at the moment
- What credentials do we need to retrieve files from AWS?
-- Static credential strings that have been provided to us; they will be stored in the private puppet repo
- We won't be rsyncing from the source, because we only want one of the many daily runs that OKAPI will have available; does this mean a custom script?
-- Yes, and it's mostly done; a sketch of the approach appears after this list.
- Do we want to just proxy requests for the given files instead? That could incur AWS costs, and it would mean being clever about serving requests only for certain runs.
-- No, we don't.
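As promised above, here is a minimal sketch of the shape the downloader could take, assuming the runs are laid out in S3 as <bucket>/<run-date>/<file> and that the script is Python. The bucket name, local paths, credentials-file format, and function names are all illustrative assumptions, not the real values:

```python
# Minimal sketch only; every name and path below is hypothetical.
import configparser
import os
import shutil
import tempfile

import boto3

BUCKET = "wikimedia-enterprise-dumps"        # hypothetical bucket name
RUN_DIR = "/srv/dumps/enterprise_html/run"   # hypothetical, puppetized dir
TMP_DIR = "/srv/dumps/enterprise_html/tmp"   # hypothetical, puppetized dir


def s3_client(creds_path="/etc/dumps/aws_credentials.ini"):
    """Build an S3 client from the credential strings puppet deploys."""
    creds = configparser.ConfigParser()
    creds.read(creds_path)
    return boto3.client(
        "s3",
        aws_access_key_id=creds["default"]["access_key"],
        aws_secret_access_key=creds["default"]["secret_key"],
    )


def download_run(run_date):
    """Fetch every file of one run into a tmp dir, then rename it into
    place, so partial downloads never show up in the web-served tree."""
    s3 = s3_client()
    tmp_run = tempfile.mkdtemp(dir=TMP_DIR)
    try:
        pages = s3.get_paginator("list_objects_v2").paginate(
            Bucket=BUCKET, Prefix=run_date + "/")
        for page in pages:
            for obj in page.get("Contents", []):
                target = os.path.join(tmp_run, os.path.basename(obj["Key"]))
                s3.download_file(BUCKET, obj["Key"], target)
        os.rename(tmp_run, os.path.join(RUN_DIR, run_date))
    except Exception:
        # Relates to the "cleanup of tmp files" TODO below.
        shutil.rmtree(tmp_run, ignore_errors=True)
        raise
```

Downloading into tmp and renaming at the end is what makes the tmp-cleanup TODO matter: a crashed run leaves behind only a stray tmp dir, never a half-published run.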
What about the future? There will be other datasets; what will we do about space for those? That will be discussed in a future task if needed.
TODOs remaining (rough sketches of the puppet and systemd pieces follow the list):
[] Add cleanup of tmp files to downloader script
[] Puppetize the enterprise_html/run and tmp dirs so the script does not need to create them
[] Add downloader script to puppet
[] Run downloader script via systemd timer instead of manually
[] Rsync the Enterprise dumps from the web server to the nfs server (i.e. from one labstore box to the other), via a systemd timer
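A hedged sketch of the puppet side for the first three TODOs (directories, script deployment, and the credential strings mentioned above). Every path, name, and the secret() lookup key are assumptions, not the real manifest:

```puppet
# Minimal sketch only: module layout, paths, and ownership are assumptions,
# and secret() stands in for however values get pulled from the private repo.
file { ['/srv/dumps/enterprise_html/run', '/srv/dumps/enterprise_html/tmp']:
  ensure => directory,
  owner  => 'dumpsgen',
  group  => 'dumpsgen',
  mode   => '0755',
}

file { '/usr/local/bin/enterprise_downloader.py':
  ensure => present,
  source => 'puppet:///modules/dumps/enterprise_downloader.py',
  owner  => 'root',
  group  => 'root',
  mode   => '0755',
}

# The fixed credential strings from the answer above, deployed read-only.
file { '/etc/dumps/aws_credentials.ini':
  ensure    => present,
  content   => secret('dumps/aws_credentials.ini'),
  owner     => 'dumpsgen',
  group     => 'dumpsgen',
  mode      => '0400',
  show_diff => false,
}
```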
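For the timer TODO, one possible service/timer pair that would replace running the script by hand; the unit names, user, and schedule are assumptions:

```ini
# enterprise-dumps-download.service (hypothetical unit name)
[Unit]
Description=Download the latest Enterprise HTML dump run

[Service]
Type=oneshot
User=dumpsgen
ExecStart=/usr/local/bin/enterprise_downloader.py

# enterprise-dumps-download.timer
[Unit]
Description=Daily Enterprise HTML dump download

[Timer]
OnCalendar=*-*-* 04:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Once deployed, `systemctl enable --now enterprise-dumps-download.timer` arms it, and `systemctl list-timers` shows the next scheduled run.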
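The rsync between the two labstore boxes could then be another oneshot service on its own timer. The peer hostname, paths, and the assumption that the dumpsgen user can ssh from the web box to the nfs box are all hypothetical:

```ini
# enterprise-dumps-rsync.service (hypothetical unit name, host, and paths)
[Unit]
Description=Push Enterprise dumps from the web server to the nfs server

[Service]
Type=oneshot
User=dumpsgen
ExecStart=/usr/bin/rsync -a --delete /srv/dumps/enterprise_html/run/ nfs-peer.example.org:/srv/dumps/enterprise_html/run/

# enterprise-dumps-rsync.timer
[Unit]
Description=Daily Enterprise dumps rsync to the nfs box

[Timer]
OnCalendar=*-*-* 06:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

With --delete, the nfs copy automatically tracks the 3-run retention on the web server.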
I'll add other things if they come up; hopefully they won't.