What I've done manually until now is referenced here: https://gerrit.wikimedia.org/r/c/analytics/refinery/+/409960
I have started a Python script that does pretty much the same thing, with more flexibility and support for incremental growth.
Comments/ideas welcome :)
Excuse me for butting in at this late date, but these files are already available from labstore1006,7 to labs instances and on stat100? (I forget which one now). Do you need them to be available somewhere else?
I don't have any objections on principle to having another copy floating around, it would just be nice to make sure it's not redundant.
Hi @ArielGlenn, we need the files on an-coord1001 because it is the one machine responsible for cron/systemd-timer jobs in our infra, while the other machine, stat1005, is oriented toward user jobs.
Since this task is about productionizing the import of the files, it involves the machine responsible for prod jobs. OK on your side?
I mean, it's fine, but maybe it's better to just provide them as is done on stat100? (5? 7?) via NFS mount from labstore1006 (7?). Looping in @Bstorm for her opinion, as she is one of the point people for those servers. I don't mean to slow this down at all; it's just that if there's a simpler solution than copying them over, maybe we should go for it.