At some level this is quite similar to what the bulk daemon already does: we have a batch of pre-calculated data and want to insert it into Elasticsearch. In this case, though, instead of the data streaming in over Kafka, there will be a single Kafka notification that a new batch of data is ready, and that data will be pulled from Swift and uploaded. This will initially be used to import glent suggestions into the production clusters, but the existing popularity score import could potentially be moved to this method as well.
This task tracks the work on the production side: importing data from Swift into Elasticsearch.
Assuming that Swift contains a set of JSON files to import and we are given a folder to read, the process is:
- list all files in the folder
- select the newest one (file names will follow a convention that sorts chronologically); see the listing sketch after this list
- check whether a new index actually needs to be created (i.e. the latest batch has not already been imported)
- create an index named after the data file, using a very simple mapping
- pull the data from Swift using MW classes (FileBackendStore?) and import it into Elasticsearch using the bulk API; see the import sketch below
- switch the index alias to the new index; see the alias-swap sketch below
- delete the old index
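
A minimal sketch of the listing and selection steps. The backend name `glent-swift` and the folder path are hypothetical placeholders, and it assumes batch files carry date-based names so lexicographic order matches chronological order:

```php
// Sketch only: backend name and folder path are hypothetical.
$backend = FileBackendGroup::singleton()->get( 'glent-swift' );
$dir = 'mwstore://glent-swift/imports/glent';

// getFileList() yields paths relative to 'dir'.
$files = [];
foreach ( $backend->getFileList( [ 'dir' => $dir ] ) as $file ) {
	$files[] = $file;
}
sort( $files );

// With date-based names the lexicographically last file is the newest batch.
$latest = $dir . '/' . end( $files );
```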
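A sketch of the index check, creation, and bulk import, continuing from the listing above (`$backend`, `$latest`). CirrusSearch already uses the Elastica client library, so that is assumed here; the connection settings, index naming, field names, mapping syntax (ES 7 style), chunk size, and the one-JSON-document-per-line batch format are all illustrative assumptions:

```php
// Sketch only: connection, naming, mapping and batch format are assumptions;
// $backend and $latest come from the listing sketch above.
$client = new \Elastica\Client( [ 'host' => 'localhost', 'port' => 9200 ] );

// Name the index after the data file, e.g. glent_20190801.
$indexName = 'glent_' . basename( $latest, '.json' );
$index = $client->getIndex( $indexName );

// Nothing to do if this batch has already been imported.
if ( $index->exists() ) {
	return;
}

// Very simple mapping; anything not listed falls back to dynamic defaults.
$index->create( [
	'mappings' => [
		'properties' => [
			'query' => [ 'type' => 'text' ],
			'suggestion' => [ 'type' => 'text' ],
			'score' => [ 'type' => 'float' ],
		],
	],
] );

// Pull the batch out of Swift and feed it to the _bulk API in chunks.
$content = $backend->getFileContents( [ 'src' => $latest ] );
$docs = [];
foreach ( explode( "\n", trim( $content ) ) as $line ) {
	// An empty id lets Elasticsearch assign one.
	$docs[] = new \Elastica\Document( '', json_decode( $line, true ) );
	if ( count( $docs ) >= 1000 ) {
		$index->addDocuments( $docs ); // sends a _bulk request
		$docs = [];
	}
}
if ( $docs ) {
	$index->addDocuments( $docs );
}
$index->refresh();
```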
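Finally, a sketch of the alias swap and cleanup (`$client` and `$index` from the import sketch above); the alias name `glent` is an assumption. Elastica's `addAlias()` with `$replace = true` performs the remove-and-add in a single `_aliases` request, so readers never see a moment without a live index:

```php
// Sketch only: the alias name is an assumption.
$alias = 'glent';

// Remember which indices currently hold the alias so they can be
// dropped after the switch.
$oldIndices = $client->getStatus()->getIndicesWithAlias( $alias );

// Atomically repoint the alias: $replace = true removes it from any
// index that currently holds it, in the same _aliases request.
$index->addAlias( $alias, true );

// Delete the superseded indices.
foreach ( $oldIndices as $old ) {
	$old->delete();
}
```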