Add maint script to import data from swift to elasticsearch
Closed, Duplicate (Public)


Swift is becoming the pivot datastore for importing models generated on the analytics cluster into production (T219544).
Since we need to move the suggestions generated by glent (T212888) from Hadoop to Elasticsearch, we will investigate using Swift instead of Kafka, which we used in the past.
This task tracks the work on the production side: importing data from Swift into Elasticsearch.
Assuming that Swift contains a list of JSON files to import and that we are given a folder to read, the script would:

  1. list all files in the folder
  2. select the most recent one (there will be some naming convention)
  3. check whether a new index needs to be created
  4. create an index named after the data filename, using a very simple mapping
  5. pull data from Swift using MW classes (FileBackendStore?) and import it into Elasticsearch using the bulk API
  6. switch the index alias to the new index
  7. delete the old index
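
The steps above can be sketched as a handful of pure helpers. This is only an illustration in Python, not the proposed maintenance script (which would use MW classes). The file naming convention (names that sort chronologically, such as `glent_20190401.json`), the function names, and the index naming scheme are all assumptions made for the example. The bulk body and alias actions follow the request formats of the Elasticsearch `_bulk` and `_aliases` APIs.

```python
import json
import re


def select_latest(filenames):
    """Steps 1-2: pick the newest dump.

    Assumes (hypothetically) that file names sort chronologically.
    """
    candidates = sorted(f for f in filenames if f.endswith(".json"))
    return candidates[-1] if candidates else None


def index_name_for(filename):
    """Step 4: derive an index name from the data filename (hypothetical scheme)."""
    stem = filename.rsplit("/", 1)[-1]
    if stem.endswith(".json"):
        stem = stem[: -len(".json")]
    # Index names must be lowercase; replace anything unusual with "_".
    return re.sub(r"[^a-z0-9_-]", "_", stem.lower())


def needs_new_index(target_index, existing_indices):
    """Step 3: a new index is needed only if the target does not exist yet."""
    return target_index not in existing_indices


def bulk_body(index, docs):
    """Step 5: build a newline-delimited body for the bulk API.

    Each document is preceded by an action line and every line ends with "\n".
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)


def alias_switch_actions(alias, new_index, old_indices):
    """Step 6: one request body that moves the alias to the new index."""
    actions = [
        {"remove": {"index": old, "alias": alias}}
        for old in old_indices
        if old != new_index
    ]
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    files = ["suggestions/glent_20190325.json", "suggestions/glent_20190401.json"]
    latest = select_latest(files)
    target = index_name_for(latest)
    if needs_new_index(target, {"glent_20190325"}):
        print(bulk_body(target, [{"query": "exmaple", "dym": "example"}]), end="")
        print(json.dumps(alias_switch_actions("glent", target, ["glent_20190325"])))
```

Sending the alias actions in a single request keeps the switch atomic, so step 7 (deleting the old index) can safely run only after that request succeeds.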