Adjust mjolnir bulk_daemon to import glent swift uploads
Closed, Resolved, Public

Description

This is broadly similar to what the bulk daemon already does: we have a batch of pre-calculated data and want to insert it into Elasticsearch. In this case, though, instead of the data streaming in over Kafka, a single notification arrives over Kafka announcing that a new batch of data is ready, and the data is then pulled from Swift and uploaded. This will initially be used to import glent suggestions to the prod clusters, but the existing popularity score import could potentially be moved to this method as well.
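As a rough illustration of the notification-driven flow described above, here is a minimal Python sketch. It is not the merged MjoLniR code. The message fields (`container`, `object_prefix`) and the `fetch_lines` and `upload` callables are hypothetical stand-ins for the real Kafka payload, the Swift client, and the Elasticsearch bulk uploader.

```python
import json
from typing import Callable, Iterable, Iterator


def handle_notification(
    raw_message: bytes,
    fetch_lines: Callable[[str, str], Iterable[str]],
    upload: Callable[[Iterator[dict]], int],
) -> int:
    """Process one 'batch ready' notification; return the count reported by upload."""
    # Assumed (hypothetical) message shape:
    #   {"container": "...", "object_prefix": "..."}
    event = json.loads(raw_message)
    # The notification carries only a pointer; the data itself is pulled from storage.
    lines = fetch_lines(event["container"], event["object_prefix"])
    # One JSON document per non-empty line, parsed lazily.
    docs = (json.loads(line) for line in lines if line.strip())
    return upload(docs)
```

Passing the storage and upload steps in as callables keeps the sketch independent of any particular Swift or Elasticsearch client, so the control flow can be exercised with in-memory fakes.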

This task is to track the work on the production side: import data from swift to elasticsearch.
Assuming that swift contains a list of json files to import and we are given a folder to read:

  1. list all files in the folder
  2. select the last one (there will be some naming convention)
  3. check that a new index needs to be created
  4. create an index based on the data filename using a very simple mapping
  5. pull data from swift using MW classes (FileBackendStore?) and import to elastic using the bulk API
  6. switch the index alias to the new index
  7. delete the old index
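
The pure-logic parts of the steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation that was merged: the naming convention (file names that sort chronologically), the `glent_` index prefix, and all function names are hypothetical. The bulk lines follow the Elasticsearch bulk API's newline-delimited JSON shape, and the alias body follows the `_aliases` actions format; older Elasticsearch versions may also require a `_type` in the bulk action, which is omitted here.

```python
import json
from typing import Iterable, Iterator, List, Optional


def select_latest(object_names: Iterable[str]) -> Optional[str]:
    """Step 2: pick the newest file, assuming names sort chronologically."""
    names = sorted(object_names)
    return names[-1] if names else None


def index_name_for(object_name: str, prefix: str = "glent_") -> str:
    """Step 4: derive an index name from the data filename (hypothetical scheme)."""
    base = object_name.rsplit("/", 1)[-1]   # drop the folder part
    stem = base.split(".", 1)[0]            # drop the extension(s)
    return prefix + stem.lower()            # index names must be lowercase


def needs_new_index(index: str, existing_indices: Iterable[str]) -> bool:
    """Step 3: a new index is needed only if it does not already exist."""
    return index not in set(existing_indices)


def bulk_lines(index: str, docs: Iterable[dict]) -> Iterator[str]:
    """Step 5: yield action/source line pairs for a bulk request body."""
    for doc in docs:
        yield json.dumps({"index": {"_index": index}})
        yield json.dumps(doc)


def alias_swap_actions(alias: str, new_index: str, old_indices: List[str]) -> dict:
    """Step 6: build a single _aliases body so the switch happens in one request."""
    actions = [{"remove": {"index": old, "alias": alias}} for old in old_indices]
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}
```

Putting the removal and the addition in one `_aliases` request means readers of the alias never see a moment with no index behind it; the old index (step 7) would only be deleted after that request succeeds.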

Event Timeline

Change 521368 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[search/MjoLniR@master] Import glent suggestions over swift

https://gerrit.wikimedia.org/r/521368

Change 522191 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[search/MjoLniR@master] Introduce basic Dockerfile

https://gerrit.wikimedia.org/r/522191

Change 522190 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[search/MjoLniR@master] Implement daemon for handling bulk import via swift

https://gerrit.wikimedia.org/r/522190

Change 522491 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[wikimedia/discovery/analytics@master] Convert transfer_to_es to export over swift

https://gerrit.wikimedia.org/r/522491

Change 522190 merged by jenkins-bot:
[search/MjoLniR@master] Implement daemon for handling bulk import via swift

https://gerrit.wikimedia.org/r/522190

Change 521368 merged by jenkins-bot:
[search/MjoLniR@master] Import glent suggestions over swift

https://gerrit.wikimedia.org/r/521368

Change 524624 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[labs/private@master] Add swift analytics_mjolnir dummy account key

https://gerrit.wikimedia.org/r/524624

Change 524625 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/puppet@production] Add swift read credentials for mjolnir

https://gerrit.wikimedia.org/r/524625

Change 522491 merged by jenkins-bot:
[wikimedia/discovery/analytics@master] Convert transfer_to_es to export over swift

https://gerrit.wikimedia.org/r/522491

Change 527220 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[search/MjoLniR@master] Update bulk_daemon for changes in swift uploads

https://gerrit.wikimedia.org/r/527220

Change 528190 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/puppet@production] Change mjolnir_bulk_daemon kafka topics

https://gerrit.wikimedia.org/r/528190

Change 524625 abandoned by EBernhardson:
Add swift read credentials for mjolnir

Reason:
We went with read-only ACLs instead of per-use credentials.

https://gerrit.wikimedia.org/r/524625

Change 524624 abandoned by EBernhardson:
Add swift analytics_mjolnir dummy account key

Reason:
went with read-only ACL instead of per-use credentials

https://gerrit.wikimedia.org/r/524624

Change 528190 merged by Gehel:
[operations/puppet@production] Change mjolnir_bulk_daemon kafka topics

https://gerrit.wikimedia.org/r/528190

Change 527220 merged by jenkins-bot:
[search/MjoLniR@master] Update bulk_daemon for changes in swift uploads

https://gerrit.wikimedia.org/r/527220

Change 531285 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/puppet@production] Set REQUESTS_CA_BUNDLE for mjolnir daemons

https://gerrit.wikimedia.org/r/531285

Change 531273 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/puppet@production] Allow glent indices to auto-create in cirrus clusters

https://gerrit.wikimedia.org/r/531273

Change 531285 merged by Gehel:
[operations/puppet@production] Set REQUESTS_CA_BUNDLE for mjolnir daemons

https://gerrit.wikimedia.org/r/531285

Change 531310 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[search/MjoLniR@master] bulk_daemon: Increase max_poll_interval_ms to 15 minutes

https://gerrit.wikimedia.org/r/531310

Change 531310 merged by jenkins-bot:
[search/MjoLniR@master] bulk_daemon: Increase max_poll_interval_ms to 15 minutes

https://gerrit.wikimedia.org/r/531310

Change 531273 merged by Gehel:
[operations/puppet@production] Allow glent indices to auto-create in cirrus clusters

https://gerrit.wikimedia.org/r/531273