
Swift account to store ML models
Closed, Resolved · Public

Description

To support the ML-Team's MVP for model serving, we need a Swift account that supports the S3 API to store models in. After a chat with @fgiunchedi, we agreed to:

  1. Use Swift Thanos as an interim solution for the MVP.
  2. Move the ML models to Swift MOSS once it is ready.

If this is OK with everybody, we need an account on Swift Thanos to store the initial models. Longer term we'll probably want a read-only account for Kubeflow and a read/write account to push models (how models will be pushed is still unclear).

Event Timeline

SGTM. In practical terms, the work involves adding the account to hieradata/common/profile/thanos/swift.yaml in puppet.git, plus the private bits (credentials) to the "public private" repo (labs/private, with fake values) and to the real private.git.

Thanks! Quick question: what does the .admin setting imply? Being able to do anything on the cluster, or something less powerful? (Just trying to figure out what to create.)

Sort of: it means the account has control over the containers in that account, but only there; it has no control over containers that belong to other accounts.

Ahhh perfect, thanks for explaining, I feel better now :) Going to make the changes in a few!

Change 682097 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::thanos::swift: add account for ML serve cluster

https://gerrit.wikimedia.org/r/682097

Change 682125 had a related patch set uploaded (by Elukey; author: Elukey):

[labs/private@master] profile::thanos::swift: add fake credentials for mlserve_prod

https://gerrit.wikimedia.org/r/682125

Change 682125 merged by Elukey:

[labs/private@master] profile::thanos::swift: add fake credentials for mlserve_prod

https://gerrit.wikimedia.org/r/682125

Change 682097 merged by Elukey:

[operations/puppet@production] profile::thanos::swift: add account for ML serve cluster

https://gerrit.wikimedia.org/r/682097

Mentioned in SAL (#wikimedia-operations) [2021-04-23T13:33:51Z] <elukey> roll restart of all thanos-swift proxies to pick up new ML account - T280773

elukey claimed this task.

elukey@ml-serve1001:~$ cat .s3cfg 
[default]
access_key = mlserve:prod
host_base = https://thanos-swift.discovery.wmnet
host_bucket = https://thanos-swift.discovery.wmnet
secret_key = [redacted] 
signature_v2 = true

elukey@ml-serve1001:~$ s3cmd mb s3://test-elukey
Bucket 's3://test-elukey/' created

elukey@ml-serve1001:~$ s3cmd put batman.txt s3://test-elukey/batman.txt
upload: 'batman.txt' -> 's3://test-elukey/batman.txt'  [1 of 1]
 16 of 16   100% in    0s    69.05 B/s  done

elukey@ml-serve1001:~$ s3cmd ls s3://test-elukey
2021-04-23 13:53           16  s3://test-elukey/batman.txt

elukey@ml-serve1001:~$ s3cmd rb s3://test-elukey
ERROR: S3 error: 409 (BucketNotEmpty): The bucket you tried to delete is not empty

elukey@ml-serve1001:~$ s3cmd del s3://test-elukey/batman.txt
delete: 's3://test-elukey/batman.txt'

elukey@ml-serve1001:~$ s3cmd rb s3://test-elukey
Bucket 's3://test-elukey/' removed
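For anyone reproducing this from another host, the `~/.s3cfg` printed at the start of the session above could be written by a small shell function instead of by hand. This is a minimal sketch, not part of the puppet change: the function name `write_s3cfg` and the `MLSERVE_SECRET_KEY` environment variable are hypothetical, and the remaining values are copied from the session above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an s3cmd config for the mlserve:prod account on
# the Thanos Swift S3 endpoint. The secret is read from the (hypothetical)
# MLSERVE_SECRET_KEY environment variable so it is not hard-coded here.
write_s3cfg() {
    local cfg="${1:-$HOME/.s3cfg}"
    local endpoint="https://thanos-swift.discovery.wmnet"

    if [ -z "${MLSERVE_SECRET_KEY:-}" ]; then
        echo "MLSERVE_SECRET_KEY is not set" >&2
        return 1
    fi

    (
        # Subshell so the stricter umask does not leak to the caller;
        # a newly created file ends up readable only by its owner.
        umask 077
        cat > "$cfg" <<EOF
[default]
access_key = mlserve:prod
host_base = ${endpoint}
host_bucket = ${endpoint}
secret_key = ${MLSERVE_SECRET_KEY}
signature_v2 = true
EOF
    )
}
```

With the variable exported, `write_s3cfg` writes `~/.s3cfg` (or the path given as the first argument). The `umask` only affects files it creates, so an existing config keeps its current permissions.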

Looks perfect in my opinion, closing!
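The manual session above could be wrapped into a repeatable smoke test for the account. This is a sketch under stated assumptions: `s3_smoke_test` and the `S3CMD` override (useful for swapping in a stub) are hypothetical, and it assumes `s3cmd` is already configured through `~/.s3cfg`. It deletes the object before removing the bucket because, as the session shows, `s3cmd rb` on a non-empty bucket fails with `409 (BucketNotEmpty)`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the manual s3cmd session: create a bucket,
# upload one object, check it is listed, then clean up. S3CMD can point to
# another binary (e.g. a stub); s3cmd itself reads ~/.s3cfg.
s3_smoke_test() {
    if [ -z "${1:-}" ]; then
        echo "usage: s3_smoke_test BUCKET_NAME" >&2
        return 2
    fi
    local s3cmd="${S3CMD:-s3cmd}"
    local bucket="s3://$1"
    local payload status=0

    payload="$(mktemp)" || return 1
    echo "s3 smoke test" > "$payload"

    # Create the bucket; nothing to clean up remotely if this fails.
    if ! "$s3cmd" mb "$bucket"; then
        rm -f "$payload"
        return 1
    fi

    # Upload one object and make sure it shows up in the listing.
    if ! "$s3cmd" put "$payload" "$bucket/smoke.txt" ||
       ! "$s3cmd" ls "$bucket" | grep -qF "$bucket/smoke.txt"; then
        status=1
    fi

    # Delete the object first: "rb" on a non-empty bucket fails with
    # 409 BucketNotEmpty.
    "$s3cmd" del "$bucket/smoke.txt"
    "$s3cmd" rb "$bucket" || status=1

    rm -f "$payload"
    return "$status"
}
```

Run as `s3_smoke_test test-elukey` from a host with a working `~/.s3cfg`; a non-zero return means one of the steps failed.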