
Create Swift account for readonly access to ML models
Closed, Resolved, Public

Description

Similar to T280773, we'd like to add an account with read-only access to the ML models stored on Thanos Swift (and, in the future, on the MOSS cluster). The existing account/user (mlserve:prod) would continue to be used to upload models to the storage bucket.

Naming is a bit tricky (partly because I don't know the conventions used here all that well). One option would be mlserve:readonly, and we'd later rename mlserve:prod to something more useful/accurate.

On the Puppet side, changes 682097 and 682125, plus one for the actually-private repo, would follow once we have settled on the name. One open question is which values the access field can take to make an account read-only.

Event Timeline

@MatthewVernon hi! Do you have any guidance about how to proceed?

Change 810840 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::thanos::swift: add a read only account for ml-serve

https://gerrit.wikimedia.org/r/810840

Change 810926 had a related patch set uploaded (by Elukey; author: Elukey):

[labs/private@master] profile::thanos::swift: add mlserve_ro account

https://gerrit.wikimedia.org/r/810926

Change 810926 merged by Elukey:

[labs/private@master] profile::thanos::swift: add mlserve_ro account

https://gerrit.wikimedia.org/r/810926

Change 810840 merged by Elukey:

[operations/puppet@production] profile::thanos::swift: add a read only account for ml-serve

https://gerrit.wikimedia.org/r/810840

Mentioned in SAL (#wikimedia-operations) [2022-07-04T14:19:43Z] <elukey> roll restart of thanos-fe's proxy to pick up a new account - T311628

The new mlserve:ro account has been added, but if I try to use s3cmd with the new credentials I get an error:

elukey@stat1004:~$ sudo s3cmd -c test.cfg get s3://wmf-ml-models/goodfaith/enwiki/20220214192144/model.bin --force 
download: 's3://wmf-ml-models/goodfaith/enwiki/20220214192144/model.bin' -> './model.bin'  [1 of 1]
ERROR: S3 error: 500 (InternalError): unexpected status code 409

In the logs, the 409 seems to originate from a 403, so the new account is not able to read anything.
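For reference, the test.cfg passed with -c above is an ordinary s3cmd config file. Below is a minimal sketch for generating one. It assumes the credentials are exported as MLSERVE_RO_ACCESS_KEY and MLSERVE_RO_SECRET_KEY (hypothetical variable names) and that the gateway accepts path-style requests. With Swift's S3 layer and tempauth-style accounts the access key is usually the account:user pair (here mlserve:ro), but that should be checked against the actual setup. The endpoint hostname is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: write an s3cmd config for the read-only account.
# ASSUMPTIONS: credentials come from the hypothetical env vars
# MLSERVE_RO_ACCESS_KEY / MLSERVE_RO_SECRET_KEY; the endpoint is a placeholder.
set -euo pipefail

write_s3cmd_cfg() {
  local out="$1" endpoint="$2"
  # Restrictive umask in a subshell so the secret never lands in a
  # world-readable file; s3cmd reads settings from the [default] section.
  (
    umask 077
    cat > "$out" <<EOF
[default]
access_key = ${MLSERVE_RO_ACCESS_KEY:?set MLSERVE_RO_ACCESS_KEY}
secret_key = ${MLSERVE_RO_SECRET_KEY:?set MLSERVE_RO_SECRET_KEY}
host_base = ${endpoint}
host_bucket = ${endpoint}
use_https = True
EOF
  )
}
```

Setting host_bucket without a %(bucket)s placeholder keeps the bucket name in the URL path rather than in the hostname, which is the assumption about the gateway mentioned above.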

Filippo applied the following read ACL to the segments container, and everything now works:

swift post wmf-ml-models+segments -r 'mlserve:ro'

I had previously only added:

swift post wmf-ml-models -r "mlserve:ro"
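A likely explanation for why the extra command was needed: objects uploaded through the S3 API in multipart mode are stored as a manifest in the bucket's own container, with the parts kept in a companion container named <bucket>+segments, and reading the object requires read access to both. The small sketch below always grants the two ACLs together. grant_read_acl and DRY_RUN are made-up names for illustration.

```shell
#!/usr/bin/env bash
# Sketch: grant a Swift read ACL on a bucket's container AND on its
# "+segments" companion container (assumed to hold multipart parts).
# Set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

grant_read_acl() {
  local bucket="$1" acl="$2" container
  for container in "$bucket" "${bucket}+segments"; do
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      printf 'swift post %s -r %s\n' "$container" "$acl"
    else
      # -r sets the container's X-Container-Read ACL.
      swift post "$container" -r "$acl"
    fi
  done
}
```

For example, DRY_RUN=1 grant_read_acl wmf-ml-models mlserve:ro prints both commands, and afterwards swift stat wmf-ml-models+segments should list the account under Read ACL. Note that -r replaces the container's entire read ACL rather than appending to it, so if other accounts also need read access they have to be listed in the same comma-separated ACL string.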

I tried to upload a model with the new read-only account and got Access Denied (as expected):

elukey@stat1004:~$ sudo s3cmd -c test.cfg put model.bin s3://wmf-ml-models/test/model.bin
upload: 'model.bin' -> 's3://wmf-ml-models/test/model.bin'  [1 of 1]
  2752512 of 10351347    26% in    0s     3.51 MB/s  failed
  2752512 of 10351347    26% in    1s     2.51 MB/s  done
ERROR: S3 error: 403 (AccessDenied): Access Denied.
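The two manual checks (a successful get and a refused put) can be wrapped into one script. The sketch below assumes only that s3cmd exits non-zero on S3 errors such as the 403 above. check_read_only and the S3CMD override are hypothetical names; the override exists only so the function can be exercised with a stub.

```shell
#!/usr/bin/env bash
# Sketch: check that a set of S3 credentials can read but not write.
# Relies only on s3cmd's exit status (assumed non-zero on S3 errors).
# S3CMD may name another command or shell function (e.g. a test stub).
set -euo pipefail

check_read_only() {
  local cfg="$1" src="$2" dst="$3" tmp rc=0
  tmp=$(mktemp)
  # Reading an existing object must succeed.
  if ! "${S3CMD:-s3cmd}" -c "$cfg" get "$src" "$tmp" --force >/dev/null 2>&1; then
    echo "FAIL: cannot read $src"
    rc=1
  fi
  # Writing must be refused (403 AccessDenied in the test above).
  if "${S3CMD:-s3cmd}" -c "$cfg" put "$tmp" "$dst" >/dev/null 2>&1; then
    echo "FAIL: write to $dst succeeded"
    rc=1
  fi
  rm -f "$tmp"
  if [[ $rc -eq 0 ]]; then
    echo "OK: read-only"
  fi
  return "$rc"
}
```

For example: check_read_only test.cfg s3://wmf-ml-models/goodfaith/enwiki/20220214192144/model.bin s3://wmf-ml-models/test/model.bin. Point the destination at a throwaway key, since if the ACLs are wrong the upload will actually go through.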

elukey claimed this task.

All working! Added the new account to the ML staging cluster, and it worked nicely. We'll move away from the admin account in prod as well. Thanks!