
Add sha512 checksum files to all the ML's models in the public dir
Closed, Resolved, Public

Description

In https://analytics.wikimedia.org/published/wmf-ml-models/ we store the models uploaded to Lift Wing. We should add a sha512 checksum file for every file there, so that people can verify the binaries after downloading them.
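
As an illustration of the verification step, a downloaded model could be checked against its checksum file along these lines. This is a minimal sketch: the function name `verify_sha512` is made up for the example, and it assumes the `.sha512` file holds standard `sha512sum` output (hex digest first, then the file name). It compares only the digest field, so it does not depend on the path recorded in the checksum file.

```shell
# Hypothetical helper, not part of any existing tooling.
# Usage: verify_sha512 MODEL_FILE CHECKSUM_FILE
verify_sha512() {
    local file="$1" sumfile="$2"
    local expected actual
    # The first whitespace-separated field of the checksum file is the hex digest.
    expected=$(awk 'NR == 1 { print $1 }' "$sumfile") || return 2
    # Recompute the digest of the downloaded file.
    actual=$(sha512sum -- "$file" | awk '{ print $1 }')
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

For example, `verify_sha512 model.bin model.bin.sha512` prints `OK: model.bin` when the digests match and returns a non-zero status otherwise (the file names here are placeholders).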

Extra caveat: we should check who is allowed to change files under that directory, to make sure they cannot be tampered with easily (or changed by mistake). For example:

  • MLOps uploads the model binary and the checksum to analytics.wikimedia.org.
  • User X, who has access to the stat boxes, copies their own models into the same directory, overwriting the content.

Event Timeline

Do you think it would be useful to also keep the checksums in a different place, with permissions independent of the backing store behind the published/ directory?

Ideally it would be nice to keep them in the same place, so we don't forget to add them with our automation and folks from the community have an easier way to find things.

Change 963956 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::statistics::explorer: ensure /srv/published/wmf-ml-models

https://gerrit.wikimedia.org/r/963956

Change 963962 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::statistics::explorer: add ml-team-admins to stat100x nodes

https://gerrit.wikimedia.org/r/963962

Change 963962 merged by Elukey:

[operations/puppet@production] role::statistics::explorer: add ml-team-admins to stat100x nodes

https://gerrit.wikimedia.org/r/963962

Change 963956 merged by Elukey:

[operations/puppet@production] profile::statistics::explorer: ensure /srv/published/wmf-ml-models

https://gerrit.wikimedia.org/r/963956

The original upload task was T334111. All the stat boxes now have a directory called /srv/published/wmf-ml-models that is writable only by members of the POSIX group ml-team-admins and by root.
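
One way to spot-check a directory set up like this is sketched below. The function name `check_publish_dir` is hypothetical, and the sketch assumes GNU `stat` and `find`. It only checks the group on the top-level directory and looks for world-writable entries; it says nothing about who the owner is.

```shell
# Hypothetical check, not part of the puppet changes in this task.
# Usage: check_publish_dir DIR EXPECTED_GROUP
check_publish_dir() {
    local dir="$1" want_group="$2"
    local group loose
    # Group that owns the top-level directory.
    group=$(stat -c '%G' -- "$dir") || return 2
    if [ "$group" != "$want_group" ]; then
        echo "unexpected group on $dir: $group" >&2
        return 1
    fi
    # Any entry writable by "other" could be overwritten by non-members.
    # Symlinks are skipped because their mode bits are not meaningful.
    loose=$(find "$dir" ! -type l -perm -0002 -print -quit)
    if [ -n "$loose" ]; then
        echo "world-writable entry found: $loose" >&2
        return 1
    fi
    echo "OK: $dir"
}
```

On a stat box this could be called as `check_publish_dir /srv/published/wmf-ml-models ml-team-admins`.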

I also ran the following:

elukey@stat1008:~$ for f in $(find /srv/published/wmf-ml-models -type f); do echo $f; sha512sum -b $f > "${f}.sha512"; done

This generated all the sha512 files.
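
For future re-runs, a variant that tolerates spaces in file names, skips existing `.sha512` files, and records only the base name in each checksum file (so that `sha512sum -c` works from whatever directory the files were downloaded into) might look like the following. It is a sketch, not the command that was run, and `generate_checksums` is a made-up name.

```shell
# Hypothetical helper. Usage: generate_checksums ROOT_DIR
generate_checksums() {
    # Let find hand the file names to an inner shell as arguments,
    # so names containing whitespace are never word-split.
    find "$1" -type f ! -name '*.sha512' -exec sh -c '
        for f in "$@"; do
            dir=$(dirname -- "$f")
            base=$(basename -- "$f")
            # Run from the directory of the file so only the base name is recorded.
            ( cd -- "$dir" && sha512sum -b -- "$base" > "$base.sha512" )
        done
    ' sh {} +
}
```

With checksum files written this way, a downloader can run `sha512sum -c NAME.sha512` next to the downloaded file.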

We decided during the team meeting to add a prompt to the upload_model script that offers to copy the model to /srv/published. This way we don't forget :)

Next steps:

  • Implement the new prompt in the upload models script.
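
A prompt of this kind could be as simple as the sketch below. This is not the actual change to the upload script: the function name, its arguments, and the wording of the prompt are all illustrative assumptions.

```shell
# Hypothetical sketch, not the real upload script.
# Usage: maybe_publish_model MODEL_FILE DEST_DIR
maybe_publish_model() {
    local model="$1" dest="$2" answer base
    base=$(basename -- "$model")
    printf 'Copy %s to %s as well? [y/N] ' "$base" "$dest"
    read -r answer
    case "$answer" in
        y|Y|yes|YES)
            cp -- "$model" "$dest/$base" || return 1
            # Write the checksum next to the copy, recording only the base name.
            ( cd -- "$dest" && sha512sum -b -- "$base" > "$base.sha512" ) || return 1
            echo "Published $dest/$base"
            ;;
        *)
            # Anything other than an explicit yes leaves the destination untouched.
            echo "Skipped publishing."
            ;;
    esac
}
```

Defaulting to "no" keeps the copy an explicit choice, while asking every time is what keeps the step from being forgotten.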

Change 965469 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::statistics::explorer:ml: expand model_upload.sh

https://gerrit.wikimedia.org/r/965469

Change 965469 merged by Elukey:

[operations/puppet@production] profile::statistics::explorer:ml: expand model_upload.sh

https://gerrit.wikimedia.org/r/965469

Change 965474 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] profile::statistics::explorer::ml: change owner of published dir

https://gerrit.wikimedia.org/r/965474

Prompt added to the script, task should be done!

Change 965474 merged by Elukey:

[operations/puppet@production] profile::statistics::explorer::ml: change owner of published dir

https://gerrit.wikimedia.org/r/965474