
Toolforge: consider introducing some semantics for persistent storage
Open, Needs Triage, Public

Description

During Wikimedia-Hackathon-2023 it was mentioned a couple of times how convenient it would be to have something similar to Cinder volumes, but for Toolforge tools.

We could have a command-line interface to create and manage different kinds of storage. This has been suggested before: some kind of toolforge storage [..] interface, with options to manage different backends:

  • Ceph RBD volumes (similar to OpenStack Cinder)
  • NFS-like volumes (for filesharing)
  • scratch-like storage (disposable)
  • some kind of backups, TBD
  • etc

With particular semantics, which could be:

  • Read-only storage for retrieving public data (dumps, mediawiki). This would sit alongside the replicas as an information source.
  • Read/write persistent shared storage (current scratch). This would be for sharing between users/projects.
  • Read/write persistent private storage (current project home). This would be private to your own tool.
  • Read/write ephemeral private storage (current /tmp, home project). This would be for temporary processing of big files and the like; I think people currently just use the home directory for this.
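
The four semantics above differ along three axes: access mode, lifetime, and visibility. As a minimal sketch of that classification (purely illustrative: the names `StorageSemantics`, `Access`, `Lifetime`, `Visibility` and the labels such as "public-data" are hypothetical and not part of any existing Toolforge API), it could be modelled like this:

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    READ_ONLY = "ro"
    READ_WRITE = "rw"


class Lifetime(Enum):
    PERSISTENT = "persistent"
    EPHEMERAL = "ephemeral"


class Visibility(Enum):
    PUBLIC = "public"    # readable by every tool
    SHARED = "shared"    # shared between users/projects
    PRIVATE = "private"  # only for the owning tool


@dataclass(frozen=True)
class StorageSemantics:
    name: str                # hypothetical label, not an existing identifier
    access: Access
    lifetime: Lifetime
    visibility: Visibility
    current_equivalent: str  # what the task says fills this role today


SEMANTICS = [
    # Lifetime of the public data is not stated in the task; assumed persistent.
    StorageSemantics("public-data", Access.READ_ONLY, Lifetime.PERSISTENT,
                     Visibility.PUBLIC, "dumps, mediawiki"),
    StorageSemantics("shared", Access.READ_WRITE, Lifetime.PERSISTENT,
                     Visibility.SHARED, "scratch"),
    StorageSemantics("private", Access.READ_WRITE, Lifetime.PERSISTENT,
                     Visibility.PRIVATE, "project home"),
    StorageSemantics("ephemeral", Access.READ_WRITE, Lifetime.EPHEMERAL,
                     Visibility.PRIVATE, "/tmp, project home"),
]


def writable_names(semantics):
    """Return the labels of every storage kind a tool may write to."""
    return [s.name for s in semantics if s.access is Access.READ_WRITE]
```

Keeping the axes separate from the backend list (Ceph RBD, NFS-like, scratch-like) leaves room for any backend to serve any of the semantics later on.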

Event Timeline

See also the comments from https://gerrit.wikimedia.org/r/c/operations/software/tools-webservice/+/920259:

A bit of rambling, but I see the storage split into different groups/use cases.

  1. Read-only storage for retrieving public data (dumps, mediawiki). This would sit alongside the replicas as an information source.
  2. Read/write persistent shared storage (current scratch). This would be for sharing between users/projects.
  3. Read/write persistent private storage (current project home). This would be private to your own tool.
  4. Read/write ephemeral private storage (current /tmp, home project). This would be for temporary processing of big files and the like; I think people currently just use the home directory for this.

Each could be implemented in a different way (probably the best approach, as they are quite different), but we might want to create separate abstractions on top of them so that each can evolve independently.

That might eventually lead to something like the following (not suggesting we implement it now, though):

--data-services-storage=enable // to enable mounting dumps/mediawiki/..., #1
--shared-storage=enable // this would be #2
--project-data=enable  // to enable mounting the project data directory, #3
--ephemeral-storage=enable // #4

Then, for each, we could have values like "enable/disable/preview" or similar, to choose whether you want the currently supported version, want it disabled, or want the "preview" version.
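
To make the idea concrete, here is a minimal Python sketch of how such flags could be parsed with `argparse`. It is only an illustration, not the tools-webservice implementation: the flag names are the ones proposed above, while the helper names (`build_parser`, `parse_storage_flags`), the program name, and the default of "disable" are assumptions made for the example.

```python
import argparse

# Values proposed in the comment above.
MODES = ("enable", "disable", "preview")

# Flag names as proposed above; the numbers refer to the use-case list.
FLAGS = {
    "--data-services-storage": "mount dumps/mediawiki/... (#1)",
    "--shared-storage": "mount the shared storage (#2)",
    "--project-data": "mount the project data directory (#3)",
    "--ephemeral-storage": "mount ephemeral storage (#4)",
}


def build_parser():
    """Build a parser with one enable/disable/preview option per storage group."""
    parser = argparse.ArgumentParser(prog="storage-flags-sketch")
    for flag, help_text in FLAGS.items():
        # Assumption: the default is "disable"; the task does not specify one.
        parser.add_argument(flag, choices=MODES, default="disable", help=help_text)
    return parser


def parse_storage_flags(argv):
    """Return a dict such as {"shared_storage": "enable", ...}."""
    return vars(build_parser().parse_args(argv))
```

For example, `parse_storage_flags(["--shared-storage=enable", "--project-data=preview"])` leaves the other two groups at the assumed default, and any value outside the three modes is rejected by `argparse`.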

Also, from the list thread https://lists.wikimedia.org/hyperkitty/list/cloud-admin@lists.wikimedia.org/thread/CHGYMEUYSIFU3AIVPXFKEFIDT7UK6YV2/ (storage section).

I would try to keep it transparent to the user whether it's using Cinder volumes, NFS shares, Ceph RBD images, or anything else.

nskaggs renamed this task from Toolforge: consider introducing some semantics for presistent storage to Toolforge: consider introducing some semantics for persistent storage. May 22 2023, 4:49 PM

I think we are all on the same page. Refreshed the task description with all the additional info.