Swift file storage engine for Phabricator
ClosedPublic

Authored by mmodell on May 5 2018, 9:44 PM.

Details

Reviewers
fgiunchedi
mmodell
Commits
rPHEXc16c80eefadd: Swift file storage engine for Phabricator
Patch without arc
git checkout -b D1049 && curl -L https://phabricator.wikimedia.org/D1049?download=true | git apply
Summary

Openstack Swift storage engine for Phabricator. Somewhat based on Phabricator's default Amazon S3 file storage engine.

Adds configuration options under storage.swift.*

  • storage.swift.account - name of the swift account
  • storage.swift.key - key for accessing the swift service
  • storage.swift.container - name of the storage container.
  • storage.swift.endpoint - the url to access the swift frontend
Test Plan

tested locally, I need to get this working in beta.

Diff Detail

Repository
rPHEX phabricator-extensions
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
mmodell created this revision.May 5 2018, 9:44 PM
mmodell requested review of this revision.May 5 2018, 9:44 PM
mmodell added a subscriber: awight.

See preliminary comments inline, something else to keep in mind wrt big files: swift is limited by default to 4-5GB files as a single object. Going over that means either using SLOs or DLOs: https://docs.openstack.org/swift/latest/overview_large_objects.html

src/filestorage/PhabricatorSwiftConfigOptions.php
29

For commons/mediawiki given the huge number of files we shard across 256 containers based on the filename hash e.g. container.ff. This helps both with contention on containers and "blast radius" e.g. if something happens to a single container not all files are affected.
Implementing something like this requires aid from the application of course, what's your expectation in terms of number of files in this case?

src/filestorage/PhutilOpenstackFuture.php
118

What happens with large bodies here? e.g. clients ask for "big" files ?

See preliminary comments inline, something else to keep in mind wrt big files: swift is limited by default to 4-5GB files as a single object. Going over that means either using SLOs or DLOs: https://docs.openstack.org/swift/latest/overview_large_objects.html

No worries, phabricator is even worse than that - it's currently limited by PHP's string length limit because phabricator doesn't bother to process things in chunks - it loads entire files into a php string and thus can't handle anything beyond 2 gigabytes.

The 2gb limit will (at least theoretically) be lifted with php 7, however, the whole way in which phabricator handles files is still flawed. We might be rescued from problems by the fact that our phabricator server is spec'd with a lot of RAM to spare.

mmodell added inline comments.May 8 2018, 8:48 AM
src/filestorage/PhabricatorSwiftConfigOptions.php
29

Indeed, this is something I will look into. Thanks for the reminder!

src/filestorage/PhutilOpenstackFuture.php
118

things go boom when the 2gb limit is exceeded. I'm currently investigating the underlyting APIs, there is at least theoretical support for streaming the data instead of processing it all at once, however, it's of course undocumented and somewhat complicated by the layers of tech involved.

mmodell added inline comments.May 8 2018, 9:05 AM
src/filestorage/PhutilOpenstackFuture.php
118

Ok actually it would appear that I spoke too soon regarding large files. The chunking of the files happens at a higher level in the code - so the back end storage driver doesn't need to worry about large files - they are always sharded into a bunch of smaller objects before being passed to the storage engine. It appears that chunking is handled in the PhabricatorFileUploadSource class.

mmodell updated this revision to Diff 2768.May 8 2018, 9:56 AM

Shard the files across a bunch of containers based on the first two characters
of the object name. The object name is already a randomly generated string
of six characters separated by slashes. This takes change simply appends to
the container name a suffix which is a dash followed by the first two bytes
from the object name key.

mmodell retitled this revision from WIP: Swift file storage engine for Phabricator to Swift file storage engine for Phabricator.May 8 2018, 9:59 AM
mmodell edited the summary of this revision. (Show Details)
mmodell edited the test plan for this revision. (Show Details)
mmodell added a reviewer: fgiunchedi.
mmodell marked an inline comment as done.
mmodell added inline comments.
src/filestorage/PhabricatorSwiftConfigOptions.php
29

Ok now I shared across a bunch of containers based on the first two bytes of entropy which phabricator is already generating in order to give files a random names.

mmodell updated this revision to Diff 2769.May 8 2018, 11:35 AM

Got it working with vagrant.

mmodell updated this revision to Diff 2770.May 8 2018, 6:07 PM

default values for some settings.

This works now. Not gonna get it deployed this week but will test it more on beta and then hopefully deploy next week.

mmodell accepted this revision.May 9 2018, 11:59 PM
This revision is now accepted and ready to land.May 9 2018, 11:59 PM
This revision was automatically updated to reflect the committed changes.
fgiunchedi added inline comments.May 10 2018, 8:37 AM
src/filestorage/PhutilSwiftFuture.php
20

It looks like getContainer is never invoked with an empty key (and shouldn't, afaics?)

42

How would container creation and setting access work? The way MW handles it is creating the required containers on wiki setup and setting permissions for public/private containers with setZoneAccess.php. Is all file access mediated/proxied by Phabricator in this case, thus all containers can be private to the Phabricator account?

mmodell added inline comments.May 10 2018, 8:40 AM
src/filestorage/PhutilSwiftFuture.php
20

indeed this is true

42

As of now it attempts to create the container on demand and does not override the default access control. Indeed all file access is proxied through phabricator so the containers can be private.

Nice work! Looking forward to see this working in beta.

src/filestorage/PhutilSwiftFuture.php
20

I think it'd make more sense to have $key non-optional instead and container sharding either enabled or disabled, not based on key length

42

Thanks! Ok I think auto-creation will work for now. A useful addition/feature in the future would be to not require auto-creation, i.e. through an utility that would create the required containers and can be run with a different set of swift credentials. This way the swift account Phabricator uses doesn't need to be privileged in any way, just rw access to selected containers.

mmodell added inline comments.May 10 2018, 1:27 PM
src/filestorage/PhutilSwiftFuture.php
42

Could we just create the containers with a shell script on the swift cluster? ;) I could create a utility to do it but building that inside phabricator's UI is a lot of boilerplate for not much benefit.

fgiunchedi added inline comments.May 10 2018, 2:53 PM
src/filestorage/PhutilSwiftFuture.php
42

In my experience shell scripts are never "just", so no, better avoiding that :) In this case what I had in mind with "utility" isn't a full fledged UI, a php command line utility that reads phabricator config is what I had in mind.

Also when (if? don't know if you have plans to upstream) this gets accepted upstream, said utility is useful to other users too and contained in Phabricator already. Anyways, something to think about, auto-creation is fine for now and only slightly more complex for users that want to create containers beforehand.

@fgiunchedi: in rPHEX71f853aaf90a: Added code to convert auth credentials into an auth token I've added auth token handling and made the container sharding mandatory based on the first 2 digits of an md5 hash of the object name key.