
SwiftMedia URL rewrites and container names
Closed, ResolvedPublic

Description

"The middleware inserts the account name into the URL, converts the "wikipedia/commons" section into a Swift container name by replacing slash with %2F, adds "%2Fthumb" or "%2Farchived" or "%2Fdeleted" to the container name and adds the rest of the hashing and filename as the object name"

To be clear, there is the cloudfiles interface and there is the rewrite.py script in the WSGI stack. This is about the latter.

"archived" should not be added to the container name (and actually isn't in the code). On the other hand, "temp" and "public" should.

Examples of how rewrites should happen (ignoring container sharding):

Original URL: upload.wikimedia.org/site/lang/a/ab/file.jpg
Swift URL: site-lang-images-public/a/ab/file.jpg

Original URL: upload.wikimedia.org/site/lang/thumb/a/ab/file.jpg/120px-file.jpg
Swift URL: site-lang-images-thumb/a/ab/file.jpg/120px-file.jpg

Original URL: upload.wikimedia.org/site/lang/thumb/archive/a/ab/file.jpg/120px-file.jpg
Swift URL: site-lang-images-thumb/archive/a/ab/file.jpg/120px-file.jpg

Original URL: upload.wikimedia.org/site/lang/temp/a/ab/file.jpg/120px-file.jpg
Swift URL: site-lang-images-temp/a/ab/file.jpg/120px-file.jpg

The above would be consistent with FileRepo/FileBackend.


Version: unspecified
Severity: normal

Details

Reference
bz33286


Event Timeline

bzimport raised the priority of this task to Medium. Nov 22 2014, 12:02 AM
bzimport set Reference to bz33286.
bzimport added a subscriber: Unknown Object (MLST).
*** CHANGE TO THE ABOVE ***

We are going to use "media-" instead of "images-". Given that, we want:

Original URL: upload.wikimedia.org/site/lang/a/ab/file.jpg
Swift URL: site-lang-media-public/a/ab/file.jpg

Original URL: upload.wikimedia.org/site/lang/thumb/a/ab/file.jpg/120px-file.jpg
Swift URL: site-lang-media-thumb/a/ab/file.jpg/120px-file.jpg

Original URL: upload.wikimedia.org/site/lang/thumb/archive/a/ab/file.jpg/120px-file.jpg
Swift URL: site-lang-media-thumb/archive/a/ab/file.jpg/120px-file.jpg

Original URL: upload.wikimedia.org/site/lang/temp/a/ab/file.jpg/120px-file.jpg
Swift URL: site-lang-media-temp/a/ab/file.jpg/120px-file.jpg

FYI:
There should be no rewrite rules going to the site-lang-media-deleted container, and that container should also be off limits to the Swift user the proxy uses to authenticate when it rewrites URLs.

Another one we want:

Original URL: upload.wikimedia.org/site/lang/thumb/temp/a/ab/file.jpg/120px-file.jpg
Swift URL: site-lang-media-thumb/temp/a/ab/file.jpg/120px-file.jpg

'media' was replaced with the repo name, so we now have:

Original URL: upload.wikimedia.org/site/lang/a/ab/file.jpg
Swift URL: site-lang-local-public/a/ab/file.jpg

Original URL: upload.wikimedia.org/site/lang/thumb/a/ab/file.jpg/120px-file.jpg
Swift URL: site-lang-local-thumb/a/ab/file.jpg/120px-file.jpg

Original URL: upload.wikimedia.org/site/lang/thumb/archive/a/ab/file.jpg/120px-file.jpg
Swift URL: site-lang-local-thumb/archive/a/ab/file.jpg/120px-file.jpg

Original URL: upload.wikimedia.org/site/lang/thumb/temp/a/ab/file.jpg/120px-file.jpg
Swift URL: site-lang-local-thumb/temp/a/ab/file.jpg/120px-file.jpg

Original URL: upload.wikimedia.org/site/lang/temp/a/ab/file.jpg/120px-file.jpg
Swift URL: site-lang-local-temp/a/ab/file.jpg/120px-file.jpg
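
For illustration only, the mapping listed above could be sketched in Python roughly as follows. This is not the actual rewrite.py code: the function name rewrite_upload_path, the repo parameter, and returning None for paths that must not be rewritten (including anything under "deleted") are assumptions made for the example. It takes only the path part of the URL and ignores the account prefix and container sharding. Paths starting with "archive" fall through to the "public" container with "archive/" kept in the object name, which is an assumption based on the earlier remark that "archived" is not part of the container name.

```python
from typing import Optional, Tuple

# Leading path segments that select their own container zone.
# Anything else (including "archive/...") is assumed to land in "public".
ZONE_PREFIXES = ("thumb", "temp")


def rewrite_upload_path(path: str, repo: str = "local") -> Optional[Tuple[str, str]]:
    """Map '/site/lang/...' to a (container, object) pair, or None if not rewritable."""
    parts = path.strip("/").split("/")
    if len(parts) < 3:
        return None  # need at least site, lang and one more segment
    site, lang, rest = parts[0], parts[1], parts[2:]

    # No rewrite rule may point at the "deleted" container.
    if rest[0] == "deleted":
        return None

    if rest[0] in ZONE_PREFIXES and len(rest) > 1:
        zone, rest = rest[0], rest[1:]
    else:
        zone = "public"

    return f"{site}-{lang}-{repo}-{zone}", "/".join(rest)


if __name__ == "__main__":
    for p in (
        "/site/lang/a/ab/file.jpg",
        "/site/lang/thumb/a/ab/file.jpg/120px-file.jpg",
        "/site/lang/thumb/archive/a/ab/file.jpg/120px-file.jpg",
        "/site/lang/thumb/temp/a/ab/file.jpg/120px-file.jpg",
        "/site/lang/temp/a/ab/file.jpg/120px-file.jpg",
    ):
        print(p, "->", rewrite_upload_path(p))
```

Only the first segment after site/lang is consumed as the zone, so "thumb/archive" and "thumb/temp" keep "archive/" and "temp/" in the object name, matching the examples above.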

bhartshorne wrote:

If we shard a container, do the hashes still wind up in the filename?

Consider:
Original URL: upload.wikimedia.org/site/lang/a/ab/file.jpg
Swift URL (a): site-lang-media-public-ab/a/ab/file.jpg
Swift URL (b): site-lang-media-public-ab/file.jpg

Original URL: upload.wikimedia.org/site/lang/thumb/archive/a/ab/file.jpg/120px-file.jpg
Swift URL (a): site-lang-media-thumb-ab/archive/a/ab/file.jpg/120px-file.jpg
Swift URL (b): site-lang-media-thumb-ab/archive/file.jpg/120px-file.jpg

I think (a) makes more sense. rewrite.py currently either hashes the container
or drops the hash entirely - similar to (b) but removing a/ab/ even if the
container is not hashed.

Oh, I wasn't accounting for hashing in the post above. I was just using the conceptual container names.

Yes, it will use (a), that is, keeping the hash dir in the path. Though it would not be "site-lang-media-thumb-ab", it will be "site-lang-media-thumb.ab".
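
A minimal Python sketch of option (a) with the ".ab" suffix might look like the following. The helper name shard_container and the regex used to locate the hash directory are hypothetical, and the assumption that the shard key is the two-character hash directory is inferred from the "site-lang-media-thumb.ab" example rather than taken from the real code.

```python
import re

# Matches a hash directory pair such as "a/ab/" at the start of the object
# name or after a slash; the second segment must start with the first.
HASH_DIRS = re.compile(r"(?:^|/)([0-9a-f])/(\1[0-9a-f])/")


def shard_container(container: str, obj: str) -> str:
    """Return the sharded container name; the object name is left untouched."""
    match = HASH_DIRS.search(obj)
    if match is None:
        return container  # no hash directory found, so no shard suffix
    return f"{container}.{match.group(2)}"


if __name__ == "__main__":
    obj = "archive/a/ab/file.jpg/120px-file.jpg"
    # Option (a): the hash directories stay in the object path.
    print(shard_container("site-lang-media-thumb", obj) + "/" + obj)
```

Because the object name is passed through unchanged, the a/ab/ directories remain in the path, as in option (a).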