
Add FSFileBackend (file) persistent storage
Closed, Resolved · Public · 5 Estimated Story Points

Description

Add file caching to Phonos, similar to how the Score extension caches files. This might not be the final cache method, but it is a reasonable place to start.

For Reference: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Score/+/refs/heads/master/includes/Score.php#251

Event Timeline

  • How will the cache key be generated?
  • How are we going to invalidate a file?
  • Is a "last modified time" available for files?
  • Can we touch a file on retrieval?
    • If so, invalidation could consist of finding and removing the file(s) with the oldest last-modified time
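The last two questions describe an LRU-style scheme: touch a file whenever it is read, then evict the files with the oldest modification time. Below is a minimal sketch of that idea. It assumes a plain local cache directory and uses the Python standard library, not MediaWiki's FileBackend API. The function names are hypothetical and are not taken from the Phonos patch.

```python
import os
from pathlib import Path


def read_and_touch(path: Path) -> bytes:
    """Return a cached file's bytes and bump its modification time to now."""
    data = path.read_bytes()
    # os.utime(path, None) sets both atime and mtime to the current time,
    # so recently used files look "new" to the eviction pass below.
    os.utime(path, None)
    return data


def evict_oldest(cache_dir: Path, max_files: int) -> list:
    """Delete the least recently touched files until at most max_files remain."""
    files = sorted(
        (p for p in cache_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,  # oldest mtime first
    )
    excess = max(0, len(files) - max_files)
    removed = files[:excess]
    for p in removed:
        p.unlink()
    return removed
```

Touching on read turns the filesystem's mtime into a "last used" timestamp, so no separate index is needed. The cost is one extra metadata write per cache hit.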
TheresNoTime renamed this task from Add file caching to Add FSFileBackend (file) caching. Aug 1 2022, 3:05 PM

Change 822459 had a related patch set uploaded (by MusikAnimal; author: MusikAnimal):

[mediawiki/extensions/Phonos@master] Add file caching, defaulting to FSFileBackend

https://gerrit.wikimedia.org/r/822459

While testing this, I had a thought: should we perhaps include a parameter in the JSON response to show that it's using the cache instead of making a request? Something like:

{
  "phonos": {
    "ssml": "<speak><phoneme alphabet=\"ipa\" ph=\"h&#x259;&#x2C8;l&#x259;&#x28A;\">Hello</phoneme></speak>",
    "audioData": "...",
    "cached": true
  }
}

I can't think of a specific reason; it just feels like one of those things that might be useful in the future.
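Here is a sketch of how a "cached" flag could be set, producing a body shaped like the JSON example above. Several parts are assumptions. The dict stands in for the file store. `fetch_audio` is a hypothetical callable that queries the TTS engine. The example elides `audioData` as "...", so base64 encoding here is a guess.

```python
import base64
import json
from typing import Callable, Dict


def phonos_response(ssml: str, key: str, store: Dict[str, bytes],
                    fetch_audio: Callable[[str], bytes]) -> str:
    """Build a response body, flagging whether the audio came from the cache."""
    cached = key in store
    if not cached:
        # Cache miss: ask the engine for audio and persist it for next time.
        store[key] = fetch_audio(ssml)
    body = {
        "phonos": {
            "ssml": ssml,
            "audioData": base64.b64encode(store[key]).decode("ascii"),
            "cached": cached,
        }
    }
    return json.dumps(body)
```

The flag costs nothing to compute, since the hit/miss decision has to be made anyway.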

And as to the cache key generation, should we perhaps add a static salt config parameter (which we can set in PrivateSettings.php) to make it slightly more difficult to programmatically calculate these?
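One way to apply the proposed salt is to key the hash with it, so nobody can recompute a file name without the secret. The sketch below uses HMAC-SHA256, a swapped-in technique, not what the patch does. It assumes the salt arrives as bytes from a private setting, such as one defined in PrivateSettings.php. `salted_cache_key` is a hypothetical name.

```python
import hashlib
import hmac
import json


def salted_cache_key(params: dict, salt: bytes) -> str:
    """Derive a hex cache key that cannot be recomputed without the salt."""
    # Canonical JSON (sorted keys, compact separators) so the same params
    # always serialize to the same bytes regardless of dict ordering.
    message = json.dumps(params, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(salt, message, hashlib.sha256).hexdigest()
```

HMAC is used instead of hashing `salt + params` because it is the standard construction for keyed hashing, and it avoids the ad hoc concatenation issues that raw salting can have.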

Change 822459 merged by jenkins-bot:

[mediawiki/extensions/Phonos@master] Add file caching, defaulting to FSFileBackend, and cleanup script

https://gerrit.wikimedia.org/r/822459

> While testing this, I had a thought: should we perhaps include in the json response a parameter to show it's using the cache instead of making a request?

> I can't think of a reason why, it just feels like one of those things which might be useful in the future

We could! I'm not aware of anywhere else we do this sort of thing, especially if there's no way to invalidate the cache.

> And as to the cache key generation, should we perhaps add a static salt config parameter (which we can set in PrivateSettings.php) to make it slightly more difficult to programmatically calculate these?

The cache key generation is just to ensure we get a unique filesystem-safe name based on the params. I don't think we're concerned about someone programmatically trying to pull files, are we? If they wanted to do that, they could just use the Phonos API directly.
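Below is a sketch of turning request parameters into a unique, filesystem-safe name. The parameter set comes from the later test notes in this task, which say files are unique per engine, IPA, text, and language. The SHA-1 choice, the `.mp3` extension, and the name `cache_file_name` are all assumptions.

```python
import hashlib
import json


def cache_file_name(engine: str, ipa: str, text: str, lang: str,
                    ext: str = "mp3") -> str:
    """Map request parameters to a deterministic, filesystem-safe file name."""
    params = {"engine": engine, "ipa": ipa, "lang": lang, "text": text}
    # JSON serialization keeps field boundaries unambiguous, unlike joining with
    # a separator that could itself appear in the text or IPA.
    canonical = json.dumps(params, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    # A hex digest contains only [0-9a-f], so the name is safe on any filesystem.
    return f"{digest}.{ext}"
```

Hashing also keeps name length fixed no matter how long the input text is, which matters for filesystems with name-length limits.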

> We could! I'm not aware of anywhere else we do this sort of thing, especially if there's no way to invalidate the cache.

Yeah, honestly I can't say anything more than "maybe helpful for debugging issues once it's deployed?" but I'm clutching at straws. No need to do it yet at the very least.

> The cache key generation is just to ensure we get a unique filesystem-safe name based on the params. I don't think we're concerned about someone programmatically trying to pull files, are we? If they wanted to do that, they could just use the Phonos API directly.

Agreed, and it would just be a tiny bit of security through obscurity; it was just a thought, given the Google ToS. While we can probably rate limit use of the API fairly easily, pulling files directly would (maybe?) be harder to rate limit.
Adding a salt feels like a low effort step which might prove to be slightly helpful with little to no negatives.

I've been testing this with a different backend, using the AWS extension. I can't get it to work, because of what I think is a bug in the AWS extension: "error while validating the input provided for the HeadObject operation: [Bucket] is missing and is a required parameter". Before I dig too far into it, I'm wondering whether Phonos is meant to store its files in whatever the default upload storage is, or whether it should default to local storage and require specific configuration to use a different store. I feel like the former is the less surprising path, but the latter seems easier to build and is what we have now.

(I'm not very familiar with FileRepo or FileBackend configurations.)
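The two options can be written as a small resolution rule, where an explicit Phonos setting always wins. Everything in this sketch is hypothetical: the function name, its parameters, and the "local-fs" backend name are not real MediaWiki or Phonos configuration. `follow_upload_storage` switches between the two policies being discussed.

```python
from typing import Optional


def choose_storage_backend(phonos_setting: Optional[str],
                           wiki_upload_backend: Optional[str],
                           follow_upload_storage: bool) -> str:
    """Pick the backend name Phonos should write to (hypothetical policy)."""
    if phonos_setting:
        # An explicit Phonos-specific setting always takes precedence.
        return phonos_setting
    if follow_upload_storage and wiki_upload_backend:
        # Option 1: reuse the wiki's upload storage (e.g. an S3 bucket).
        return wiki_upload_backend
    # Option 2 (and fallback): a local filesystem backend.
    return "local-fs"
```

Under option 1, a wiki that already uploads to S3 would get S3 storage for Phonos with no extra configuration. Under option 2, the same wiki would write locally until an admin sets `phonos_setting`.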

TheresNoTime renamed this task from Add FSFileBackend (file) caching to Add FSFileBackend (file) persistent storage. Aug 17 2022, 8:23 AM
dom_walden subscribed.

Testing locally, I see audio files stored in images/<wiki name>-phonos/. I checked that cached audio plays correctly on a wiki page using the {{#phonos}} parser function.

The files are unique per engine, IPA, text and language. If a user changes the IPA, text or language they are using, it will get a new file from the IPA engine and cache that.
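Here is a sketch of the hit/miss behaviour just described. The cache is keyed on the (engine, IPA, text, language) tuple, so changing any one field triggers a fresh engine call. An in-memory dict stands in for the file store, and `AudioCache` and the `fetch` callable are hypothetical names.

```python
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str, str]  # (engine, ipa, text, lang)


class AudioCache:
    """In-memory stand-in for the Phonos file store."""

    def __init__(self, fetch: Callable[[str, str, str, str], bytes]):
        self._fetch = fetch  # calls the TTS engine on a cache miss
        self._files: Dict[Key, bytes] = {}

    def get(self, engine: str, ipa: str, text: str, lang: str) -> bytes:
        key = (engine, ipa, text, lang)
        if key not in self._files:
            # Any changed field produces a new key, hence a new engine request.
            self._files[key] = self._fetch(*key)
        return self._files[key]
```

For example, two identical requests hit the engine once. Changing only the IPA or only the language triggers a new fetch, which is then cached.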

I tested the larynx, espeak and google engines.

I tried using $wgSharedUploadDirectory, but I don't think I could get it to work.

Test environment: local docker Phonos 0.1.0 (3ccf24e) 23:53, 17 August 2022.