
Re-check Commons file when using cached version
Closed, Resolved, Public

Description

At the moment we're caching the SVG from Commons and then re-using it for subsequent requests without confirming that it has not been changed on Commons.

We should still use the cached version, but on every use also check that a new version hasn't been uploaded. If a new version has been uploaded, we should refresh our cache from that.
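A minimal sketch of "use the cached version, but check on every use", assuming each file on Commons can be identified by a checksum of its current revision (the MediaWiki API's `prop=imageinfo` with `iiprop=sha1` reports one). `fetch_sha1` and `fetch_content` are hypothetical stand-ins for the requests to Commons, not SVG Translate's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedFile:
    sha1: str       # checksum of the revision we cached
    content: bytes  # the SVG bytes themselves


class RevalidatingCache:
    """Keep using cached SVGs, but confirm on every use that Commons has no newer version.

    fetch_sha1 and fetch_content are hypothetical callables standing in for
    requests to Commons (a metadata query and a full file download).
    """

    def __init__(self, fetch_sha1: Callable[[str], str],
                 fetch_content: Callable[[str], bytes]) -> None:
        self._fetch_sha1 = fetch_sha1
        self._fetch_content = fetch_content
        self._entries: Dict[str, CachedFile] = {}

    def get(self, filename: str) -> bytes:
        current_sha1 = self._fetch_sha1(filename)  # cheap check on every use
        entry = self._entries.get(filename)
        if entry is None or entry.sha1 != current_sha1:
            # Not cached yet, or a new version was uploaded: refresh the cache.
            entry = CachedFile(current_sha1, self._fetch_content(filename))
            self._entries[filename] = entry
        return entry.content
```

The cache still pays off because comparing a checksum is much cheaper than re-downloading the SVG; only a changed checksum triggers a full download.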

Event Timeline

Restricted Application added a subscriber: Aklapper.
Samwilson renamed this task from Re-check Commons file while using cached version to Re-check Commons file when using cached version (Jan 7 2019, 11:48 PM).

@Samwilson When you say "..on every use..", do you mean after each upload? Can we not clear the cache for an image after it's been used and uploaded? Then fetch the new version when it's requested again. Maybe I'm misunderstanding you.

Yes, sorry, I think I got that wrong: we don't need to check on every use, but we certainly do on upload and download (which I think is handled in those tickets, so it doesn't need to be worried about here). At the moment the cache lifetime is 5 hours. I think we could actually increase this, but have the cache-invalidation process check more often for a modified file (i.e. it would compare against the checksum on Commons, so this would be a reasonably quick check).
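A sketch of the "longer lifetime, more frequent checksum check" idea, under stated assumptions: `fetch_sha1` and `fetch_content` are hypothetical callables for the Commons requests, and the default `max_age` (24 hours, standing in for an "increased" lifetime) and `check_interval` (60 seconds) are illustrative values, not taken from the tool. The `clock` parameter exists only so the timing can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Entry:
    sha1: str
    content: bytes
    fetched_at: float  # when the SVG was last downloaded
    checked_at: float  # when its checksum was last compared with Commons


class CheckedCache:
    """Long cache lifetime, with a more frequent (and cheaper) checksum check.

    max_age and check_interval are illustrative values, not taken from the tool.
    """

    def __init__(self, fetch_sha1: Callable[[str], str],
                 fetch_content: Callable[[str], bytes],
                 max_age: float = 24 * 3600, check_interval: float = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_sha1 = fetch_sha1
        self._fetch_content = fetch_content
        self._max_age = max_age
        self._check_interval = check_interval
        self._clock = clock
        self._entries: Dict[str, Entry] = {}

    def get(self, filename: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(filename)
        if entry is not None and now - entry.fetched_at < self._max_age:
            if now - entry.checked_at < self._check_interval:
                return entry.content  # verified recently: no request to Commons
            current_sha1 = self._fetch_sha1(filename)  # quick checksum comparison
            if current_sha1 == entry.sha1:
                entry.checked_at = now
                return entry.content
        else:
            current_sha1 = self._fetch_sha1(filename)
        # Missing, expired, or modified on Commons: download a fresh copy.
        content = self._fetch_content(filename)
        self._entries[filename] = Entry(current_sha1, content, now, now)
        return content
```

Within `check_interval` the cache answers without contacting Commons; after that, an unchanged checksum extends the entry's life without a download, and `max_age` still forces an occasional full refresh.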

@Mooeypoo you brought this up the other day; can you remember the bad-cache scenario we were talking about?

I'll hold off on prioritizing this for estimation until we have more clarity on what exactly we want. @Mooeypoo @Samwilson This might be something you want to chat about in the engineering meeting.


Yes:

  • User 1 downloads File1 from Commons and translates it into some language. User 1 then either stops (takes a break, leaves the computer, downloads the image manually, etc.) or, once the feature is enabled, downloads the translated image (without sending it to Commons).
  • Meanwhile, User 2 did not use SVGTranslate; they downloaded File1 directly from Commons, translated it, and manually uploaded it back to Commons as a new revision of the file.
  • User 3 (or User 1 again; same idea) opens File1 in SVGTranslate. If the file is cached, this user is now editing a previous revision of the file, and if they download it or re-upload it to Commons, they effectively overwrite User 2's newer revision.

This is mostly relevant once we enable an action on the file (download, or upload to Commons). In all other cases there is no real consequence if the cached file is out of date, but we'll start enabling upload soon, and at that point we really need to make sure we are translating the correct version of the file.

@Mooeypoo The same problem occurs if, while User 1 is translating the file in SVG Translate, User 2 uploads a new version to Commons that adds a translation. If we then upload User 1's file, we will overwrite the translations added by User 2.
Could we check for a new file version just before uploading and, if there is one, fetch it and add our translations to it? There will be edge cases, such as when the file on Commons has been changed so that its labels differ, but we can handle those with an error message on our end.
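A minimal sketch of "fetch the new version and add our translations to it", assuming (hypothetically) that a file's translatable text is modelled as a mapping from label id to `{language code: text}`, and that the latest version's labels have just been fetched from Commons before upload. `merge_onto_latest` and `LabelsChangedError` are illustrative names, not part of the tool.

```python
from copy import deepcopy
from typing import Dict

# Hypothetical data model: label id -> {language code -> text}.
Labels = Dict[str, Dict[str, str]]


class LabelsChangedError(Exception):
    """Raised when the latest version on Commons no longer has a label we translated."""


def merge_onto_latest(latest: Labels, ours: Dict[str, str], lang: str) -> Labels:
    """Apply our translations for `lang` on top of the latest revision's labels.

    Translations into other languages added on Commons in the meantime are kept.
    """
    missing = sorted(set(ours) - set(latest))
    if missing:
        # The edge case from the comment above: labels changed, so report an error.
        raise LabelsChangedError(f"labels no longer present on Commons: {missing}")
    merged = deepcopy(latest)  # leave the fetched data untouched
    for label_id, text in ours.items():
        merged[label_id][lang] = text
    return merged
```

If User 2 also translated into the same language, this sketch simply lets our text win; that same-language conflict is not resolved here.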


We could, but that produces a different issue: what do we do with the image if the original is different?
I think we can try to come up with a way to consolidate the languages if the changed file has different languages, but we will start running into issues when the language is the same, etc.

For the MVP, I suggest we don't do anything about consolidating translations if the image differs at the end. Checking that the file hasn't changed when you *start* working on it is relatively straightforward, and we can probably do that for the MVP (it will also resolve some of the tension between wanting to cache and wanting to give the user an up-to-date version), but consolidating languages can be a lot more elaborate.
We can probably look into warning the user if the image has changed while they were translating before we look into resolving conflicts.
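A sketch of the simpler MVP option (warn rather than resolve conflicts), assuming the tool records the checksum of the revision the user started from and compares it with Commons just before upload. `TranslationSession`, `start_session`, `upload_warning`, and `fetch_sha1` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranslationSession:
    filename: str
    started_sha1: str  # checksum of the revision the user began translating


def start_session(filename: str, fetch_sha1: Callable[[str], str]) -> TranslationSession:
    """Remember which revision of the file the user is translating."""
    return TranslationSession(filename, fetch_sha1(filename))


def upload_warning(session: TranslationSession,
                   fetch_sha1: Callable[[str], str]) -> Optional[str]:
    """Return a warning if the file changed on Commons since the session started, else None."""
    if fetch_sha1(session.filename) != session.started_sha1:
        return (f"{session.filename} has been changed on Commons since you started "
                "translating; uploading now may overwrite those changes.")
    return None
```

Returning a message instead of blocking leaves the decision with the user, which keeps conflict resolution out of scope for now.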

These issues are certainly real but seem somewhat theoretical. In practice, I struggle to believe that these scenarios will happen on a regular basis.

This is an image that seems ripe for being translated in SVG Translate. It has one revision. It's possible this work is over-optimization.


That's a good point. We can start without this optimization and re-examine when/if it's needed.

I've merged T216207 into this task, because it's the same problem.

I think most of the above noted problems will be helped by just reducing the cache time. Does anyone have an estimate of how long people work on an individual image translation? I think maybe 5 minutes would be okay for the cache duration. That'd mean that every five minutes the user would experience a slightly longer delay in the preview updating; are there other downsides that I'm not thinking of?
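A minimal time-to-live cache illustrating the five-minute lifetime, assuming `fetch_content` is a hypothetical callable that downloads the SVG from Commons; `clock` is injectable only for testing. Each expiry costs one slower download, which is the occasional preview delay mentioned above.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 5 * 60  # the five-minute lifetime proposed above


class TtlCache:
    """Serve a cached SVG until it is older than the TTL, then download it again."""

    def __init__(self, fetch_content: Callable[[str], bytes],
                 ttl: float = CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_content
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}  # filename -> (fetched_at, content)

    def get(self, filename: str) -> bytes:
        now = self._clock()
        hit = self._entries.get(filename)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh
        content = self._fetch(filename)  # missing or expired: the slower request
        self._entries[filename] = (now, content)
        return content
```

With a plain TTL, a new version uploaded just after caching goes unnoticed for at most one TTL, so going from 5 hours (300 minutes) to 5 minutes shrinks that window by a factor of 60.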

The new cache lifetime has been deployed in version 1.1.11.

The initial problem here still exists, but is likely to be far rarer.

dom_walden moved this task from QA 🐛 to Done 🏁 on the Community-Tech (CommTech-Sprint-10) board.
dom_walden subscribed.

I tested uploading a few translations with https://svgtranslate-test.toolforge.org. I think I observed the 5-minute cache timeout: the SVG the tool pulls appears to be updated after 5 minutes.

I am going to move this straight to Done and resolve as it is a fairly technical change.