
Create MediaViewer image varnish hit/miss ratio dashboard
Closed, Resolved, Public

Event Timeline

Tgr raised the priority of this task to Needs Triage.
Tgr updated the task description.
Tgr added projects: MediaViewer, Multimedia, Analytics.
Tgr changed Security from none to None.
Tgr subscribed.

This would complement T76035.

Tgr triaged this task as Medium priority (Dec 10 2014, 11:56 PM).

Change 179771 had a related patch set uploaded (by Gergő Tisza):
Calculate image cache miss ratio

https://gerrit.wikimedia.org/r/179771

Patch-For-Review

Change 179778 had a related patch set uploaded (by Gergő Tisza):
Calculate image cache miss ratio

https://gerrit.wikimedia.org/r/179778

Patch-For-Review

Change 179771 merged by jenkins-bot:
Calculate image cache miss ratio

https://gerrit.wikimedia.org/r/179771

Note that pre-rendered thumbnails will appear as a varnish miss. The first time they're requested they're in swift, but not in varnish.

Yeah, I didn't think of that. The Last-Modified header of thumbnails seems to match the time they were generated (Swift also adds an X-Timestamp header, which seems to be the same). Maybe we should add that to our performance logging and assume a scaler miss if it is older than the time the request was sent? (Clock skew errors, yay.)

Or when the last-modified header is older than the date header minus the difference between local times for request and response? That's reasonably robust and we are collecting those times already.
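A minimal Python sketch of the comparison proposed here, assuming the client records its own timestamps when it sends the request and when it receives the response (the function and parameter names are hypothetical). Only the difference between the two local times is used, so an offset in the client's clock cancels out; the server's Date header anchors the estimate. It ignores the one-second rounding of HTTP dates.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def rendered_for_this_request(last_modified: str, date: str,
                              local_request_time: float,
                              local_response_time: float) -> bool:
    """Hypothetical helper: did the thumbnail appear only after the request was sent?

    last_modified, date: raw HTTP header values, e.g. "Wed, 10 Dec 2014 23:56:00 GMT".
    local_request_time, local_response_time: client timestamps in seconds.
    Only their difference is used, so client clock skew does not matter.
    """
    lm = parsedate_to_datetime(last_modified)
    server_date = parsedate_to_datetime(date)
    elapsed = timedelta(seconds=local_response_time - local_request_time)
    # Estimate, on the server's clock, when the request was sent.
    request_sent = server_date - elapsed
    # Last-Modified before that moment means the thumbnail already existed
    # (e.g. pre-rendered and stored in Swift); otherwise it was rendered on demand.
    return lm >= request_sent
```

For example, a response with `Date` one second after `Last-Modified` and a locally measured round trip of 1.5 seconds counts as rendered on demand, while a thumbnail last modified a year earlier does not.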

In T78205#851804, @Tgr wrote:

Yeah, I didn't think of that. The Last-Modified header of thumbnails seems to match the time they were generated

Good catch! It will be very helpful to know the performance of thumbnails that were pregenerated but not copied from swift to varnish yet at the time of the request.

In T78205#851811, @Tgr wrote:

Or when the last-modified header is older than the date header minus the difference between local times for request and response? That's reasonably robust and we are collecting those times already.

That seems better, you can only ever use local time for relative time measurement. Some people have their clocks off by years.

Actually I see that there's a way to tell this with headers alone, with no need to calculate the local time difference. The "Age" header is the missing part of the puzzle. If the thumbnail is generated on the spot: Date - Last-Modified <= Age + 1 (the extra second is there because of rounding). If the thumbnail was generated some time ago and just pulled from swift: Date - Last-Modified > Age + 1.

The theory doesn't seem to hold true: the vast majority of varnish misses with a very small "Age" value have a very old Last-Modified value, regardless of when those files were uploaded. I think the explanation is that those thumbnails expire from varnish because they aren't accessed very often, and are pulled from swift again when they're next requested. Therefore old thumbnails can also end up being pulled from swift on the spot instead of being generated.

So while we can differentiate "true" misses (thumbnails have to be generated on the spot) from swift pulls thanks to Last-Modified, we can't tell if the swift pulls are happening in a prerendering scenario or a varnish expiry situation.

What's interesting in those findings, though, is that 99.34% of varnish misses are swift pulls, regardless of upload time. This suggests that unless we increase how long thumbnails are retained in Varnish, there isn't much performance to be gained on misses. Almost all of the thumbnails were generated a while ago and are at least in Swift.

It also means that prerendering can only spare 0.66% of varnish misses from having to generate the thumbnail, although for recent files that ratio would probably be higher.
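The two percentages are complementary shares of the classified varnish misses. A minimal Python sketch of that breakdown, assuming each miss has already been given a label (the function name and labels are hypothetical):

```python
from collections import Counter
from collections.abc import Iterable


def miss_breakdown(labels: Iterable[str]) -> dict[str, float]:
    """Return the percentage of varnish misses carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}
```

For instance, `miss_breakdown(["swift_pull", "swift_pull", "swift_pull", "generated"])` gives `{"swift_pull": 75.0, "generated": 25.0}`; the shares always sum to 100, which is why 99.34% Swift pulls leaves 0.66% that prerendering could affect.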

Also, while not many varnish misses that required thumbnail generation have been caught yet, they've all happened for files uploaded before the prerendering deployment (P173), which confirms that we're looking at the right information.

Mass-removing the Multimedia tag from MediaViewer tasks, as this is now being worked on by the Reading department, not Editing's Multimedia team.

Jdlrobson changed the task status from Open to Stalled (Sep 24 2015, 12:32 AM).
Jdlrobson subscribed.

What's left to do here?
I'm a little confused. No activity since April. Please update and un-stall it :)

Done in https://grafana.wikimedia.org/#/dashboard/db/media I think? Although that's (Swift + Varnish) hit/miss, not pure Varnish.

Closing due to being vague and probably fixed.