
Set up varnish 204 beacon endpoint for virtual media views and use it in Media Viewer
Closed, Resolved · Public · 1 Story Point

Description

Use the same technique as EventLogging (EL): Varnish answers the beacon request itself with a 204 response. Unlike EL requests, whose data goes on to the EL backend, these requests go nowhere and are simply recorded as part of the Varnish logs.

See the relevant code used for EL: https://github.com/wikimedia/operations-puppet/blob/production/templates/varnish/bits.inc.vcl.erb#L24
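Varnish implements this in VCL (see the EL code linked above). As a rough, non-Varnish stand-in, the Python sketch below shows the same contract using only the standard library: a request to the beacon path gets an empty 204, its request line is logged, and nothing is forwarded anywhere. The path `/beacon/media` is a hypothetical placeholder, not necessarily the one configured by the operations/puppet change.

```python
import logging
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical beacon path; the real one is whatever the puppet change configured.
BEACON_PATH = "/beacon/media"

log = logging.getLogger("beacon")


class BeaconHandler(BaseHTTPRequestHandler):
    """Answers beacon hits with an empty 204; nothing is proxied to a backend."""

    def do_GET(self):
        if self.path.split("?", 1)[0] == BEACON_PATH:
            # The request line (path plus query string) is the whole payload,
            # so logging it is all the endpoint needs to do.
            log.info("%s", self.path)
            self.send_response(204)
            self.end_headers()
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Suppress the default per-request stderr output; beacon hits are
        # recorded through `log` instead.
        pass


def start_server(port=0):
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), BeaconHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `logging.basicConfig(level=logging.INFO)` before `start_server()` to see each hit printed. In the real setup, the equivalent record is simply the Varnish log line for the request.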

Details

Related Gerrit Patches:
operations/mediawiki-config (master): Make Media Viewer record virtual image views
mediawiki/extensions/MultimediaViewer (master): Record virtual image views
operations/puppet (production): Set up beacon endpoint for virtual media views

Event Timeline

Gilles created this task. (Feb 10 2015, 8:08 AM)
Gilles raised the priority of this task to Medium.
Gilles updated the task description.
Gilles added projects: Multimedia, Analytics.
Gilles added subscribers: Gilles, Tgr.
Restricted Application added a subscriber: Aklapper. (Feb 10 2015, 8:08 AM)
Gilles claimed this task. (Feb 11 2015, 1:18 PM)
Gilles moved this task from Untriaged to Next up on the Multimedia board.
gerritbot added a subscriber: gerritbot.

Change 190821 had a related patch set uploaded (by Gilles):
Set up beacon endpoint for virtual media views

https://gerrit.wikimedia.org/r/190821


Gilles renamed this task from "Set up varnish 204 beacon endpoint for virtual image views" to "Set up varnish 204 beacon endpoint for virtual media views". (Feb 16 2015, 4:45 PM)
Gilles removed a project: Patch-For-Review.
Gilles removed a subscriber: gerritbot.
Gilles added a subscriber: ori.
Gilles renamed this task from "Set up varnish 204 beacon endpoint for virtual media views" to "Set up varnish 204 beacon endpoint for virtual media views and use it in Media Viewer". (Feb 16 2015, 4:47 PM)
Gilles added a project: MediaViewer.

Change 190823 had a related patch set uploaded (by Gilles):
Record virtual image views

https://gerrit.wikimedia.org/r/190823


Gilles moved this task from Next up to Needs code review on the Multimedia board. (Feb 16 2015, 5:04 PM)
Gilles edited a custom field. (Feb 18 2015, 10:23 AM)
ori added a comment. (Feb 21 2015, 6:12 AM)

I'm not sure why this is necessary -- couldn't you just count the image requests themselves? As far as I know, we have the full request logs in Hadoop, so this should be possible.

Tgr added a comment. (Feb 21 2015, 6:58 AM)

Images are preloaded, which means that an image can be requested frequently just because it sits next to a popular image. Request counts are also distorted, to a lesser extent, by caching, which is itself somewhat random, since cache TTL heuristics are probably based on how long ago the image was uploaded.
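The preloading effect can be made concrete with a toy model. The Python sketch below assumes a hypothetical viewer that preloads `preload_radius` neighbors on each side of the displayed image, and a browser cache that keeps every fetched image for the whole session; it is an illustration, not Media Viewer's actual preloading logic.

```python
from collections import Counter


def simulate_session(gallery, viewed_indices, preload_radius=1):
    """Toy model of one viewer session.

    gallery: ordered list of file names on the page.
    viewed_indices: positions the user actually displays, in order.
    preload_radius: hypothetical number of neighbors preloaded on each side.

    Assumes the browser cache keeps every fetched image for the whole
    session, so each file is requested from the server at most once.
    Returns (request_counts, view_counts).
    """
    requests = Counter()
    views = Counter()
    for i in viewed_indices:
        views[gallery[i]] += 1
        lo = max(0, i - preload_radius)
        hi = min(len(gallery) - 1, i + preload_radius)
        for j in range(lo, hi + 1):
            name = gallery[j]
            if requests[name] == 0:  # not in the browser cache yet
                requests[name] += 1
    return requests, views
```

With `gallery = ["A.jpg", "B.jpg", "C.jpg", "D.jpg"]` and a user who views B, then C, then B again, the server sees one request each for A, B, C and D, while the real views are B twice and C once: A and D were fetched but never seen, and B's second view never reached the server.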

BBlack added a subscriber: BBlack. (Feb 23 2015, 11:10 PM)

While I can see how we might be backed into this corner to get data we need, it seems wrong on so many levels to create additional requests in order to track the requests we're already getting.

Virtual pageviews are becoming the norm in the industry, because modern apps have no way around them. Once you make things service-oriented (a thin web interface using pushState, a backend serving several very different frontends) and optimize performance by leveraging the browser's or device's smart prefetching, adding a preloading strategy of your own, bundling unrelated requests together when they happen close in time, and so on, the only option left is to record pageviews independently of backend requests.

The fact that our ecosystem is mostly outdated, still relying for the most part on full, old-school HTML pageviews that don't address end-user performance, shouldn't be a reason to push back on these needs. The more front-end-heavy applications get built, the more you should expect others to need virtual pageviews.

Also, had EventLogging supported the load of recording those events unsampled, we would have just used it and nobody would have cared. It's just that EL seems better suited to sampled measurements, which serve a completely different purpose, and I don't think we should try to shove everything under EL.

Don't let the fact that some backend requests seem related fool you. There's no way to tell when someone has viewed something from their local cache other than by recording a virtual view. The next and previous buttons in Media Viewer are used a lot (they're the whole point of the feature), and whenever you go to the next or previous image, there's a high likelihood that you'll be looking at a preloaded image that is already in your browser's cache. Conversely, a high percentage of preloaded images are never looked at. There's no way to generate even remotely accurate per-file stats for Media Viewer by looking at the image requests themselves.
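Counting beacon hits instead of image requests could look like the Python sketch below. The path `/beacon/media` and the `file` query parameter are hypothetical, and the sketch assumes each log line has already been reduced to the request's path and query string.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical beacon path and parameter name, for illustration only.
BEACON_PATH = "/beacon/media"


def count_virtual_views(request_lines):
    """Return a Counter of views per file, one view per beacon hit.

    Ordinary image requests (fetches and preloads) are skipped, so
    preloading and cache hits do not affect the count; only beacons do.
    """
    views = Counter()
    for line in request_lines:
        parts = urlsplit(line.strip())
        if parts.path != BEACON_PATH:
            continue
        files = parse_qs(parts.query).get("file")
        if files:
            views[files[0]] += 1
    return views
```

Because the client fires a beacon on every display, including images shown straight from the browser cache, the per-file counts follow what users actually looked at rather than what the preloader fetched.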

We'll also be tracking, for the first time and at no extra cost, how long people keep the image in view (just an extra parameter passed to the beacon URL). That can't be inferred from the existing backend requests, so it couldn't be done without an extra request. It's just one example of what becomes possible once recording views is decoupled from backend requests; we could be doing more.
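A minimal sketch of that extra parameter, in Python: a hypothetical timer that measures how long each image stays on screen and builds a beacon URL carrying the duration when the image is replaced or the viewer is closed. The `/beacon/media` path, the `file` and `duration` parameter names, and the choice to fire the beacon when the image leaves view are all assumptions, not what the Media Viewer patch necessarily does.

```python
import time
from urllib.parse import urlencode

# Hypothetical beacon path and parameter names, for illustration only.
BEACON_PATH = "/beacon/media"


class ViewTimer:
    """Tracks which image is on screen and for how long."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the sketch can be tested
        self._file = None
        self._shown_at = None

    def show(self, file_name):
        """Call whenever an image is displayed, from the network or the cache.

        Returns the beacon URL for the image being replaced, or None.
        """
        url = self.close()
        self._file = file_name
        self._shown_at = self._clock()
        return url

    def close(self):
        """Call when the viewer closes; returns the current image's beacon URL, or None."""
        if self._file is None:
            return None
        duration_ms = round((self._clock() - self._shown_at) * 1000)
        url = BEACON_PATH + "?" + urlencode({"file": self._file, "duration": duration_ms})
        self._file = None
        self._shown_at = None
        return url
```

Showing `A.jpg` at t=0 s and `Foo bar.jpg` at t=2.5 s, then closing at t=4 s, yields `/beacon/media?file=A.jpg&duration=2500` followed by `/beacon/media?file=Foo+bar.jpg&duration=1500`. None of this information is visible in the image requests themselves.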

Change 190821 abandoned by Ori.livneh:
Set up beacon endpoint for virtual media views

Reason:
obsoleted by https://gerrit.wikimedia.org/r/#/c/192370/

https://gerrit.wikimedia.org/r/190821

Now I just need to update the Media Viewer patch to point to the new endpoint.

Change 190823 merged by jenkins-bot:
Record virtual image views

https://gerrit.wikimedia.org/r/190823

Change 196180 had a related patch set uploaded (by Gilles):
Make Media Viewer record virtual image views

https://gerrit.wikimedia.org/r/196180

Change 196180 merged by jenkins-bot:
Make Media Viewer record virtual image views

https://gerrit.wikimedia.org/r/196180

Gilles closed this task as Resolved. (Mar 18 2015, 9:06 AM)
Restricted Application added a subscriber: Matanya. (Sep 22 2015, 7:19 AM)