
Track image context and pass information onto X-Analytics
Closed, Invalid (Public)

Description

In order to track image views more meaningfully (see https://www.mediawiki.org/wiki/Requests_for_comment/Media_file_request_counts#by_context), we should add a URL parameter to image requests that identifies the general context the image appears in. Varnish frontends should then translate that custom parameter into an X-Analytics header (https://wikitech.wikimedia.org/wiki/X-Analytics) so that the information makes it all the way to the cache logs.

We should avoid tracking too many different contexts, to keep client-side cache fragmentation minimal.

Event Timeline

Gilles raised the priority of this task from to Needs Triage.
Gilles updated the task description.
Gilles added projects: Multimedia, Analytics.
Gilles added subscribers: Gilles, ezachte.

I think the first decision to make is which contexts to track. Being too exhaustive means that users would needlessly have to reload images, particularly at the default thumb size, every time they see them in a different context. We should really figure out which contexts are likely to yield meaningful information.

IMHO thumb vs non-thumb is important, and that's achieved by giving a "thumbnail" context to uses that are clearly meant to be a thumbnail, both in articles and other pages (upload page, file page).

The file page can also be separated from in-article images; that can be done easily. There we might want to separate the main image displayed on the file page from the ones that are accessible by clicking. I think that the small thumbnails for versions, etc. should count as thumbnails, though. Which means that we might want some context overlap (i.e. an image can be a thumbnail and be on the file page). That's if we care about tracking file page context, though...

Lastly, we want to track media viewer image hits in order to blacklist them (because of preloading), and that's easily achieved. Logical media viewer image views will have to be tracked using their own mechanism, which mostly depends on T44815.

Maybe in the end what really matters is the thumb/non-thumb difference? With media viewer hits being deducted, it would simply be a matter of marking thumbs as such. And by keeping it to only "tagging" thumbs, we know that large images, for which extra cache misses are costly to the users, wouldn't be affected.
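
To make the counting idea concrete, here is a minimal sketch of how tagged hits might be tallied from cache logs. It assumes the X-Analytics header value is a semicolon-separated list of key=value pairs, and it borrows the hypothetical `fc=mv` and `fc=tn` tags floated later in this thread; the function names are illustrative only and not part of any existing tool.

```python
from collections import Counter


def parse_x_analytics(header):
    """Parse a 'k1=v1;k2=v2' header value into a dict (assumed format)."""
    pairs = (item.split("=", 1) for item in header.split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def classify(header):
    """Bucket one image hit by its hypothetical 'fc' (file context) tag."""
    context = parse_x_analytics(header).get("fc")
    if context == "mv":
        return "excluded"       # media viewer hit, possibly a preload
    if context == "tn":
        return "thumbnail"
    return "non-thumbnail"      # untagged hits count as regular views


def count_views(headers):
    """Count hits per bucket, deducting media viewer hits entirely."""
    counts = Counter(classify(header) for header in headers)
    counts.pop("excluded", None)
    return counts
```

Only thumbnails carry a tag here, so untagged large images keep their existing URLs and cache entries.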

What goes for thumbs goes for frames as well. I wouldn't mind if they get the same tag; they seem closely related in functionality.

Would it be possible to add this info directly to the X-Analytics header from MediaWiki using this extension?
https://gerrit.wikimedia.org/r/#/c/157841/

It might be possible for thumb.php to serve that header based on a GET parameter, yes, which would avoid any Varnish frontend magic. The downside is that we would have multiple copies of the same image cached in Varnish, one per context value. If we do this in Varnish instead, we could also strip that context parameter there, which would avoid cache fragmentation in Varnish.
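
As a rough illustration of the Varnish-side approach, the sketch below shows the transformation in Python rather than VCL: the context parameter is removed from the URL, so every context shares one cache entry, and its value is appended to the X-Analytics header. The parameter name `x-analytics` is the one proposed later in this thread, the semicolon-separated header format is an assumption, and `strip_context` is a hypothetical helper, not an existing function.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CONTEXT_PARAM = "x-analytics"  # hypothetical query parameter name


def strip_context(url, existing_header=""):
    """Return (url_without_context_param, x_analytics_header_value)."""
    parts = urlsplit(url)
    kept = []
    tags = [existing_header] if existing_header else []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == CONTEXT_PARAM:
            tags.append(value)          # e.g. "fc=tn" moves into the header
        else:
            kept.append((key, value))   # everything else stays in the URL
    clean_url = urlunsplit(parts._replace(query=urlencode(kept)))
    return clean_url, ";".join(tags)
```

The cleaned URL is what would be used as the cache key, while the header value is what would reach the logs.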

So what exactly should be done here?

  1. add a query parameter (say &x-analytics=fc=mv?) to all MediaViewer file requests - done in T77882
  2. add a query parameter (&x-analytics=fc=tn?) to thumbnail URLs when they are displayed with the frame or thumb format option
  3. add a query parameter (&x-analytics=fc=fp?) to the main thumbnail on the file page
  4. have Varnish translate that query parameter to an X-Analytics header

Is this correct? Is #4 necessary or a nice-to-have (given that the query parameter would end up in the logs anyway)?
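
For steps 1 to 3, the tagging itself could look roughly like the sketch below. It is written in Python purely for illustration (the real change would live in MediaWiki and MediaViewer code); `tag_url` and the context names in `CONTEXT_CODES` are hypothetical, and only the `mv`, `tn` and `fp` codes come from the list above.

```python
from urllib.parse import urlsplit, urlunsplit

# Context codes taken from the list above; the dictionary keys are made up.
CONTEXT_CODES = {"media_viewer": "mv", "thumbnail": "tn", "file_page": "fp"}


def tag_url(url, context):
    """Append the proposed x-analytics=fc=<code> parameter to an image URL."""
    tag = "x-analytics=fc=" + CONTEXT_CODES[context]
    parts = urlsplit(url)
    # Keep any existing query string and add the tag at the end.
    query = parts.query + "&" + tag if parts.query else tag
    return urlunsplit(parts._replace(query=query))
```

Because each context produces a distinct URL, the client caches one copy per context unless something upstream strips the parameter, which is the fragmentation concern raised earlier.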

I can't comment on #4.

As for #3, it does not seem the highest priority; it is mostly useful for WMF itself, to see via which path 'intentional views' mostly happen. Without #3 these images have a different URL from frame/thumb images anyway, as per #2, and will in most cases be above the threshold size, so we can count them as 'intentional views'. But counting views on the file page as a separate column would be more precise.

As a somewhat clumsy workaround, we also have page views for the namespace 6 HTML pages (which don't tell us in which size the image was viewed, but at least we know these were intentional views). Of course it would be easier to have this metric in the same dump as everything else related to images (also to avoid encoding differences between HTML page names and image file names), but that does not seem a no-go to me.

Milimetric removed a project: Analytics.
Milimetric set Security to None.

@Milimetric, @Gilles: This task now has zero projects associated. Which basket is this task in?

Gilles claimed this task.
This comment was removed by Milimetric.