
Pass extra GET parameter for Media Viewer preload requests
Closed, Declined, Public

Description

This is a stopgap measure and will make next/prev media viewer hits unaccounted for in the per-file stats for now.

Event Timeline

Gilles claimed this task.
Gilles raised the priority of this task to High.
Gilles updated the task description.
Gilles added a project: MediaViewer.
Gilles added subscribers: Gilles, ezachte.
Restricted Application added a subscriber: Aklapper.

Isn't that what $wgMediaViewerImageQueryParameter is for?

I.e., is this any different from T77882?

Yes, we want to differentiate preloading requests in this case. With this, @ezachte will ignore next/prev hits in his stats for now; only the first image viewed in Media Viewer will count.

We'll still need to mark *all* Media Viewer views so that they can be removed from the total. Basically, the Media Viewer views counted for now will be the ones marked with wgMediaViewerImageQueryParameter and not marked as preload requests.
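A minimal sketch of the counting rule described above, with both sides shown: the client tags thumbnail URLs, and the stats side counts a hit only when the MediaViewer parameter is present and the preload marker is absent. The parameter names are hypothetical: `mvimg` stands in for whatever name wgMediaViewerImageQueryParameter is configured to, and `mvpreload` for the proposed preload marker. Neither is taken from the MediaViewer code base.

```typescript
// Hypothetical parameter names. "mvimg" stands in for whatever name
// wgMediaViewerImageQueryParameter is configured to; "mvpreload" is the
// preload marker proposed in this task.
const MV_PARAM = "mvimg";
const PRELOAD_PARAM = "mvpreload";

// Client side: tag a thumbnail URL as a MediaViewer request, optionally
// marking it as a preload (a next/prev image fetched ahead of time).
function tagThumbUrl(url: string, preload: boolean): string {
  const u = new URL(url);
  u.searchParams.set(MV_PARAM, "1");
  if (preload) {
    u.searchParams.set(PRELOAD_PARAM, "1");
  }
  return u.toString();
}

// Stats side: a request counts as a MediaViewer view only if it carries the
// MediaViewer parameter and is not marked as a preload.
function countsAsMediaViewerView(url: string): boolean {
  const params = new URL(url).searchParams;
  return params.has(MV_PARAM) && !params.has(PRELOAD_PARAM);
}

const thumb = "https://example.org/thumb/1024px-Example.jpg";
console.log(tagThumbUrl(thumb, true)); // https://example.org/thumb/1024px-Example.jpg?mvimg=1&mvpreload=1
console.log(countsAsMediaViewerView(tagThumbUrl(thumb, false))); // true
console.log(countsAsMediaViewerView(tagThumbUrl(thumb, true))); // false
```

Requests without the MediaViewer parameter (file page, plain thumbnails) are not counted by this rule, which matches the idea of removing all Media Viewer traffic from the total and re-adding only non-preload hits.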

Using different URLs for preloading and for the actual request kind of defeats the purpose. You can have a different URL for a thumbnail click and a next click, but what if the user prefers to press Esc and click on the next thumbnail manually (a rather natural workflow if you read a long article with sparse images)? What if the user reloads the page?

More generally, I feel this whole logging initiative is going in a completely wrong direction. Using server logs to follow user activity was cutting-edge technology twenty years ago, when changing the document usually meant navigating to a different resource; these days most interaction happens on the client side, and logging should happen there as well. Instead of piling hacks upon hacks to send information to server-side scripts in query strings, we should invest in a proper client-side logging system for virtual pageviews.

You're preaching to the choir; this is a stopgap measure, as mentioned in the task description. EventLogging has just started using sendBeacon, but for a proper client-side solution we're still missing T87177. The Analytics team has been a bit unclear so far about how much effort is required to get that done.

Hi Tgr, I'm really hoping we can go forward with this imperfect solution, albeit as a stopgap measure, to be improved on later. These data have been asked for by GLAM partners for at least 5 years, and time and again it looked as if WMF was going to deliver. With our RFC and our exposure at the 2014 GLAM hackathon, we revived expectations, and now we are so close to delivery. I am sure the people who are going to use these stats would prefer good now over perfect later, as long as we are frank about what is missing.

The main problem is not that this is imperfect, it's that it is going to hurt performance. If you use different URLs for the same image in different contexts, that image will be downloaded multiple times by the browser.
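To illustrate the performance point: browser HTTP caches key entries on the full URL, query string included, so the same image fetched under two URLs is downloaded twice. The `ToyCache` class below is a hypothetical model, not browser code, and it ignores response headers such as Cache-Control; the `mvpreload` and `mvimg` parameter names are placeholders.

```typescript
// Toy model of a browser HTTP cache: entries are keyed on the full URL,
// query string included, so URLs differing only in the query miss each other.
class ToyCache {
  private entries = new Map<string, string>();
  downloads = 0;

  get(url: string): string {
    const hit = this.entries.get(url);
    if (hit !== undefined) {
      return hit;
    }
    this.downloads += 1; // simulated network fetch
    const body = `bytes of ${url}`;
    this.entries.set(url, body);
    return body;
  }
}

const base = "https://example.org/thumb/1024px-Example.jpg";

// Same URL for preload and display: the display is served from cache.
const shared = new ToyCache();
shared.get(base); // preload
shared.get(base); // display
console.log(shared.downloads); // 1

// Different URLs for preload and display: the preloaded copy is never reused.
const split = new ToyCache();
split.get(`${base}?mvpreload=1`); // preload
split.get(`${base}?mvimg=1`); // display
console.log(split.downloads); // 2
```

In the second case the preload costs bandwidth without ever making the display faster, which is the objection raised here.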

As a horrible temporary hack, I would send a GET request to something like /wiki/VirtualPageView:File:<file name>, and make sure in Varnish that it returns a very small result; the URL can be pinged with sendBeacon or a virtual pixel. This uses very little extra bandwidth (thus no performance hit), can use arbitrary JS logic to decide what should count as an image view (say, >1s spent viewing the image), gets the data straight into Hadoop and the pageview logs, and can be visualised by existing infrastructure such as stats.grok.se. It would have to be filtered out in some places; I don't know how much difficulty that would present.
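A minimal client-side sketch of this hack, under stated assumptions: the `/wiki/VirtualPageView:File:<file name>` path is the hypothetical URL from the proposal (nothing serves it), the ">1s spent viewing" rule is used as the example threshold, and `ImageViewTracker`, `Sender` and `browserSender` are illustrative names. `navigator.sendBeacon` is a real browser API, but it issues a POST rather than a GET, so the pixel-style `Image` fallback is the variant that matches the GET request described above.

```typescript
// Example rule from the proposal: count an image only after >= 1 s on screen.
const MIN_VIEW_MS = 1000;

// Hypothetical virtual-pageview URL; nothing serves this path today.
function virtualPageViewUrl(fileName: string): string {
  // MediaWiki titles use underscores for spaces.
  return "/wiki/VirtualPageView:File:" + encodeURIComponent(fileName.replace(/ /g, "_"));
}

type Sender = (url: string) => void;

// Browser sender: sendBeacon (which issues a POST) when available,
// otherwise a tracking-pixel GET via an Image object.
const browserSender: Sender = (url) => {
  if (typeof navigator !== "undefined" && typeof navigator.sendBeacon === "function") {
    navigator.sendBeacon(url);
  } else if (typeof Image !== "undefined") {
    new Image().src = url;
  }
};

// Reports each displayed image once its dwell time reaches MIN_VIEW_MS.
// Sender and clock are injectable so the logic also runs outside a browser.
class ImageViewTracker {
  private current: { file: string; since: number } | null = null;
  private readonly send: Sender;
  private readonly now: () => number;

  constructor(send: Sender = browserSender, now: () => number = Date.now) {
    this.send = send;
    this.now = now;
  }

  // Call whenever the viewer displays an image (initial, next or prev).
  show(fileName: string): void {
    this.flush();
    this.current = { file: fileName, since: this.now() };
  }

  // Call when the viewer closes or the page is hidden.
  close(): void {
    this.flush();
    this.current = null;
  }

  private flush(): void {
    if (this.current !== null && this.now() - this.current.since >= MIN_VIEW_MS) {
      this.send(virtualPageViewUrl(this.current.file));
    }
  }
}

// Usage with a fake clock and a recording sender.
let fakeNow = 0;
const sent: string[] = [];
const tracker = new ImageViewTracker((url) => { sent.push(url); }, () => fakeNow);
tracker.show("First.jpg"); // t = 0
fakeNow = 500;
tracker.show("Example image.jpg"); // First.jpg was shown for 500 ms: not reported
fakeNow = 2500;
tracker.close(); // Example image.jpg was shown for 2000 ms: reported
console.log(sent[0]); // /wiki/VirtualPageView:File:Example_image.jpg
```

Because the beacon URL is separate from the image URL, the image itself keeps a single cacheable URL for both preloading and display.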

That is why I did not object to using query parameters to differentiate between MediaViewer, file page, thumbnails, etc.: those request different image sizes anyway. But preloading and non-preloading requests in MediaViewer use the same sizes; that's the entire point of preloading.

FWIW, can I throw in another option? I can't judge its feasibility.

What happens when a thumbnail is clicked and MediaViewer fires up? Is that all client-side? Either way (all client-side or via a post message), there is a moment where JS on the client side or PHP on the server side knows that MediaViewer will be instantiated and which image it will show initially. Couldn't we derive a loggable event from that?

This would still be the imperfect solution (missing follow-up browsing of preloaded images), but at least the cache wouldn't be involved (no variant URLs). Beacons have been suggested earlier, but this suggestion focuses on a different point in time and a different place for the hook.
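A minimal sketch of this option, assuming a hypothetical hook that MediaViewer would call when it opens on a thumbnail click: one event is logged per viewer session, for the initial image only, and next/prev navigation is deliberately ignored. `FirstImageLogger`, `ViewerOpenEvent` and the event fields are illustrative names, not an actual MediaViewer or EventLogging schema; the `log` callback stands in for whatever transport (e.g. a beacon) would carry the event.

```typescript
// Hypothetical event shape; not an actual EventLogging schema.
interface ViewerOpenEvent {
  file: string;
  timestamp: number;
}

type LogFn = (event: ViewerOpenEvent) => void;

// Logs one event per viewer session: the image the viewer opened on.
// Next/prev navigation is deliberately not logged, which is the known
// gap of this stopgap.
class FirstImageLogger {
  private sessionOpen = false;
  private readonly log: LogFn;

  constructor(log: LogFn) {
    this.log = log;
  }

  // Hook point: the viewer was opened from a thumbnail click and is about
  // to display its initial image.
  open(file: string): void {
    if (this.sessionOpen) {
      return;
    }
    this.sessionOpen = true;
    this.log({ file, timestamp: Date.now() });
  }

  // Next/prev inside the viewer: intentionally not logged.
  navigate(_file: string): void {
    // no-op in this stopgap
  }

  // Viewer closed (e.g. with Esc): the next open starts a new session.
  close(): void {
    this.sessionOpen = false;
  }
}

// Usage: Esc followed by a click on another thumbnail counts as a new open.
const opened: string[] = [];
const logger = new FirstImageLogger((event) => { opened.push(event.file); });
logger.open("A.jpg");
logger.navigate("B.jpg"); // preloaded next image, not logged
logger.close();
logger.open("C.jpg");
console.log(opened); // [ 'A.jpg', 'C.jpg' ]
```

Since the event is separate from the image request, the thumbnail URLs stay identical for preloading and display, so the browser cache is unaffected.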

Gilles claimed this task.

@ezachte logging the event when the thumbnail is clicked or when the image is displayed is all the same to us, all of that happens in JS. The latter seems more like a logical image view than the former, though.

I'll close this ticket, as the alternative described in T89088 is a lot cleaner and not harder to implement.