
Provide location, logged-in status and device information in ResourceLoaderContext
Closed, DeclinedPublic

Description

To determine which banner a user should see, CentralNotice needs the following inputs: project, language, country, logged-in status, device, other user-specific information saved in the browser, and random numbers. It follows a two-step process: (1) filter on inputs available server-side and send the client a list of campaigns that may be available to this user, and (2) make the final selection on the client based on the remaining inputs. (More details here.)
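The two-step process described above can be sketched roughly as follows. This is only an illustrative model in Python, not CentralNotice code: the `Campaign` fields, the function names, and the sample campaigns are all hypothetical, and the uniform random pick stands in for whatever allocation logic the real client uses.

```python
import random
from dataclasses import dataclass


@dataclass
class Campaign:
    # Hypothetical, simplified campaign record (not the real schema).
    name: str
    projects: set
    languages: set
    countries: set            # empty set means "not geotargeted"
    devices: set
    anonymous_only: bool = False


def server_filter(campaigns, project, language):
    """Step 1: keep campaigns matching the inputs known server-side."""
    return [c for c in campaigns
            if project in c.projects and language in c.languages]


def client_select(candidates, country, device, logged_in, rng):
    """Step 2: apply the remaining inputs in the browser and pick one."""
    eligible = [c for c in candidates
                if (not c.countries or country in c.countries)
                and device in c.devices
                and not (c.anonymous_only and logged_in)]
    # Uniform choice is a simplification made for this sketch.
    return rng.choice(eligible) if eligible else None


CAMPAIGNS = [
    Campaign("fundraising_nl", {"wikipedia"}, {"en", "nl"}, {"NL"},
             {"desktop", "mobile"}, anonymous_only=True),
    Campaign("community_global", {"wikipedia"}, {"en"}, set(), {"desktop"}),
    Campaign("commons_photo", {"commons"}, {"en"}, set(),
             {"desktop", "mobile"}),
]

# The server sends two candidates; a mobile user in FR matches neither.
candidates = server_filter(CAMPAIGNS, "wikipedia", "en")
chosen = client_select(candidates, "FR", "mobile", False, random.Random(0))
```

In this model the user in FR still downloads both candidates and the selection code, even though `chosen` ends up as `None`.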

This process could be a lot more efficient if we could process more inputs on the server. Far fewer users would receive a list of possible banners and the code needed to process that list, only to have that code determine that they aren't targeted by any of the banners. Client-side processing could be simplified and the code there could be shrunk (though not eliminated).

The list of possible banners is sent to the client via a special RL module. To allow more server-side filtering, we should provide additional inputs in ResourceLoaderContext and fragment the cache on those parameters for calls to load.php.

Even providing just one more input on the server would improve CentralNotice performance a lot.

Also: it seems there may be other use cases for providing access to this data server-side.

Event Timeline

AndyRussG raised the priority of this task from to Needs Triage.
AndyRussG updated the task description.
AndyRussG added subscribers: AndyRussG, BBlack, Krinkle and 7 others.
atgo triaged this task as Medium priority. Aug 19 2015, 9:02 PM
atgo moved this task from Inbox to Banner issues on the Wikimedia-Fundraising board.
atgo set Security to None.
atgo subscribed.

I'd have to dig into the details of this more to see, but I'm inclined to think it isn't a net win to fragment the cache for load.php (on e.g. GeoIP, device info, etc.) just to save some cached JS running in the browser. How big is the (per-project, per-language-filtered) banner list we send? I assume in its current form, it is part of the cacheable data (10m cache like RL?). load.php is a pretty hot URL as it is, and is already limited in its cache lifetimes.

This ticket is getting stale, is it still relevant and up-to-date with current plans?

Declining as I don't think we should fragment ResourceLoader cache in this way.

  • It would require a significant amount of tailoring in ResourceLoader to the specific details of CentralNotice and Wikimedia deployment thereof.
  • It would not improve performance for the end-user in any way (aside from a few bytes of bandwidth in the response by making the array of candidate banners smaller). The data would still be fetched from load.php in a separate async request and cause all the same problems described in the tasks that have this one as a subtask. If anything, it would regress performance by making it far less likely to get a cache hit for the banner data. That would add app server overhead and latency to most of the relevant requests, which is probably unacceptable anyway from an ops perspective, regardless of latency, due to traffic load.
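
The cache-hit concern can be illustrated with some back-of-the-envelope arithmetic. The sketch below is not how ResourceLoader or Varnish actually build cache keys; `count_variants` and the dimension sizes (200 countries, 3 device classes) are assumptions chosen only to show how quickly the number of distinct cached responses multiplies.

```python
def count_variants(dimensions):
    """Number of distinct cache entries for one load.php URL when the
    response varies on every listed dimension (a simple product)."""
    total = 1
    for values in dimensions.values():
        total *= len(values)
    return total


# Today (in this simplified model): one entry per URL.
unfragmented = count_variants({})

# Hypothetical fragmentation on the inputs proposed in this task.
fragmented = count_variants({
    "country": range(200),                       # assumed number of countries
    "device": ["desktop", "mobile", "tablet"],   # assumed device classes
    "logged_in": [True, False],
})

print(unfragmented, fragmented)
```

Under these assumed sizes, each banner-data URL would go from one cached copy to 1200, each of which has to be populated by a separate cache miss.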

Instead, per the comment on T52865, focus on T106099.

Hi! Thanks much @Krinkle for explaining this in detail, and thanks @BBlack for considering this... More than happy to look into what we can do with service workers instead--seems like fun!

Just for the sake of completeness, I'd like to mention that the extra data sent to users whom the server thinks may be targeted by campaigns, but who actually aren't, doesn't seem that trivial (from my poorly informed perspective).

For example, right now, on enwiki, mw.centralNotice.choiceData contains the details of 22 campaigns, for a total of 7.5K minified. This is going out to all users of enwiki with English as their interface language (so, all anonymous users and many logged-ins) on both mobile and desktop, in any country.

All of the campaigns except one are geotargeted. So, splitting by country on the server would eliminate almost all of them, for most users.

If any of the campaigns possibly available to a user use campaign mixins (all Fundraising campaigns and many community campaigns do) then the RL modules for those will also be added as dependencies (though those modules may be cached in LocalStorage when that's possible).

Finally, regarding server load, I guess the counterargument is that that's something we could potentially remedy (in the worst case, I think, with more hardware) but users' bandwidth is out of our control.

Anyway, just thought I'd mention all of the above, in case it's useful... For decisions like this, I'd still very much wish to defer to those who are deeply familiar with performance stuff... :) I'll try to read up on service workers and may well get back to you with silly questions... Thanks again!!!! :D

P.S. Since I'm pretty ignorant of service workers, I have no idea how well they could potentially mitigate the above... Apologies if some or all of that isn't relevant...

@AndyRussG Service workers can help us in two ways:

As a client-side feature they essentially provide us with the capability to run a Node.js-like server within the user's browser in a background thread, which is allowed to intercept any same-origin network request. It can then use a combination of local storage, the browser's http cache, and fresh network requests to compose the response for that request.

For example, it could have a cached copy of the Vector skin template, and then for page views we only request the skinless page HTML from Varnish (saves bandwidth by not sending the skin each time), and stream it to the main thread wrapped in the skin template.

In addition, it could make a second request in parallel with the page HTML (at the same time!) for the banner data, and then decide on a banner and inject it into the same page HTML as it gets streamed to the main thread. This means that from the main thread's and the user's perspective, it will look as if the banner HTML was already part of the MediaWiki response, no JavaScript involved! It can also be omitted based on the client's cookies, similar to how you'd do it in PHP if we were to serve all page views from PHP. Obviously, just as we cannot serve all page views from PHP due to the performance impact, the logic allowed in the service worker must be extremely simple and well-performing, but banners seem like a very realistic use case.
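
As a rough model of that flow, the sketch below fetches the page HTML and the banner data concurrently and then composes one response. A real service worker would be written in JavaScript against the browser's fetch and cache APIs and would stream the result; this Python version only models the control flow, and every name in it (`fetch_page_html`, `fetch_banner_data`, `choose_banner`, the template, the sample banner) is hypothetical.

```python
import asyncio

# Stand-in for a locally cached skin template.
SKIN_TEMPLATE = ("<html><body><div id='banner'>{banner}</div>"
                 "{content}</body></html>")


async def fetch_page_html(title):
    await asyncio.sleep(0.01)   # simulated network request for skinless HTML
    return f"<p>Content of {title}</p>"


async def fetch_banner_data():
    await asyncio.sleep(0.01)   # simulated network request for banner data
    return [{"countries": {"NL"}, "html": "<b>Example banner</b>"}]


def choose_banner(banners, country, hide_cookie):
    """Deliberately simple selection logic, as the comment above asks for."""
    if hide_cookie:
        return ""
    for banner in banners:
        if country in banner["countries"]:
            return banner["html"]
    return ""


async def compose_response(title, country, hide_cookie=False):
    # Both requests are in flight at the same time.
    content, banners = await asyncio.gather(
        fetch_page_html(title), fetch_banner_data())
    banner = choose_banner(banners, country, hide_cookie)
    return SKIN_TEMPLATE.format(banner=banner, content=content)


html = asyncio.run(compose_response("Main_Page", "NL"))
```

The page receives a single HTML document with the banner already in place, or an empty banner slot when the cookie says to hide it.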

Another use case we look for here is improving logged-in user experience. Right now logged-in users have to fetch the entire response from MediaWiki, and do so on each page view. With service workers we could fetch just a little bit of JSON from the MediaWiki API about the logged-in user and cache it for a few minutes, or refresh it in the background ahead of the next page view. Then, when the user views a page, we just request the skinless page HTML, combine it with the Vector template and user information, and stream to the main thread. Basically Varnish-level latency for logged-in users!

In addition to being able to do this client-side (which not all browsers support yet), we can also do this server-side in an actual Node.js service that we'd deploy, which would emulate the browser's ServiceWorker environment and would be used to proxy all regular page views (probably between two layers of Varnish).

Anyway, more info at T106099 and related tasks.