bzimport set Reference to bz43250.
Krinkle created this task.Via LegacyDec 18 2012, 11:51 PM
bzimport added a comment.Via ConduitDec 19 2012, 4:43 PM

mwalker wrote:

When we do show a banner it's important that we record that fact -- it allows us to determine the effectiveness of fundraising banners.

We also record the non showing of a banner as a sort of debug mechanism. We've been attempting to use the data to determine why there is such a large descrepency between requests for banners and impressions. It also allowed us to model the falloff rate of impressions when we made the change to only show banners to new users -- those that had only seen less than n banners.

Not a defense, just background -- this is exactly what we'd been doing before, just more obvious now due to the requirement of the record impression call (It's difficult to determine after the event what banner was served from BannerRandom as that only provides us the slot number).

bzimport added a comment.Via ConduitFeb 11 2013, 6:19 PM

mwalker wrote:

I'm now thinking that CentralNotice will track categories of banners. And then set properties specific to them -- IE: hide cookie name/duration and if it's a reporting banner.

ori changed the title from "CentralNotice should not make requests to Special:RecordImpression from every page (even when there are no active campaigns)" to "Special:RecordImpression should die in a fire".Via WebDec 4 2014, 5:27 AM
ori added subscribers: awight, K4-713.
ori set Security to None.
ori added a subscriber: ori.

I took this up with Adam Wight on IRC, and he reiterated Matthew's point --

<awight> ori: re Special:RecordImpression, the issue is that we need a record of *not* displaying a banner, as well.

I see Special:RecordImpression is now included in EasyPrivacy, one of the more popular filter subscriptions for AdBlock Plus:

Can we finally get rid of this thing?

AndyRussG added a subscriber: AndyRussG.Via WebJan 14 2015, 3:25 PM
AndyRussG added a comment.Via WebJan 14 2015, 3:45 PM

Thanks! FWIW, Special:RecordImpression is used heavily by Fundraising analytics. However, we're now looking at other mechanisms for sending back samples of more complete banner histories (or something of the sort), so we might consider rolling a replacement of Special:RecordImpression in with that.

Good to know about the AdBlock filter! I'm sure that's skewing the data at least a bit, also a good reason to move on this...

AndyRussG added subscribers: ellery, Ejegg.Via WebJan 14 2015, 3:46 PM
awight added a comment.Via EmailJan 14 2015, 8:42 PM

My only beef with S:RI is that it calls PHP. That's the real crime, here.
It renders a stupid empty string.

How about this first step: VCL rules to skip the PHP call and serve
hardcoded emptiness?

Glaisher added a subscriber: Glaisher.Via WebJan 15 2015, 4:29 PM
faidon raised the priority of this task from "Normal" to "Unbreak Now!".Via WebJan 26 2015, 2:56 AM
faidon added a subscriber: faidon.

These are the top hits against a Varnish frontend (across *all hits/projects*):

. Look where / or /api.php or /wiki/Main_Page are compared to Special:RecordImpression. Imagine where they are compared to the aggregate of all these RecordImpressions.

The problem isn't PHP, it's this whole entire thing. You got to find a different way to do this, this is just a ridiculous amount of requests.

Ejegg added a comment.Via WebJan 26 2015, 7:14 PM

Ellery, I see this is sampled with udplog at 1:100. Would we get the same result by sampling 1:1 and making the request with probability .01?

ellery added a comment.EditedVia WebJan 26 2015, 7:41 PM

It depends. How does the sampling work with udplog? When sampling at rate 1:n, does it keep a record with probability 1/n? If so, there is no difference. If it keeps every nth record, then its different. How different depends on the original amount of data and the application.

I agree that making the request with probability 1/n is a good approach. I'm not sure yet what value to choose for n. If it is too high then, we have to run AB tests longer. A safe bet is n=10. If thats not enough of a decrease, lets go with n=100. In the mean time, I will try develop some theory for AB testing that accounts for the additional variance from sampling. It may be that even larger values of n are acceptable.

ellery added a comment.Via WebJan 26 2015, 7:46 PM

Also, I'm not aware of any use cases for the request when there are no fundraising banners running.

awight added a comment.Via EmailJan 27 2015, 5:39 PM

Well, we intend to make the impression numbers available to all CN admins,
so we shouldn't do anything that's contingent on WMF-fundraising specific

Udp2log is squirrely about the sampling. But it's supposed to be random, so
Elliott's downsampling workaround is a great idea.

ori added a comment.Via WebJan 28 2015, 8:49 AM

Well, we intend to make the impression numbers available to all CN admins,
so we shouldn't do anything that's contingent on WMF-fundraising specific

Udp2log is squirrely about the sampling. But it's supposed to be random, so
Elliott's downsampling workaround is a great idea.

Adam, I don't mean to be rude, but the task is entitled "Special:RecordImpression should die in a fire" for a reason. The only reason it still exists is that Fundraising has indicated that you depend on it, and Fundraising is mission-critical. I would have undeployed this without ceremony a long time ago were it not for that. The desire to extend the utility of Special:RecordImpression so that it serves CN admins is undoubtedly well-intentioned, but it is not going to happen.

ellery added a comment.Via WebJan 28 2015, 6:38 PM

Does the Special:RecordImpression request get sent even when there is no campaign running for the country and device class of the client? If so, that would be a clear way to reduce requests.

faidon added a comment.Via WebJan 28 2015, 7:04 PM

Does the Special:RecordImpression request get sent even when there is no campaign running for the country and device class of the client? If so, that would be a clear way to reduce requests.

It does for the whole matrix of UAs & languages. Fixing this isn't enough for me, though; this seems like a badly deisgned thing in general and probably needs to go entirely, even when running campaigns.

Can we get an ETA for fixing this situation?

atgo moved this task to Important on the MediaWiki-extensions-CentralNotice workboard.Via WebJan 28 2015, 7:48 PM
ellery added a comment.Via WebFri, Jan 30, 5:15 PM

Here is a slight design constraint for sampling impression data:
to do valid hypothesis testing , we need to know how many of the impressions we recorded did not lead to donations.

I know that Special:RecordImpression is going away, but to illustrate, if we where to send the request randomly with probability 1/r, we could achieve the above by generating an impression ID on result=show and passing that ID along when the client clicks and when the client donates.

This way, we get the number of recorded impressions that did not lead to a donation by removing those with an ID from a donation.

I am happy to go over how why this is necessary and to describe how to modify a classic hypothesis test to deal with this, if anyone needs more information/justification.

gerritbot added a project: Patch-For-Review.Via ConduitTue, Feb 3, 6:05 PM
gerritbot added a subscriber: gerritbot.

Change 188394 had a related patch set uploaded (by Ejegg):
Sample banner impressions client side


gerritbot added a comment.Via ConduitTue, Feb 3, 6:05 PM

Change 188395 had a related patch set uploaded (by Ejegg):
Special:RecordImpression now sampled client-side


ellery added a comment.Via WebTue, Feb 3, 7:13 PM

To reiterate, to do AB-testing with sampled impressions, you need to know how many of the sampled impressions did not lead to a donation. This patch does not address this.

AndyRussG added a comment.Via WebWed, Feb 4, 9:57 PM

Hi... Any comments on how the retirement of udp2log might impact on this? Thx!

AndyRussG added a comment.Via WebWed, Feb 4, 10:38 PM

@ellery, maybe you could give us some more details about the scope and nature of the data you need?

For example, do you use information about banners not being shown to projects and languages that are not currently targeted by any campaign? If you don't use it, maybe turning S:RI off for those cases, or turning it down massively, would be a first step?

Also, what about empty impressions, i.e., when a client doesn't get a banner because they don't meet the campaign/banner criteria (some of which have to be determined on the client)? (I don't mean cases when the banner is hidden by Javascript contained in the banner itself, just when the general CentralNotice code decides there's no banner for a given client.)

I think separating out what's needed for debugging and what's needed for analytics, and tuning things differently for each, could make a big difference. Debugging is also important, but I'm pretty sure it needs a much lower sample rate. Just a thought...

Remember: this also ties in to sending more data about banner history via other means (like maybe EventLogging or Kafka) since that could also replace S:RI; however, I have no idea what the implications might be WRT load on the cluster and bandwidth. Still, any tuning we do right away could probably be re-used when we move to a replacement mechanism (which will also be used for both analytics and debugging).

Another idea: we could make some of this controllable via config variables, so that if a short burst of more data is needed, it could be turned on and off...


P.S. @faidon: Wow...! 8p

AndyRussG added a comment.Via WebWed, Feb 11, 9:05 PM

Some thoughts on organizing code for the above approach:

  • Add a global impression-logging facility to CN, which could be turned off or set to a given sampling rate via config variables.
  • Create a campaign-associated mixin for logging impressions and banner history for any given CentralNotice campaign. Sampling rate and other details could be set differently for each campaign or controlled via campaign-specific JS. Both shows and hides for campaign-targeted users could be recorded, as required. (We'd have to implement campaign-associated mixins, which don't exist yet but are planned.)
faidon added a comment.Via WebThu, Feb 12, 11:51 AM

Can someone give me a high-level summary of why a separate call to record impressions is needed, rather than just running analytics on our logs?

Also, it's 2½ weeks since I set this to UBN! and there's still no proposed patches to turn this off, no ETA for a proper fix or a workaround, nor one in the horizon — this task is mostly "thoughts" and brainstorming. If you disagree with the priority, we can discuss that, but otherwise I don't understand why this isn't getting the attention its priority indicates.

AndyRussG added a comment.Via WebThu, Feb 12, 3:46 PM

Re: priority, @faidon, you're completely right. Apologies and thank you for saying this!

I've smoke tested Elliot's CentralNotice patch for sampling client-side. I'm pretty sure it can be merged and deployed as an emergency measure. I'll try to get everything in order, including approvals, to push it through today. That should give us time to find a more lasting solution. Does that make sense?

Regarding why we can't get the same information from the logs without S:RI: in the vast majority of cases, I think we can. But sometimes banners are retrieved and injected into the client DOM (so the request to Special:BannerLoader does happen) but Javascript contained in the banner decides not to actually display the banner to the user. This result, including info about exactly why the banner wasn't shown, is also sent via params on S:RI. But this represents a tiny fraction of total S.RI requests.

AndyRussG added a comment.Via WebFri, Feb 13, 1:22 AM

As per discussions with FR, I just amended Elliot's CN patch for review to only downsample impressions when the server knows that there are no campaigns available for the user. Should still be a big improvement! It would mean, however, that we wouldn't use the config change to remove server-side log sampling. Hope this is helpful! :)

AndyRussG added a comment.EditedVia WebFri, Feb 13, 3:35 AM

I should add the caveat that it won't be an improvement at times like now, when there are campaigns targeting all projects and all languages. That's because project and language are the only criteria that are selected for on the server.

On the client, CentralNotice further filters on country, logged-in status and device. We could change the above patch so it decides whether to sample impressions after client-side processing of those criteria, rather than before.

If we did that, we'd get an improvement right now, since the generalized campaigns target only logged-in users (steward elections and WIkimania scholarships). However, a patch like that wouldn't improve things much when there are massive campaigns directed at anonymous users (such as WLM, or FR campaigns in certain countries/languages).

BTW, just today I heard from a user about getting banner impression stats for a non-Fundraising campaign. The use case is completely legit.

tl;dr: the stopgap measure in the patch would drastically reduce calls to S:RI in many, but not all, scenarios, and it could be easily amended to cover more, but still not all. So, more to be done to find even a good temporary measure.

I'll still try to give this top priority! (Today we had a few other fires fluttering about.) Thanks...!!! :)

P.S. We could also still improve the Varnish config, as a complementary temporary measure, perhaps?

gerritbot added a comment.Via ConduitFri, Feb 13, 10:27 PM

Change 188394 merged by Awight:
Sample banner impressions client-side if no choiceData

awight moved this task to In Development on the § Fundraising Sprint Devo workboard.
AndyRussG added a comment.Via WebSat, Feb 14, 2:50 AM

Hi all!

A temporary fix with the amendment mentioned above (turning on sampling of Special:RecordImpression for any user not in a campaign, post client-side filtering) has been deployed.

We should get a performance improvement right away. As noted above, if we get any big campaigns for anonymous users, we'll still have the performance issue. But at least we have a bit of breathing space...

Many thanks @awight, @Ejegg, @faidon, @ellery, @ori and everyone else!!! :)

AndyRussG set Story Points to 2.Via WebMon, Feb 16, 8:16 PM
AndyRussG removed Story Points.Via WebMon, Feb 16, 9:03 PM
AndyRussG set Story Points to 4.
AndyRussG moved this task to In Analysis on the § Fundraising Sprint Devo workboard.
atgo added a subscriber: atgo.Via WebTue, Feb 17, 6:24 PM

@Krinkle @faidon now that we have the temporary fix, could we downgrade priority or close this?

faidon lowered the priority of this task from "Unbreak Now!" to "Normal".Via WebTue, Feb 17, 6:26 PM

Downgrading, but since this is as you said a temporary, this should probably stay open? As I understand it, it would still be a pretty big deal performance-wise when we run campaigns so it would be best to not put this on the backburner.

gerritbot added a comment.Via ConduitTue, Feb 17, 6:43 PM

Change 188395 abandoned by Ejegg:
Special:RecordImpression now sampled client-side

atgo added a comment.Via WebTue, Feb 17, 10:50 PM

Thanks @faidon. We're looking into longer term solutions... there's a spike for it here: T88614

AndyRussG added a comment.Via WebWed, Feb 18, 2:14 AM

I'm pretty sure the more permanent solution should involve:

  • A logging mechanism that has to be explicitly turned on for a given CentralNotice campaign, instead of being on for all campaigns or on by default.
  • Sending back more than just the name of the banner shown and info about client-side banner hiding. Instead, we'll store and send back the history of hide or show results from the last n (maybe 5 or so?) impressions. This will get us better analytics and banner-logic debugging than we have now.
  • Some kind of configurable sampling, with higher sample rates than the one we just set set for non-campaigns. (I dunno, maybe 20% rather than 1%? Not quite sure yet how it'll play out...)

NEway, I think there's no way out of sending some data from the client during campaigns. For these cases, performance-enhancing measures might include (in addition to sampling):

  1. Use a data transfer and storage mechanism that's easy on our servers.
  2. Time the actual call so we ease the load on clients (like maybe delay it for a few seconds to make sure all page rendering and other server communication has finished, or something like that).

No. 1 is the issue I'd love to have some more input on: what'll be the best way to send the data? Maybe EventLogging to Kafka (T68528)?

tl;dr: For the more permanent solution, sending some data from clients during campaigns is still inevitable. What do you recommend as the best way to do that?

atgo added a blocked task: T78089: Banner History.Via WebThu, Feb 26, 7:53 PM
atgo removed a blocked task: T78089: Banner History.Via WebThu, Feb 26, 11:37 PM
atgo added a blocking task: T78089: Banner History.
GOIII added a subscriber: GOIII.Via WebSat, Feb 28, 1:19 AM

With the so-called workaround in place, my F-12 Developer tool's console (Win 8.1/IE 11) still reports something similar to...

DOM7009: Unable to decode image at URL:

... on every page load or refresh. Searching for DOM7009 at Microsoft provides little to go on as well.

And fwiw... I must have dismissed the banner notifying us of the Steward elections 4 or 5 times already in case it matters.

Add Comment

Column Prototype
This is a very early prototype of a persistent column. It is not expected to work yet, and leaving it open will activate other new features which will break things. Press "\" (backslash) on your keyboard to close it now.