Page MenuHomePhabricator

Banner history logger records incorrect status code following campaign fallback
Closed, ResolvedPublic3 Estimated Story Points

Description

If a campaign that has banner history enabled is fallen back from, and the campaign that is fallen back to shows a banner, the banner history log shows status code 4, 'banner_chosen', rather than status code 6, 'banner_shown'. In addition to fixing this, we should verify behaviour in other fallback scenarios.

Event Timeline

Restricted Application added a subscriber: Aklapper. ยท View Herald TranscriptJun 18 2020, 6:21 PM
DStrine removed the point value for this task.Jul 20 2020, 8:19 PM
DStrine triaged this task as Medium priority.Aug 4 2020, 9:02 PM

I just started looking at this but I've got to do some work to get my local CN back in order. Just adding some localsettings which I can get from the staging site I set up. I don't think I'll be able to finish today but I'm just hoping to see if I can see what's happening.

It is working for one fallback scenario but not another. @AndyRussG are looking into testing the scenarios now.

On the call with @AndyRussG yesterday, it appeared that both scenarios were not working correctly.

To duplicate:

  • Create two campaigns A & B.
  • Create two banners 1 & 2.
  • Assign banner 1 to campaign A, and banner 2 to campaign B.
  • turn on banner history logger for both campaigns
  • Turn on impression diet for campaign A, limit to 2 or 3 impressions
  • In one scenario, close banner 1, on reload, banner 2 should appear
  • Other scenario, see banner 1, reload until impression diet is reached, on next reload, banner 2 should appear
  • in storage tab of developer tools, look in localstorage for the site, there should be a central_notice_banner_history key, with an array
  • for banner 2, statusCode value in this array is supposed to be 6, but will appear as 4 currently.

Weirdly I just tested the two scenarios, and I only saw the bug in the case of banner close, but not in the case of max impressions reached.

Deleting this comment as I think I figured out what's going on.

@AndyRussG I'm seeing the val array in the banner_history storage object blow up with objects (as we would expect) and in some of the objects statusCode for fallback banner is 6 and in another one it's 4. If we were testing and looked in one object that had the correct statusCode, we'd conclude it's working, but if we looked in the one where the value is 4, we'd see the bug.

Change 621341 had a related patch set uploaded (by Mepps; owner: Mepps):
[mediawiki/extensions/CentralNotice@master] CampaignFallback: runPostBannerMixinHooks only if no campaign selected

https://gerrit.wikimedia.org/r/621341

To duplicate:

  • Create two campaigns A & B.
  • Create two banners 1 & 2.
  • Assign banner 1 to campaign A, and banner 2 to campaign B.
  • turn on banner history logger for both campaigns
  • Turn on impression diet for campaign A, limit to 2 or 3 impressions
  • In one scenario, close banner 1, on reload, banner 2 should appear
  • Other scenario, see banner 1, reload until impression diet is reached, on next reload, banner 2 should appear
  • in storage tab of developer tools, look in localstorage for the site, there should be a central_notice_banner_history key, with an array
  • for banner 2, statusCode value in this array is supposed to be 6, but will appear as 4 currently.

Thanks so much for digging in here!!!

So, while the setup you described is a valid way to test the banner history logger (since we do need it to work in any valid configuration, including the one described) it's not quite the same as the typical production config where we've seen the error.

To replicate that setup, two small changes to the above are needed: (1) one of the campaigns needs to have a higher priority than the other; and (2) banner history should be activated only for the campaign at the higher priority. (On production, normally only FR campaigns use banner history and those are the ones that run at a higher priority,)

So, looking at the patch submitted, I think you're very much on the right track! However, it seems like a few more details are needed.

In runMixinHooks() in index.js, we only run hooks associated with the campaign currently chosen. So, if the lower-priority campaign doesn't have banner history enabled, in the patch as currently submitted, the banner history logger won't run when that campaign is fallen back to.

So, we probably want to change the way post-banner mixin hooks are run, so that at the end of the banner and campaign selection process (whatever the result) they run for all campaigns that were chosen in any of the iterations of the fallback loop.

We'll also have to adapt the mixins that register hook handlers to be able to run multiple times, since they were written with the expectation that they could only run once per pageview. We should do this for both pre and post handlers.

Along the way, we might rename them, since pre-/post- "banner" no longer seems quite right.

Also, just an idea about why we're getting inconsistent results! The main banner history logic runs in a requestIdleCallback function. So, maybe sometimes it actually executes sooner and sometimes later, depending on what the browser decides.

(Finally, I'm adding some minor comments on the Gerrit change.)

Thanks again!!!

@AndyRussG I'm seeing the val array in the banner_history storage object blow up with objects (as we would expect) and in some of the objects statusCode for fallback banner is 6 and in another one it's 4. If we were testing and looked in one object that had the correct statusCode, we'd conclude it's working, but if we looked in the one where the value is 4, we'd see the bug.

Ooof yuck! Hrmmm sounds like that's what'll happen if both campaigns have banner history enabled...

We'll also have to adapt the mixins that register hook handlers to be able to run multiple times, since they were written with the expectation that they could only run once per pageview. We should do this for both pre and post handlers.

Just a few more details here... so, I don't think we'll need to change how the pre- hooks handlers are run; they can just be run one at a time, only when the campaign they're associated with is selected.

However, both pre- and post- handlers (which are all in the mixins) will have to take into account the possibility that they could be run multiple times for different campaigns during a single pageview, and that the campaign that they are run for is not necessarily the last one that will be selected for that pageview.

I think we may also want to make sure the handlers can know which campaign they're being run for. The order they're run in can be predictable, though, I imagine (same order in which campaigns are selected in fallback loop iterations).

Thanks again!!!

So, from what I understand @AndyRussG we will need to keep the code the way it is in my patch, but instead change the postBannerHandler function in ext.central_notice.bannerHistoryLoader.js to look all the campaigns chosen from. We already have the campaignStatuses array that has some of that data but will we need to feed more in? It doesn't look like it has all the data makeLogEntries() currently uses.

from @AndyRussG:

  • so the three issues are 1) post- hook handlers were being called at the wrong time (your patch fixes that); 2) post- handlers have to be called for all campaigns that were ever chosen even if they were fallen back from (this was happening before, but with the changes in your patch, it won't happen, so we have to fix that, and that's where fixes in state.js will come in); and 3) the post- handler in the BH
  • logger needs to be able to run more than once per pageview (that's probably the change you were thinking of)
  • and I guess a fourth issue is that we need to check over all the existing pre- and post- handlers for similar issues
  • (i.e. check that they can all handle running more than once per pageview)

Based on our call just now, I understood 2 things:

  • Code around index.js:L82 needs to be adapted to run mixins for every campaign that fallback has selected between--state.js:L366,L401. state.getCampaign() is only used on L82.
  • Only one log entry should be added per pageview for the campaign actually selected

Updated: I'm thinking that we can update state.data.mixins to scan across all campaigns run through in campaign fallback then index.js can use this instead of state.getCampaign().mixins.

Change 623314 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[mediawiki/extensions/CentralNotice@master] WIP First attempt at fixing state and fallback

https://gerrit.wikimedia.org/r/623314

AndyRussG set the point value for this task to 4.Sep 1 2020, 7:27 AM

Change 623537 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[mediawiki/extensions/CentralNotice@master] Add PostDisplayProcess hook, fix main banner history fallback issues

https://gerrit.wikimedia.org/r/623537

AndyRussG changed the point value for this task from 4 to 2.
AndyRussG changed the point value for this task from 2 to 3.

Hi! The patches for this and the related cleanup task (mainly T232235) are ready for review.

Here are the details of what was happening to produce the bug as described in the task:

  • The banner history post-banner handler was only being called for the higher-priority campaign with banner history enabled. This initial call only happened in the first iteration of the fallback loop, after that campaign failed.
  • However, that handler actually does most of its processing inside a mw.requestIdleCallback function. So, this tended to execute after the fallback loop had completed (and the lower-priority campaign had been selected) but before the HTTP request for the banner content had returned. So that's why we were getting the status code 4 in the banner history logs in LocalStorage (and in the data sent back via EventLogging).

The other issue mentioned here, that is, multiple entries being added to the log per pageview when multiple campaigns had banner history enabled, was just due to the handler running multiple times, as the fallback loop attempted all the campaigns using that feature.

The attached patches address both issues:

  • A new hook will now run after the entire campaign and banner selection/injection process has completed. The hook will run for all campaigns with enabled mixins that provide handlers for it. This will allow the banner history mixin to capture the final results of the process.
  • Banner history has been switched to use this new hook, and also includes a check to make sure it runs only once per pageview.

A bunch of other cleanup was also done along the way (see the commit message here).

More fallback-related issues were also discovered in work on this bug. They are described here: T261718: More minor bugs introduced with campaign fallback. The cleanup patch only fixes the first of the issues mentioned in that task, though it does clear the way for fixing all of them.

Here are the other tasks that the patches in review address (wholly or in part):

To keep track of stuff, I've made all fallback-followup tasks subtasks of this one: T124969: Can we allow fallback to alternative campaign if mixin would hide this one.

Thanks much!!!

Also, if anyone's interested in the rough notes from work on this, here's the etherpad.

Change 623314 merged by jenkins-bot:
[mediawiki/extensions/CentralNotice@master] Campaign fallback follow-up: refactor data handling and some methods

https://gerrit.wikimedia.org/r/623314

Change 623537 merged by jenkins-bot:
[mediawiki/extensions/CentralNotice@master] Add FinalizeChooseAndMaybeDisplay hook, fix main banner history fallback issues

https://gerrit.wikimedia.org/r/623537

Change 621341 abandoned by Mepps:
[mediawiki/extensions/CentralNotice@master] WIP CampaignFallback: runPostBannerMixinHooks once per iteration

Reason:

https://gerrit.wikimedia.org/r/621341