Page MenuHomePhabricator

Review Campaign Fallback
Open, Needs TriagePublic4 Story Points

Description

https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralNotice/+/517931/

DoD: handle the multiple rounds of code review and feedback

Event Timeline

DStrine created this task.Jun 24 2019, 3:58 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 24 2019, 3:58 AM
DStrine updated the task description. (Show Details)Jun 24 2019, 8:08 PM
DStrine changed the point value for this task from 0 to 4.
mepps added a subscriber: mepps.Jul 5 2019, 2:14 PM

Notes from tech-talk review:

We like tests!
The Global variable isn't necessarily needed, or a good idea.
Good comments.
Why filteredChoices variable? It doesn't serve a purpose and may obscure that cn.choiceData is being manipulated, and should be named choiceData.

mepps added a subscriber: Ejegg.Jul 5 2019, 2:14 PM

@AndyRussG @Ejegg Above is what I remembered offhand today.

Ejegg added a comment.Jul 5 2019, 2:56 PM

I think the most important bit is to get the recordImpression calls right, so each page load only has one beacon request.

Besides that, Andy wanted the chooser to do more of the fallback logic and he and I were discussing whether the 'while true' loop could be clarified. He noted the last part always hits a break statement, and I thought it might be better with the loop test being whether an ever-diminishing filteredChoices was empty.

mepps added a comment.Jul 5 2019, 4:15 PM

Thanks @Ejegg! @AndyRussG anything to add? @DStrine I assume we don't have hours left to fix this. Should we be doing it ourselves?

Thanks @Ejegg! @AndyRussG anything to add? @DStrine I assume we don't have hours left to fix this. Should we be doing it ourselves?

Hi... I'm trying to think of more alternatives for how to improve the loop... But, yeah, I think that's everything we mentioned in the tech-talk, thanks!

In a bit I'll post on the Gerrit change what was noted here, and a detailed suggestion for the loop. Also, we still need to review the tests, and smoke test.

IIRC there were some extra hours included as padding specifically for improvements like this...

Thanks much!!!

Hi! I just posted the following on the Gerrit change (includes the above plus a few more thoughts):

Here are some initial review notes!
First of all, thanks so much for working on this!! Please don't be discouraged if there are lots of comments...
(A lot of comments are from group the discussion on this code--thanks, all, for the comments, too!!)

  • The global configuration variable is not needed. Fallback can be fully enabled for all campaigns immediately when this rolls out. Eventually we may wish to make it possible to block fallback on a campaign-by-campaign basis, but that's not essential for now.
  • The possibility of more than one call to the server to send back data per pageview will complicate data collection, and will also consume extra resources. Instead, we need to ensure just one call to recordImpression() ever happens. I think we also need to check with Fundraising Analytics folks about how data will be processed once campaigns are able to fall back. This is an important detail that we hadn't considered until now. (See also inline comments.) Thanks again!!!!!!!!!!
  • ext.centralNotice.display.js:308 Hmmm... I'll confess that I'm not convinced. I'm not sure that breaking this code up into smaller methods will make it easier to read, especially after the global config switch (and thus much of the added repetition) is gone. It's true that the method is pretty long, however, for now, none of the logic in it would ever be called from anywhere else. I'm sure I'm biased, but my feeling is that as is, the method tells the story that it's meant to, and does so pretty clearly. ;)
  • ext.centralNotice.display.js:317 Nice! (Though see below for some thoughts.)
  • ext.centralNotice.display.js:318 This does not copy the object, but rather just creates a local variable that references the same original object under a different name. This just obscures the fact that the original object (cn.choiceData) is being modified.
  • ext.centralNotice.display.js:336 It would be prettier if this, or something equivalent-ish, only happened when a campaign actually fails, and not on the first pass.
  • ext.centralNotice.display.js:340 How about this, instead: 1) Keep a list of failed campaigns inside state; and 2) pass the list of failed campaigns from state to chooser.chooseCampaign(), and let chooser just ignore any campaigns on that list? That would keep more of the choosing and filterig logic in chooser.
  • ext.centralNotice.display.js:485 This break statement and the end of the loop could be after line 468 instead.
  • ext.centralNotice.display.state.js:243 Cool! This is important.... However, I was thinking that since this should only ever happen when a campaign fails, and we need to store and send back information about campaigns that fail (see general comments), we might make this method failCampaign(). It could then add the currently chosen campaign to a list of failed campaigns, along with the status/reason for the campaign's failure. In that case, we could also keep the list of failed campaigns in state, and not as a local variable in reallyChooseAndMaybeDisplay(). However, I'm not sure of how the exact flow and state tracking might need to change to accomodate a single call to recordImpression().
  • ext.centralNotice.display.chooser.js:52 This comment is not correct... though perhaps it's our fault for not having clearer comments in the code explaining the reason for this behaviour. Basically, campaigns data sent from the server in choiceData already takes into account campaign end dates. If we received a stale campaign here, it means that the all of choiceData is out of date. So, we simply short-circuit the whole processes by having no campaigns available. (I guess instead we could do something different that would leave a clearer indication of what happened in the status, though... That might be better.)

Also putting here the comments I added in Gerrit about QUnit tests:

Thanks also for making tests! Let's revisit these after we've worked on the actual feature a bit more!!!! :)
(There is indeed some confusion in the tests... For example, as noted in inline comments, we don't want to allow any campaigns to be shown if we get stale choiceData. Also, there are already tests for geotargeting aspects of campaign selection. See ext.centralNotice.display.chooser.tests.js and AllocationFixtures.json. Nonetheless, the effort on these is much appreciated!!!)

mepps added a comment.Jul 10 2019, 3:17 PM

@AndyRussG @DStrine Maybe we should take this off the board until we get a new patch?

They have been able to turn around patches within a week. I feel like we'd just be pulling this back in before the end of the sprint.

Change 523095 had a related patch set uploaded (by AndyRussG; owner: AndyRussG):
[mediawiki/extensions/CentralNotice@master] Refactor check for choiceData staleness and add corresponding status

https://gerrit.wikimedia.org/r/523095

Here's what I just commented on Gerrit:

First of all, thanks much for finding the QUnit issue! Nice catch!!!
To clear up issues about choiceData staleness, I made a patch that separates out the check and provdes a new status: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralNotice/+/523095/ Perhaps you could rebase future versions of this feature onto that one?
I've been thinking about this some more, and I'd like to suggest some further changes to the approach. Apologies that there are some new suggestions that are a bit different from the previous ones, and also apologies if the previous suggestions were not as complete as they could have been.
Also, I have a proposal for recordImpresion() data. While we still need to talk over how FR analytics will work once campaign fallback is deployed, in the end, I think, our options for sending data back from the front-end are limited, and we can move forward with the cleanest solution we can think of.
(Upcoming discussions about fallback and FR analytics can then focus instead on how to best store and query the data, to ensure consistency and continued availability of essential data points. However, I think only systems further down the pipeline will be impacted by that discussion, and we can safely move forward with a new data structure for the client-side beacons.)
Proposal for the new recordImpression data structure:

  • Add a new data property, campaignStatuses, that will be an array of objects with these properties: statusCode, campaign and, if greater than 0, banner_count. There will be one such object in that array for all the campaigns that were tried, including the last one (successful or not).
  • For the time being, this property should be an array encoded as a JSON string. Later, when we switch fully to EventLogging instead of the custom beacon/impression, we can remove the serialization.
  • For legacy compatibility, keep the following existing properties on the root of the data object: campaignCategory, campaignCategoryUsesLegacy, bannerCanceledReason, campaign, reason, result, status, statusCode. They should always be up-to-date with the last campaign chosen and its status.

Note that a change will be required in https://meta.wikimedia.org/wiki/Schema:CentralNoticeImpression
Proposals for code:

  • The current PatchSet does not guarantee recordImpression() will be called if all campaigns fail. This needs to be fixed.
  • The recordImpression and impressionEvent sample rates can vary on a per-campaign basis. The final sample rate used should be the highest of any campaigns tried, including failed campaigns. (For example, if a FR campaign failed and a non-FR campaign is chosen, and the FR campaign uses a higher sample rate, then the higher sample rate should be used.) The logic for this can be put in state.js.
  • The JSON seralization of contents of campaignStatuses can take place in getDataCopy() in state.js. Maybe the parameter cleanForLogging could be renamed prepareForLogging.
  • The method resetState() in state.js should no longer be needed. Just make state reset or clear properties on the root of the data object (see above) if/as necessary when a campaign is set, in state.setCampaign(). Every time a campaign is set, a new entry should be added to campaignStatuses (automatically by state). Also, the status of the campaign we're currently trying should be updated as needed there, as well as in the properties on the root of the data object.
  • The main purpose of makeFilteredChoiceData() in chooser is to filter out campaigns that are not applicable to this pageview. Really, that logic only has to be run once. Let's maybe try this: make the method in chooser to create filteredChoiceData available on the internal API. index.js can call that method and store filteredChoiceData in state before even starting the fallback loop. Then filteredChoiceData can be passed from state into a slightly smaller chooser method that just chooses a campaign from the filteredChoiceData it receives (i.e., everything currently in chooseCampaign() after the call to makeFilteredChoiceData().
  • Instead of keeping a list of ignoredCampaigns in state and passing it to chooseCampaign() (sorry for this change!) just make state automatically remove any failed campaign from filteredChoiceData whenever cancelBanner() is called.
  • If it seems appropriate, maybe rename filteredChoiceData to availableCampaigns? (If you do so, be sure to rename it everywhere!)
  • In index.js, you might change the main while( true ) { } fallback loop to a do { } while( condition ) loop, and have the condition be a call to a method in state to ask if it any more campaigns are available to try. To make this work, you might indeed wish to break up reallyChooseAndMaybeDisplay() such that different function is called to perform the operations we perform once a campaign is successfully chosen and not fallen back from. This might also be the code structure you want to ensure recordImpression() is always called.
  • End the fallback loop before we actually try to choose a banner. So, the loop can close after index.js:397 (line refers to the latest PatchSet). The reason for this is that the only conditions in which we might not get a banner after that point are not desireable conditions for fallback to occur. (Existing postbanner hook handlers do not cancel banners. Though this is not a formally set-out rule, it could be. Perhaps at index.js:68 we could add a comment to this effect.)

It is important that the final code for this feature be as clean, logical, simple and understandable as possible. The process is not simple, but it can be broken down into simple steps. It should tell a story of how campaigns and banners are chosen, and should tell it in a way that makes it as easy as possible future developers of the code to understand.
Part of that is making sure that we follow the rules of interaction and separation of concners described at the top of index.js.
Again, just to emphasize, this code runs on almost every pageview across almost all wikis, many millions of times a day, on many millions of clients. And it injects unsanitized content into the DOM. So, we can't cut corners with it.
If you think any of these proposals are mistaken, please say don't hesitate to so! Also please don't hesitate to reach out with questions as you go! Thanks so much!!!!!

Just added another comment:

In index.js, you might change the main while( true ) { } fallback loop to a do { } while( condition ) loop, and have the condition be a call to a method in state to ask if any more campaigns are available to try.

Actually, on second thought, that might not work... Maybe what would work, though, is to keep a while( condition ) { } loop, and make the condition a call to such a method in state... Hmmm or maybe such a check for at least one campaign available should go in chooser, or maybe it's just simpler to do check the length of filteredChoiceData (or availableCampaings, if renamed) directly in reallyChooseAndMaybeDisplay()... Also, note that even in that case, chooser.chooseCampaign() would not be guaranteed to come back with a campaign, since the available campaign(s) might be throttled to not take all possible pageviews... Anyway, maybe outside the loop, at the end, check if a campaign really was chosen, and if it was, do the banner choosing stuff, and no matter what call recordImpression()... Just a thought, also not positive if it'd work or if it's the clearest approach...
Thanks again!!

Change 523095 merged by jenkins-bot:
[mediawiki/extensions/CentralNotice@master] Refactor check for choiceData staleness and add corresponding status

https://gerrit.wikimedia.org/r/523095

Just to note, there've been quite a few cycles of review and new patch sets for this... Just haven't copied everything over here. Please see the Gerrit change for details. Thanks!!