This is a proposal for preventing content shifts when banners are injected into pages (the "banner bump").
## Overview
Implement a banner service that would select and return a banner, or no banner, for every pageview.
In CentralNotice, insert an ESI include tag (`<esi:include>`) into the base HTML, pointing to the banner service.
In Varnish, use the ESI feature to call the banner service and pass along the required inputs. HTML content returned by the banner service will be injected into the base HTML.
Client-side JavaScript will detect if a banner was injected and may set a cookie with client data for the banner service to read on the next pageview. Code for this will be mostly the same as existing client-side CentralNotice code. Data reporting would not change from the current system.
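As a rough illustration of the tag that CentralNotice would emit, here is a minimal sketch. The endpoint path `/banner-service/select` and the helper name `esi_include_tag` are hypothetical; the proposal does not specify the banner service URL or how Varnish would pass along the inputs.

```python
import html

# Hypothetical endpoint path; the proposal does not specify the banner service URL.
BANNER_SERVICE_PATH = "/banner-service/select"


def esi_include_tag(src: str = BANNER_SERVICE_PATH) -> str:
    """Return an ESI include tag that an ESI processor replaces with the response from `src`."""
    # Escape the URL so it is safe inside a double-quoted attribute.
    return f'<esi:include src="{html.escape(src, quote=True)}"/>'


print(esi_include_tag())
```

With ESI processing enabled for the page, Varnish would replace this tag with whatever HTML the banner service returns, including an empty response when no banner is selected.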
## End-user impact
For readers, CentralNotice campaign admins and data analysts, there would be essentially no change from the current system, except that the banner bump would be eliminated.
## Implementation notes
### Inputs
* Active CentralNotice campaign config, provided by CentralNotice via a MediaWiki API.
* Targeting data, provided by Varnish:
** hostname
** language
** country code
** regional subdivision code
** user agent string
** mobile or desktop skin
** logged-in status
* Client data, provided in cookies set in JavaScript on the previous pageview:
** campaigns for which the client has reached the maximum number of impressions
** previous clicks on banner close buttons
** buckets by campaign
** current step in campaigns using banner sequence
** user preferences for campaign display
* Static config:
** Mapping between hostnames and CentralNotice projects.
* Banner content, provided by Special:BannerLoader.
(See notes on caching, below.)
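The per-request inputs listed above could be modeled roughly as follows. This is only a sketch: all type names, field names, and the example hostname mapping are assumptions made for illustration, not the data model of CentralNotice or of the proof-of-concept.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class TargetingData:
    """Per-request data that Varnish would provide (hypothetical field names)."""
    hostname: str
    language: str
    country_code: str
    region_code: Optional[str]
    user_agent: str
    mobile_skin: bool
    logged_in: bool


@dataclass(frozen=True)
class ClientData:
    """Data read from cookies set by JavaScript on the previous pageview."""
    maxed_out_campaigns: FrozenSet[str] = frozenset()
    closed_banner_campaigns: FrozenSet[str] = frozenset()
    buckets: Dict[str, int] = field(default_factory=dict)          # campaign -> bucket
    sequence_steps: Dict[str, int] = field(default_factory=dict)   # campaign -> step
    display_preferences: Dict[str, str] = field(default_factory=dict)


# Static config: illustrative entries only.
PROJECT_BY_HOSTNAME: Dict[str, str] = {
    "en.wikipedia.org": "wikipedia",
    "de.wiktionary.org": "wiktionary",
}


def project_for(hostname: str) -> Optional[str]:
    """Map a hostname to a CentralNotice project, or None if it is unknown."""
    return PROJECT_BY_HOSTNAME.get(hostname)
```

A first-time visitor would simply arrive with a default `ClientData()`, since no cookie has been set yet.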
### Output
HTML elements with banner content (or no banner) and data for JavaScript post-processing.
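One possible shape for this output is a single wrapper element that carries the data for JavaScript in an attribute. The element id, the attribute name, and the payload keys below are hypothetical; the proposal does not define the output format.

```python
import html
import json
from typing import Any, Dict, Optional


def render_output(banner_html: Optional[str], data: Dict[str, Any]) -> str:
    """Wrap banner content (possibly none) and post-processing data in one element."""
    # JSON goes into an attribute, so it must be attribute-escaped.
    payload = html.escape(json.dumps(data, sort_keys=True), quote=True)
    body = banner_html or ""
    return (
        f'<div id="banner-service-output" data-banner-service="{payload}">'
        f"{body}</div>"
    )


print(render_output(None, {"status": "no_banner"}))
```

Returning the wrapper even when no banner is chosen would let client-side code distinguish "no banner selected" from "banner service did not respond".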
### Selection logic
Part 1:
{F34983073}
Part 2:
{F34983077}
(Code: P22017, P22018. Diagrams generated with [[ https://plantuml.com/en/activity-diagram-beta | PlantUML ]].)
### Caching
CentralNotice campaign config could be cached in RAM by the banner service. Currently, the config sent to the client for banner selection is cached on the same schedule as ResourceLoader modules.
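A minimal sketch of such an in-RAM cache, assuming a simple time-to-live policy. The class name, the `fetch` callback, and the 300-second TTL in the usage note are all hypothetical; the actual refresh schedule is not decided here.

```python
import time
from typing import Any, Callable, Optional


class TtlCache:
    """Hold one value in memory and refetch it once it is older than the TTL."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._expires_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        # Fetch on first use, or when the cached value has expired.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

For example, `TtlCache(fetch_campaign_config, ttl_seconds=300)` would call the hypothetical `fetch_campaign_config` function (a request to the MediaWiki API) at most once every five minutes.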
Banner content is specific to the user's language and the campaign the banner is being displayed for. (A single banner can be assigned to more than one campaign.) The banner service could cache content for frequently selected banner/language/campaign permutations, or it could always request banner content from Varnish (which should be tuned appropriately).
While the banner selection process can involve randomness, in many cases it does not. The banner service could cache decisions made deterministically, to speed up selection in such cases.
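To illustrate the idea, the sketch below caches a decision only when the inputs leave at most one candidate, and falls back to a weighted random choice otherwise. The class, the `compute_candidates` callback, and the notion of a hashable request key are assumptions for illustration; the real selection logic is the one in the diagrams above.

```python
import random
from typing import Any, Callable, Dict, Hashable, List, Optional, Tuple

Candidates = List[Tuple[str, float]]  # (banner name, weight)


class DecisionCache:
    """Cache banner decisions that do not depend on randomness."""

    def __init__(self, compute_candidates: Callable[[Hashable], Candidates]) -> None:
        self._compute = compute_candidates
        self._cache: Dict[Hashable, Optional[str]] = {}

    def select(self, key: Hashable, rng: Any = random) -> Optional[str]:
        if key in self._cache:
            return self._cache[key]
        candidates = self._compute(key)
        if len(candidates) <= 1:
            # Zero or one candidate: the outcome is deterministic, so cache it.
            decision = candidates[0][0] if candidates else None
            self._cache[key] = decision
            return decision
        # Several candidates: choose randomly by weight and do not cache.
        banners = [name for name, _ in candidates]
        weights = [weight for _, weight in candidates]
        return rng.choices(banners, weights=weights, k=1)[0]

    def clear(self) -> None:
        """Drop all cached decisions, for example when campaign config changes."""
        self._cache.clear()
```

Any such cache would need to be cleared whenever the campaign config is refreshed, since a decision is only valid for the config it was computed from.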
### Added restriction
One minor, new restriction is needed for CentralNotice campaign config: every campaign must either (a) have only one bucket (which can have multiple banners targeting different devices and logged-in statuses), or (b) have all of its banners, across all buckets, target the same set of devices and logged-in statuses.
In practice, all CentralNotice campaigns already comply with this restriction.
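The restriction could be checked when a campaign is saved. Below is a sketch of such a check, assuming a simplified model in which a campaign is a list of buckets and each banner carries the set of devices and logged-in statuses it targets. The `Banner` type and its field values are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Banner:
    name: str
    devices: FrozenSet[str]             # e.g. {"desktop", "android"}
    logged_in_statuses: FrozenSet[str]  # e.g. {"anonymous", "logged_in"}


def complies_with_restriction(buckets: List[List[Banner]]) -> bool:
    """Return True if a campaign satisfies condition (a) or condition (b)."""
    # (a) A single bucket may mix banners with different targets.
    if len(buckets) <= 1:
        return True
    # (b) Otherwise every banner, across all buckets, must share one target set.
    targets = {
        (banner.devices, banner.logged_in_statuses)
        for bucket in buckets
        for banner in bucket
    }
    return len(targets) <= 1
```

A campaign with two buckets, one targeting only desktop and the other only mobile, would be rejected by this check.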
### Logging and monitoring
What sort of logging would the banner service do? Maybe log a summary of results once a second?
What would real-time monitoring and alerts look like?
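As one possible answer to the first question, the service could count outcomes in memory and emit a single summary per interval instead of one log line per request. The sketch below is an assumption about how that might look; the class name, the outcome labels, and the `emit` callback are hypothetical.

```python
import time
from collections import Counter
from typing import Callable, Dict


class SummaryLogger:
    """Aggregate per-request outcomes and emit one summary per interval."""

    def __init__(
        self,
        emit: Callable[[Dict[str, int]], None],
        interval_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._emit = emit
        self._interval = interval_seconds
        self._clock = clock
        self._counts: Counter = Counter()
        self._window_start = clock()

    def record(self, outcome: str) -> None:
        self._counts[outcome] += 1
        now = self._clock()
        # Flush lazily: a summary is emitted by the first request that
        # arrives after the interval has elapsed.
        if now - self._window_start >= self._interval:
            self._emit(dict(self._counts))
            self._counts.clear()
            self._window_start = now
```

Because this version flushes only when a request arrives, an idle service would hold its last partial summary until the next request; a timer-driven flush would avoid that.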
### Proof-of-concept
[[ https://gitlab.wikimedia.org/andyrussg/banner_service_poc/-/tree/main | Here ]] is a working proof-of-concept, using Varnish `<esi>` tags and a standalone Rust web service! (Note: the instructions there are not yet complete. Please ping me if you'd like to try it out!)
## Questions
### Risks
What are the risks involved in this proposal, and how could they be mitigated?
| **Risk** | **Possible mitigation** |
| Banner service goes down or takes too long to respond. | In Varnish (if possible) set a short timeout for requests to the banner service. Inject a synthetic response if the connection is refused or if the timeout is exceeded. |
| Banner service consumes too much RAM or CPU. | Implement a kill switch and monitoring that automatically flips the switch if resource usage exceeds a specified level. |
| Security: an attacker reflects additional ESI tags into the base HTML. | Prevent Varnish from processing any ESI tags other than the one inserted by CentralNotice. Prevent ESI tags from causing requests to anywhere other than the banner service. |
| Limited capacity to maintain the service or respond to urgent bugs (if we use languages or tools that are not currently part of our stack) | Train multiple engineers in the languages and tools used to implement the banner service. |
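The kill switch mitigation could work roughly as sketched below: once measured resource usage exceeds a limit, the switch latches and the service returns an empty response (no banner) until someone resets it. The class, the `read_usage` callback, and the units of the limit are assumptions for illustration only.

```python
from typing import Callable


class KillSwitch:
    """Latch into a 'tripped' state when resource usage exceeds a limit."""

    def __init__(self, read_usage: Callable[[], float], limit: float) -> None:
        self._read_usage = read_usage
        self._limit = limit
        self.tripped = False

    def check(self) -> bool:
        # Once tripped, stay tripped until reset() is called explicitly.
        if not self.tripped and self._read_usage() > self._limit:
            self.tripped = True
        return self.tripped

    def reset(self) -> None:
        self.tripped = False


def handle_request(switch: KillSwitch, select_banner: Callable[[], str]) -> str:
    """Return banner HTML, or an empty response while the switch is tripped."""
    if switch.check():
        return ""
    return select_banner()
```

Latching, rather than recovering automatically when usage drops, avoids flapping between states and leaves the decision to re-enable banners with a human.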
### Durability
Could this solution be permanent or semi-permanent, and why or why not? For how long might it be acceptable to leave it in place?
### Rollout
What steps might be required for a rollout of this system? Tentatively, it could go something like this:
- Initial profiling (lab conditions)
- Initial rollout on one or two small wikis
- More profiling
- Second rollout stage (more wikis, including one medium-sized wiki)
- More profiling
- Third rollout stage (one large wiki)
- More profiling
- Full rollout to all wikis