
Investigate options for dropped CN EventLogging events for new pipeline
Open, Needs Triage, Public, 2 Estimated Story Points

Description

We are consistently losing 12-14% of events on the new CN pipeline. This may well be due to ad blocking. A loss like that would significantly degrade the data we get, compared with what the old pipeline currently provides.

What options are available? We should eventually switch from DjangoBannerStats to FRUEC and some form of log writing that's more stable than filtering the entire site's web request stream. But I don't think losing this much data in exchange for that is a worthwhile trade-off.

For details, see T236834 and T220627#5641946.

Event Timeline

The old pipeline parses logs of hits to /beacon/impression, which are sent via navigator.sendBeacon, falling back to creating an img element with that URL as its src if navigator.sendBeacon is falsy.
Does EventLogging have a less robust fallback?
Or could it be the ordering of the events, i.e., ad blockers deciding that one sendBeacon call is acceptable but that the second should be blocked?
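For comparison between the two pipelines, here is a minimal sketch of the transport logic described for the old beacon sender. It is not the actual CentralNotice code: the `BeaconEnv` interface and `sendImpression` name are hypothetical, and the browser globals are injected so the fallback can be exercised outside a browser.

```typescript
// Minimal environment the sender needs. Injected (rather than read from
// globals) so the fallback logic can be run and checked outside a browser.
interface BeaconEnv {
  // Mirrors navigator.sendBeacon; undefined when the browser lacks it.
  sendBeacon?: (url: string, data?: string) => boolean;
  // Creates an image and assigns its src, which makes the browser issue a GET.
  loadImage: (url: string) => void;
}

// Prefer sendBeacon; fall back to an img src only when sendBeacon is falsy
// (not when it returns false). Returns the transport that was used.
function sendImpression(env: BeaconEnv, url: string): "sendBeacon" | "img" {
  if (env.sendBeacon) {
    env.sendBeacon(url);
    return "sendBeacon";
  }
  env.loadImage(url);
  return "img";
}

// In a browser, the environment could be wired to the real globals, e.g.:
// const browserEnv: BeaconEnv = {
//   sendBeacon: navigator.sendBeacon ? navigator.sendBeacon.bind(navigator) : undefined,
//   loadImage: (url) => { new Image().src = url; },
// };
```

Since both transports request the same URL, a blocker that matches on the URL path would catch either one, which is why the path itself matters more than the fallback.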

It would be a real bummer to have to port DjangoBannerStats to python3!

OK, I see that EventLogging uses the same img.src fallback as the old pipeline's beacon sender, but EventLogging also skips sending if navigator.doNotTrack or window.doNotTrack is set. Do we know whether that's enough to explain the difference? And if we have a good idea of what percentage of users on each UA set it, could we store more UA information so people can compensate for the dip at the end of the pipeline when querying the database?
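To make the DNT question and the proposed compensation concrete, here is a sketch under loudly stated assumptions. It assumes DNT counts as enabled when either property is "1" or "yes" (not necessarily EventLogging's exact check), that DNT is the only cause of loss for a given UA, and that DNT users generate events at the same rate as everyone else. The function names and the per-UA `dntRate` inputs are hypothetical; the rates would have to come from some independent measurement.

```typescript
// Hypothetical container for the two DNT properties a browser may expose:
// navigator.doNotTrack and, in some older browsers, window.doNotTrack.
interface DntSignals {
  navigatorDnt?: string | null;
  windowDnt?: string | null;
}

// Assumption: "1" or "yes" means DNT is enabled. A sketch, not the exact
// check EventLogging performs.
function isDntEnabled(s: DntSignals): boolean {
  const on = (v?: string | null) => v === "1" || v === "yes";
  return on(s.navigatorDnt) || on(s.windowDnt);
}

// If a fraction dntRate of a UA's users suppress events, then
// observed ≈ trueCount * (1 - dntRate), so trueCount ≈ observed / (1 - dntRate).
function estimateTrueCount(observed: number, dntRate: number): number {
  if (dntRate < 0 || dntRate >= 1) {
    throw new RangeError("dntRate must be in [0, 1)");
  }
  return observed / (1 - dntRate);
}

// Apply the per-UA correction to each row and sum the estimates.
function estimateTotal(rows: { observed: number; dntRate: number }[]): number {
  return rows.reduce((sum, r) => sum + estimateTrueCount(r.observed, r.dntRate), 0);
}
```

This correction only covers DNT. Events lost to ad blocking would not be accounted for, so if the estimated DNT share is well below the observed 12-14% gap, that would point back at blocking.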

I think the ad blockers are blocking on the URL path /beacon/event. See https://easylist.to/easylist/easyprivacy.txt and T220627#5638168.
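To illustrate why a path-based rule would hit EventLogging but not the old impression beacon, here is a simplified matcher for Adblock-Plus-style URL filters (the syntax EasyList-style lists use). It handles only the `*` wildcard, the `^` separator placeholder, and single `|` anchors; `||` domain anchors, `$` options, and exception rules are deliberately out of scope. The example filter `/beacon/event?` in the tests is hypothetical; I haven't checked the exact rule in easyprivacy.txt.

```typescript
// Convert a simplified Adblock-Plus-style URL filter into a RegExp.
// Supported: "*" wildcard, "^" separator placeholder, "|" anchor at the
// start or end. Anything else is matched literally as a substring.
function filterToRegExp(filter: string): RegExp {
  if (filter.startsWith("||")) {
    throw new Error("'||' domain anchors are not handled in this sketch");
  }
  let body = filter;
  let prefix = "";
  let suffix = "";
  if (body.startsWith("|")) {
    prefix = "^";
    body = body.slice(1);
  }
  if (body.endsWith("|")) {
    suffix = "$";
    body = body.slice(0, -1);
  }
  let pattern = "";
  for (const ch of body) {
    if (ch === "*") {
      pattern += ".*";
    } else if (ch === "^") {
      // Separator: any char other than a letter, digit, "_", "-", ".", "%",
      // or the end of the URL.
      pattern += "(?:[^A-Za-z0-9_\\-.%]|$)";
    } else {
      // Escape regex metacharacters so the character matches literally.
      pattern += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(prefix + pattern + suffix);
}

// A URL is blocked if any filter matches it.
function isBlocked(url: string, filters: string[]): boolean {
  return filters.some((f) => filterToRegExp(f).test(url));
}
```

Unanchored filters match anywhere in the URL, so a rule aimed at /beacon/event on other sites would also catch ours, while /beacon/impression slips through untouched.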

I kinda hope the first option would be for someone to talk to whoever maintains that list and any similar ones. I don't think people who use AdBlock really intend to block the minimal data collection of a privacy-conscious, ad-free non-profit. Maybe other sites use the same URL path for something else?

Barring that, maybe Analytics could enable a new URL path for EventLogging and see if it sticks?

> It would be a real bummer to have to port DjangoBannerStats to python3!

Hehehe, that would be... somewhat unfortunate... Instead, I think we could just send EventLogging-like JSON to the same /beacon/impression URL and use FRUEC to ingest it. We'd miss the benefit of Analytics' infrastructure for filtering the firehose, but I think that'd be better than losing all that data.
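A rough sketch of what "EventLogging-like JSON on /beacon/impression" could look like: the event is serialized as URL-encoded JSON in the query string, and the consumer decodes it back from the logged request URI. The `ImpressionEvent` shape and both function names are hypothetical, not the real CentralNotice or EventLogging schema. The consumer half is written in TypeScript only to keep one language here; FRUEC itself may well be written in something else.

```typescript
// Hypothetical event envelope; field names are illustrative only.
interface ImpressionEvent {
  schema: string;
  revision: number;
  event: Record<string, string | number | boolean>;
}

// Client side: put URL-encoded JSON in the query string of the existing
// impression path, so a rule targeting /beacon/event does not apply.
function buildImpressionUrl(base: string, ev: ImpressionEvent): string {
  return `${base}?${encodeURIComponent(JSON.stringify(ev))}`;
}

// Shallow structural check on decoded JSON (inner event values not validated).
function isImpressionEvent(x: unknown): x is ImpressionEvent {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  return (
    typeof o.schema === "string" &&
    typeof o.revision === "number" &&
    typeof o.event === "object" &&
    o.event !== null
  );
}

// Consumer side: recover the event from a logged request URI. Returns null
// for URIs without a JSON payload (e.g. legacy key=value impressions).
function parseImpressionUri(uri: string): ImpressionEvent | null {
  const q = uri.indexOf("?");
  if (q === -1) return null;
  try {
    const parsed: unknown = JSON.parse(decodeURIComponent(uri.slice(q + 1)));
    return isImpressionEvent(parsed) ? parsed : null;
  } catch {
    // Malformed percent-encoding or non-JSON query string.
    return null;
  }
}
```

Returning null for non-JSON query strings would let old-style and JSON-style hits share the same path during a transition, with the consumer simply skipping lines it can't parse.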