
Generate a list of junk CN cookies being sent by clients
Closed, Resolved, Public

Description

Recording this in phab so I don't forget:

16:43 < AndyRussG> bblack: hi! sorry for the bother... Was wondering if you know of any logs, even sampled, of the names of cookies we see in the wild?
16:44 < AndyRussG> I'd like to vacuum up old cookies that have been created by on-wiki CentralNotice banner JS... There are other ways to get this info, but if there's anything from prod, that'd be an additional source to check with

Event Timeline

Restricted Application added a subscriber: Aklapper.
ori claimed this task.
ori subscribed.

I captured about 20 minutes' worth of cookie names by running varnishlog on cp1066 (a randomly selected text cache in eqiad). I ran:

$ varnishlog -n frontend -c -i RxHeader -I Cookie | grep -Po '([\w\-]+)(?==)' | pv -rl > "cookies.$(date +%s)"
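For reference, here is a minimal Python sketch of the name-extraction and counting step, assuming the raw Cookie header values have already been captured one per entry (for example from the varnishlog output above). Instead of the grep lookahead, it splits each header on `;` and keeps the text before the first `=`. The lookahead `(?==)` also matches fragments of values that themselves contain `=` (such as base64 padding), so it can report value fragments as names. The sample headers, including the `token` cookie, are made up for illustration.

```python
from collections import Counter


def cookie_names(header_value: str) -> list[str]:
    """Return the cookie names from one Cookie request-header value.

    Splits on ';' and keeps the text before the first '=' of each pair, so
    '='-padded fragments inside values (e.g. base64) are not taken as names.
    """
    names = []
    for pair in header_value.split(";"):
        name, sep, _value = pair.strip().partition("=")
        if sep and name:  # skip empty segments and bare tokens with no '='
            names.append(name)
    return names


def count_names(header_values: list[str]) -> Counter:
    """Tally how often each cookie name appears across the sampled headers."""
    counts = Counter()
    for value in header_values:
        counts.update(cookie_names(value))
    return counts


if __name__ == "__main__":
    # Hypothetical captured headers; real input would come from the capture.
    sample = [
        "centralnotice_bucket=1-3; token=dGVzdA==",
        "centralnotice_bucket=0-1; enwiki-campaign=x",
    ]
    for name, n in count_names(sample).most_common():
        print(n, name)
```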

I can't share the whole list, because some key names appear to have been set by malware or adware and include random-looking alphanumeric strings of unknown significance. There is also a lot of other miscellaneous junk that could not plausibly have been set by us.

Filtering for things that look plausibly CN-related, I end up with:

*-campaign (where * = 'enwiki', 'eswiki', etc.)
202015-wmch
202015-wmch-wait
20octubre2015
20octubre2015-wait
WAM_2015_11_progress_hide
WAM_2015_11_progress_hide-1
WAM_2015_11_progress_hide-1-wait
WAM_2015_11_progress_hide-wait
WMHU_1percent_2016a
WMHU_1percent_2016a-wait
bannercount_fundraiser_2016
bannercount_fundraiser_2016-wait
bannercount_fundraiser_Dec2015
bannercount_fundraiser_Dec2015-wait
centralnotice-frbanner-seen-fullscreen
centralnotice_bannercount_fr12
centralnotice_bannercount_fr12-wait
centralnotice_bannercount_fr14
centralnotice_bannercount_fr14-wait
centralnotice_bannercount_fr14ty
centralnotice_bannercount_fr14ty-wait
centralnotice_bannercount_fr15
centralnotice_bannercount_fr15-wait
centralnotice_bannercount_inspire2015
centralnotice_bannercount_inspire2016
centralnotice_bannercount_kiwixES2015
centralnotice_bannercount_sep2015publicpolicy
centralnotice_bannercount_storeMay2015
centralnotice_bannercount_wikimania14
centralnotice_bannercount_wikimania14-wait
centralnotice_bucket
centralnotice_buckets_by_campaign
centralnotice_hide_CEESpring_2016_UA
centralnotice_hide_FDCCommentsApril2016
centralnotice_hide_Genericmaintenancenotice
centralnotice_hide_Strategy2016Draft
centralnotice_hide_WikiConFR
centralnotice_hide_WikipediaToTheMoon
centralnotice_hide_codfwMaintenanceSwitchoverLoggedIN
centralnotice_hide_codfwMaintenanceSwitchoverLoggedOUT
centralnotice_hide_fundraising
centralnotice_only2times_tou
centralnotice_only2times_tou-wait
cn_wam_201511
cn_wam_201511-1
cn_wam_progress201511-1
cn_wam_progress201511-1-a
cx_campaign_newarticle_hide
hidegeonoticeBostonWP15
hidegeonoticeNotreDameINFeb2016
wam-banner-hide
wam-banner-hide-wait
wmch-fundraising-2015
wmch-fundraising-2015-wait
wmde-fundraising-2015
wmde-fundraising-2015-wait

Just noting here for posterity: since it sounds like we're potentially getting rid of cookies for future CN campaigns, perhaps once that transition is complete we could wipe these out universally from Varnish for several months. Basically, regex-check the request's Cookie: header for CN-like cookies, and send back a Set-Cookie for each one found, with an expiry date in 1970 so the browser deletes it.
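The regex-and-expire idea can be sketched in Python as below. The pattern `CN_LIKE` is hypothetical (it covers only a few prefixes and the `-wait` suffix seen in the sample), and so are the `domain` and `path` values. A browser only drops a cookie when the deleting Set-Cookie carries the same Domain and Path the cookie was originally set with, so those would have to be known for each cookie.

```python
import re

# Hypothetical pattern: a few prefixes from the sampled list plus the "-wait"
# suffix. A real rule would need review against cookies that are still in use.
CN_LIKE = re.compile(r"^(?:centralnotice_|bannercount_|hidegeonotice)|-wait$")

# Any past date works; this is the Unix epoch in the usual HTTP date format.
EPOCH = "Thu, 01 Jan 1970 00:00:00 GMT"


def deletion_headers(cookie_header: str, domain: str, path: str = "/") -> list[str]:
    """Build one Set-Cookie value per CN-like cookie in a request's Cookie header.

    Each value echoes the cookie name with an empty value and a 1970 Expires
    date, which tells the browser to discard it. Domain and Path must match the
    attributes the cookie was originally set with; otherwise the browser treats
    this as a different cookie and keeps the old one.
    """
    headers = []
    seen = set()
    for pair in cookie_header.split(";"):
        name, sep, _value = pair.strip().partition("=")
        if sep and name and name not in seen and CN_LIKE.search(name):
            seen.add(name)
            headers.append(f"{name}=; Expires={EPOCH}; Domain={domain}; Path={path}")
    return headers


if __name__ == "__main__":
    # Made-up request header; "session" stands in for a cookie that must survive.
    request = "centralnotice_bannercount_fr14=1; session=abc; wmde-fundraising-2015-wait=3"
    for value in deletion_headers(request, domain=".example.org"):
        print("Set-Cookie:", value)
```

In Varnish itself this logic would live in VCL; the Python version only illustrates the header handling.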

@BBlack, we have some client-side code that's ready to deploy to do this. Rather than a regex, it uses a list of junk cookies to check for and remove. Would it be feasible to do the same thing server-side instead? That would help send a bit less code and data to the client... Thx!!

Yes, we can help wipe these out at the Varnish layer by unsetting blacklisted cookies we see. We've done that before for similar issues, and usually the bulk of them are gone after such code has been in place for a few months. I'd much rather use a regex than a long list of explicit cookie names if we're taking that route, though.

@BBlack What are the advantages of using a regex rather than an explicit list? Here is the list so far. (See also T135090.) My concerns about a regex are that it could accidentally catch valid cookies, and that it would have to be pretty complex to catch everything anyway. Thanks!!! :)
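To make the trade-off concrete, here is a small Python sketch comparing an exact-match list with a prefix regex. `JUNK_LIST` holds a few names from the sample above purely for illustration (it is not the actual Cookies_to_remove list), `JUNK_RE` is a hypothetical pattern, and `centralnotice_example_live` is a made-up name standing in for a cookie that is still in use.

```python
import re

# Illustrative entries copied from the sample; not the real removal list.
JUNK_LIST = frozenset({
    "centralnotice_bannercount_fr14",
    "centralnotice_bannercount_fr14-wait",
    "centralnotice_only2times_tou",
})

# Hypothetical broad pattern of the kind a regex approach might use.
JUNK_RE = re.compile(r"^centralnotice_")


def junk_by_list(name: str) -> bool:
    """Exact-match lookup: flags only names that were explicitly listed."""
    return name in JUNK_LIST


def junk_by_regex(name: str) -> bool:
    """Pattern match: flags unlisted variants too, including live cookies."""
    return JUNK_RE.search(name) is not None


if __name__ == "__main__":
    # The second name is made up and stands in for a cookie still in use.
    for name in ["centralnotice_bannercount_fr15", "centralnotice_example_live"]:
        print(name, junk_by_list(name), junk_by_regex(name))
```

The regex catches the unlisted `centralnotice_bannercount_fr15` variant, which the list misses, but it also flags the live cookie. The list never deletes anything it was not told to, at the cost of having to enumerate every name.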

It would just be simpler, but we can do a list. The data we have in Cookies_to_remove is just from Ori's 20-minute sample on a single appserver, right? The true list of outstanding/outdated cookies could be much larger. Did they have any kind of reasonable expiry on them when they were set (1y?)?

Are we still setting new CentralNotice cookies, and do the new ones have reasonable expiry?

> It would just be simpler, but we can do a list.

Fantastic, thx!!

> The data we have in Cookies_to_remove is just from Ori's 20-minute sample on a single appserver, right? The true list of outstanding/outdated cookies could be much larger.

It comes from several sources, but that sample turned up quite a lot. Haven't quite finished reviewing that list (which should be just a first pass, I think).

> Did they have any kind of reasonable expiry on them when they were set (1y?)?

Most if not all probably had a reasonable expiry... there might be exceptions.

> Are we still setting new CentralNotice cookies

Yes, though fewer than before. Most CN functions now use LocalStorage and fall back to cookies only under certain circumstances. The exception is hide cookies (created when a user donates or clicks the "close" button), since they need to be created across domains. (The task for finding a way to improve that, so we can eventually LocalStorage-ize those too, is T117433.)

> do the new ones have reasonable expiry?

AFAIK, yes!