Add DP cookie for pageview filtering
Closed, ResolvedPublic

Description

Add Varnish functionality that sets a client browser cookie recording which pages a browser has loaded on a given day, and that sets an include_pv=0 key on X-Analytics when a page has already been viewed that day or the browser has exceeded 10 unique pageviews.

Background: https://meta.wikimedia.org/wiki/Differential_privacy/Active/Country-project-page/User_filtering

Technical requirements:

  • Pageviews to namespace 0 only
  • Domain-specific -- e.g., en.wikipedia is handled separately from fr.wikipedia
  • Cookie should always expire at midnight UTC
  • Once 10 pageviews are reached, the cookie's page list should be cleared and all subsequent pageviews excluded
  • For at least initial testing, the include_pv key should be passed to X-Analytics on every detected pageview, set to 1 for each of the first 10 unique pageviews and 0 for everything else
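
The requirements above can be read as a small state machine. The following is only a Python sketch of that reading, not the Varnish/VCL code from the patches: the function names, the `exhausted` flag, and the choice to leave non-namespace-0 requests untagged are all assumptions made for illustration, as is interpreting "midnight UTC" as the upcoming midnight.

```python
from datetime import datetime, timedelta, timezone

MAX_UNIQUE_PAGEVIEWS = 10  # threshold taken from the requirements above


def next_midnight_utc(now):
    """Return the upcoming midnight UTC (assumed meaning of the expiry rule)."""
    day_start = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return day_start + timedelta(days=1)


def handle_pageview(seen, exhausted, page_id, namespace):
    """Decide include_pv for one request (hypothetical helper).

    `seen` is the set of page ids recorded today for this domain;
    `exhausted` is True once the 10-pageview budget is used up.
    Returns (include_pv, seen, exhausted). include_pv is None when the
    request is not a namespace-0 pageview (assumption: left untagged).
    """
    if namespace != 0:
        return None, seen, exhausted
    if exhausted or page_id in seen:
        return 0, seen, exhausted
    seen = seen | {page_id}
    if len(seen) >= MAX_UNIQUE_PAGEVIEWS:
        # Tenth unique pageview: count it, then clear the stored list.
        return 1, frozenset(), True
    return 1, seen, exhausted
```

Because the state would be kept per domain, en.wikipedia and fr.wikipedia would each carry their own `seen` set under this sketch.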

Event Timeline

Change 824769 had a related patch set uploaded (by Isaac Johnson; author: Isaac Johnson):

[operations/puppet@production] Addition of Varnish logic for setting include_pv cookie on x-analytics and WMF-DP on client.

https://gerrit.wikimedia.org/r/824769

BCornwall triaged this task as Medium priority.Sep 8 2022, 4:04 PM

Change 857748 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] varnish: Generate a DP subkey daily

https://gerrit.wikimedia.org/r/857748

Hi @BBlack and @Vgutierrez - could you please provide an update or some guidance around your expected timeline for this? Please let us know if anything else is required on our end. Thanks!

aiming to deploy it this week :)

Thank you so much for the quick reply. Exciting!!

Change 886000 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[labs/private@master] varnish: Provide a valid DP key

https://gerrit.wikimedia.org/r/886000

Change 886000 merged by Vgutierrez:

[labs/private@master] varnish: Provide a valid DP key

https://gerrit.wikimedia.org/r/886000

Change 857748 merged by Vgutierrez:

[operations/puppet@production] varnish: Generate a DP subkey daily

https://gerrit.wikimedia.org/r/857748

Change 886008 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] varnish: Fix python3-nacl dependency order issue

https://gerrit.wikimedia.org/r/886008

Change 886008 merged by Vgutierrez:

[operations/puppet@production] varnish: Fix python3-nacl dependency order issue

https://gerrit.wikimedia.org/r/886008

Initial sanity checks confirm that the daily key generated on two different hosts is the same:

vgutierrez@cumin1001:~$ sudo -i cumin 'cp[6015,6016].*' 'sha512sum /etc/varnish/dp.daily.key'
2 hosts will be targeted:
cp[6015-6016].drmrs.wmnet
OK to proceed on 2 hosts? Enter the number of affected hosts to confirm or "q" to quit: 2
===== NODE GROUP =====
(2) cp[6015-6016].drmrs.wmnet
----- OUTPUT of 'sha512sum /etc/varnish/dp.daily.key' -----
c68e7ec05dcd8e934378ea21754825de7308eccc0ac9002e131e67f54ec0d7caafb23e3b982b0bc645adb9ac495cbcbc85dbd00d4e0716e957b66da94fcbe74f  /etc/varnish/dp.daily.key
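
Identical hashes on both hosts are what one would expect if the daily key is derived deterministically from a shared secret and the current date. The actual derivation in the merged patch is not shown in this task, so the Python sketch below is only an illustration of that idea: it uses HMAC-SHA256 from the standard library as a stand-in, and `daily_subkey` and `master_key` are hypothetical names.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def daily_subkey(master_key, now):
    """Derive a per-day key from a shared secret and the UTC date.

    Stand-in construction (HMAC-SHA256); not the derivation used in
    the actual Varnish patch.
    """
    day = now.astimezone(timezone.utc).strftime("%Y-%m-%d").encode("ascii")
    return hmac.new(master_key, day, hashlib.sha256).digest()
```

Any two hosts holding the same `master_key` compute the same subkey for a given UTC day, and the subkey changes when the date rolls over.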

Mentioned in SAL (#wikimedia-operations) [2023-02-02T15:00:55Z] <vgutierrez> rolling restart of varnish in cache::text - T315676

Change 824769 merged by Vgutierrez:

[operations/puppet@production] varnish: support differential privacy

https://gerrit.wikimedia.org/r/824769

Change 886095 had a related patch set uploaded (by Vgutierrez; author: Vgutierrez):

[operations/puppet@production] Revert "varnish: support differential privacy"

https://gerrit.wikimedia.org/r/886095

Change 886095 merged by Vgutierrez:

[operations/puppet@production] Revert "varnish: support differential privacy"

https://gerrit.wikimedia.org/r/886095

Change 886337 had a related patch set uploaded (by Vgutierrez; author: Isaac Johnson):

[operations/puppet@production] varnish: support differential privacy

https://gerrit.wikimedia.org/r/886337

@Jcross @Htriedman we had some issues after merging the Differential Privacy CR this morning and I reverted it shortly after. https://gerrit.wikimedia.org/r/c/operations/puppet/+/886337 should address the detected issues and I'll try to get it reviewed today and merged on Monday

Change 886337 merged by Vgutierrez:

[operations/puppet@production] varnish: support differential privacy

https://gerrit.wikimedia.org/r/886337

@Jcross @Htriedman https://gerrit.wikimedia.org/r/886337 was merged a few minutes ago, and initial tests on cp6016 look good:

vgutierrez@cp6016:~$ curl -v -o /dev/null "https://test.wikipedia.org/wiki/MyTemplate" 2>&1 |grep -i cookie
< vary: Accept-Encoding,Cookie,Authorization
< set-cookie: WMF-Last-Access=06-Feb-2023;Path=/;HttpOnly;secure;Expires=Fri, 10 Mar 2023 00:00:00 GMT
< set-cookie: WMF-Last-Access-Global=06-Feb-2023;Path=/;Domain=.wikipedia.org;HttpOnly;secure;Expires=Fri, 10 Mar 2023 00:00:00 GMT
< set-cookie: WMF-DP=b21;Path=/;HttpOnly;secure;Expires=Mon, 06 Feb 2023 00:00:00 GMT
< set-cookie: GeoIP=US:::37.75:-97.82:v4; Path=/; secure; Domain=.wikipedia.org

X-Analytics looks like this:

X-Analytics: ns=0;page_id=58748;include_pv=1;https=1;client_port=55159;nocookies=1
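
For anyone checking the header downstream, a value like the one above can be split into key/value pairs with a few lines of Python. This is a minimal sketch; `parse_x_analytics` is a hypothetical helper, not part of any existing pipeline, and it assumes the semicolon-separated `key=value` layout seen in this sample.

```python
def parse_x_analytics(value):
    """Split an X-Analytics header value into a dict of string keys and values."""
    fields = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing ';'
        key, _, val = item.partition("=")
        fields[key] = val
    return fields


header = "ns=0;page_id=58748;include_pv=1;https=1;client_port=55159;nocookies=1"
fields = parse_x_analytics(header)
# True when the edge marked this request as a countable pageview.
counts_as_pageview = fields.get("include_pv") == "1"
```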

puppet should apply the CR across the text cluster in the next ~30 minutes

@Vgutierrez Would you consider this completed and ready to be closed?

@Isaac @Htriedman @Jcross could you confirm that this is working as expected and can be closed?

@Vgutierrez this feature has been working as expected, and this ticket can be closed!

Vgutierrez claimed this task.

@Htriedman awesome, thanks for the prompt response.

DP has been deployed and running happily since February 6th, 2023.