
Visits to Wikimedia properties should not be used for Google ad targeting (FLoC)
Closed, Resolved (Public)

Description

Google Chrome (and Chromium, Microsoft Edge, etc.) browsers now track which websites users visit, and group users into "cohorts" of people who visit similar sites, so that Google can serve them targeted advertising based on their browsing history. See https://blog.malwarebytes.com/cybercrime/privacy/2021/04/millions-of-chrome-users-quietly-added-to-googles-floc-pilot/ and https://spreadprivacy.com/block-floc-with-duckduckgo/.

Site owners who don't want visits to their site to be tracked must opt out of this behavior by sending the HTTP response header Permissions-Policy: interest-cohort=(). See https://github.com/wicg/floc#opting-out-of-computation.
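For illustration, here is a minimal sketch of sending this header from the application side, assuming a Python WSGI stack. The patches attached to this task set the header at the Varnish, nginx, and Envoy layers instead, so this is not how Wikimedia does it. The names floc_opt_out, demo_app, and collect_headers are hypothetical helpers made up for the example.

```python
from wsgiref.util import setup_testing_defaults

HEADER_NAME = "Permissions-Policy"
HEADER_VALUE = "interest-cohort=()"


def floc_opt_out(app):
    """Wrap a WSGI app so responses carry the FLoC opt-out header.

    Assumption: if the app already sets Permissions-Policy, we leave it
    alone rather than trying to merge directives.
    """
    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            names = {name.lower() for name, _ in headers}
            if HEADER_NAME.lower() not in names:
                headers = list(headers) + [(HEADER_NAME, HEADER_VALUE)]
            return start_response(status, headers, exc_info)
        return app(environ, start_with_header)
    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in for a real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def collect_headers(app):
    """Call a WSGI app once and return (headers, body) for inspection."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = headers

    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid environ
    body = b"".join(app(environ, start_response))
    return captured["headers"], body


if __name__ == "__main__":
    headers, _ = collect_headers(floc_opt_out(demo_app))
    for name, value in headers:
        print(f"{name}: {value}")
```

Setting the header once at the edge (as the Varnish patch does) covers every backend at once; a per-application wrapper like this one would only make sense for services that are not behind that layer.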

Allowing visits to be tracked via FLoC is a privacy risk to our users. Although the resulting cohort IDs are meant to group users anonymously, it is possible to de-anonymize visitors using them. See e.g. https://github.com/WICG/floc/issues/100.

Event Timeline

dpifke renamed this task from Block Chrome FLoC tracking by default for Wikipedia to Visits to Wikimedia properties should not be used for Google ad targeting (FLoC). Apr 15 2021, 3:35 PM
dpifke added a project: Traffic.
dpifke updated the task description.

Change 679866 had a related patch set uploaded (by Dave Pifke; author: Dave Pifke):

[operations/puppet@production] varnish: add anti-FLoC header to responses

https://gerrit.wikimedia.org/r/679866

Change 679877 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/puppet@production] toolforge: opt-out of Google FLoC tracking

https://gerrit.wikimedia.org/r/679877

Change 679878 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/puppet@production] cloud-vps: opt-out of Google FLoC tracking

https://gerrit.wikimedia.org/r/679878

Is the header needed at all?

https://github.com/WICG/floc/issues/45#issuecomment-781042491 says:

During the Origin Trial, the default for whether a page will be used for FLoC computation will be based on Chrome's existing infrastructure which detects pages that load ads-related resources. Our thinking here is that pages detected as including ads-related resources probably fetched something with an ads-related 3p cookie attached, which means it's reasonable to guess that the page visit contributes to some ads profile today.

Since the WMF doesn't serve ads (I don't think the fundraising banners count), I don't think WMF sites would be included, so the header would be just cruft (and extra bytes down the wire).

Disclosure: I work for Google, but I'm making this (and all other) comments in a personal capacity. I have no relevant insider knowledge and my observation is based purely on the public GitHub thread.

Change 679908 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[wikimedia/toolhub@main] security: Add Google FloC out-out header to responses

https://gerrit.wikimedia.org/r/679908

Are the Google heuristics for "ads-related resources" public? Are changes to those heuristics (or to the FLoC "trial") announced in advance? Do we have the resources to monitor for changes to either in determining if this header is needed in the future?

If the answer to any of the above is no, I think it's worth 39 bytes of response payload to unambiguously signal that we don't want our users to be tracked. As a point of context, a fresh enwiki/Main_Page pageview is 1,057,815 bytes, so this is 0.0036% overhead. Assuming a 1,500 byte MTU, it does not affect the number of packets containing the initial response, so should have no effect on latency.
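The overhead figure can be checked with a few lines of arithmetic. This sketch takes the 39-byte and 1,057,815-byte figures from the comment as given. The exact on-the-wire size of the header depends on the line terminator and on the HTTP version (HTTP/2 compresses headers), so the character count printed here is only a sanity check. The function name overhead_percent is made up for the example.

```python
HEADER_LINE = "Permissions-Policy: interest-cohort=()"
HEADER_BYTES = 39        # figure quoted in the comment
PAGE_BYTES = 1_057_815   # fresh enwiki/Main_Page pageview, per the comment


def overhead_percent(extra_bytes, total_bytes):
    """Extra bytes as a percentage of the total payload."""
    return 100.0 * extra_bytes / total_bytes


if __name__ == "__main__":
    # Length of the header line itself, without any line terminator.
    print(len(HEADER_LINE))
    # Overhead relative to the full pageview.
    print(f"{overhead_percent(HEADER_BYTES, PAGE_BYTES):.5f}%")
```

The result lands between 0.0036% and 0.0037%, in line with the estimate quoted in the comment.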

I would love if FLoC were to be made opt-in, or abandoned completely. If that should happen, reverting the patches proposed in this task is trivial.

I considered modifying the Varnish patch to not send the header to privacy-respecting user agents like Firefox, but Chrome is phasing out the User-Agent header, which makes doing so difficult at the traffic layer.

Are the Google heuristics for "ads-related resources" public? Are changes to those heuristics (or to the FLoC "trial") announced in advance? Do we have the resources to monitor for changes to either in determining if this header is needed in the future?

If the answer to any of the above is no, I think it's worth 39 bytes of response payload to unambiguously signal that we don't want our users to be tracked. As a point of context, a fresh enwiki/Main_Page pageview is 1,057,815 bytes, so this is 0.0036% overhead. Assuming a 1,500 byte MTU, it does not affect the number of packets containing the initial response, so should have no effect on latency.

I would love if FLoC were to be made opt-in, or abandoned completely. If that should happen, reverting the patches proposed in this task is trivial.

I think these are all fair points and I agree with your conclusions.

Change 679877 merged by Bstorm:

[operations/puppet@production] toolforge: opt-out of Google FLoC tracking

https://gerrit.wikimedia.org/r/679877

Change 679878 merged by Bstorm:

[operations/puppet@production] cloud-vps: opt-out of Google FLoC tracking

https://gerrit.wikimedia.org/r/679878

fgiunchedi triaged this task as Medium priority. Apr 16 2021, 8:45 AM

Is the header needed at all?

https://github.com/WICG/floc/issues/45#issuecomment-781042491 says:

During the Origin Trial, the default for whether a page will be used for FLoC computation will be based on Chrome's existing infrastructure which detects pages that load ads-related resources. Our thinking here is that pages detected as including ads-related resources probably fetched something with an ads-related 3p cookie attached, which means it's reasonable to guess that the page visit contributes to some ads profile today.

Since the WMF doesn't serve ads (I don't think the fundraising banners count), I don't think WMF sites would be included, so the header would be just cruft (and extra bytes down the wire).

Disclosure: I work for Google, but I'm making this (and all other) comments in a personal capacity. I have no relevant insider knowledge and my observation is based purely on the public GitHub thread.

As I said in a task I opened, we surely don't serve ads, but (a) we load per-wiki JavaScript, which might be changed, and (b) we also serve a lot of third-party software (think of Gerrit or Jenkins), so I would consider it a "better safe than sorry" measure.

Change 679908 merged by jenkins-bot:

[wikimedia/toolhub@main] security: Add Google FloC out-out header to responses

https://gerrit.wikimedia.org/r/679908

Header added to fundraising nginx templates and deployed.

[frack::puppet] 58ed92cf Add Permissions-Policy header (Google FLoC)

Change 679866 merged by BBlack:

[operations/puppet@production] varnish: add anti-FLoC header to responses

https://gerrit.wikimedia.org/r/679866

dpifke claimed this task.

Change 681085 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] P:tlsproxy::envoy: Add support to opt out of FLoC

https://gerrit.wikimedia.org/r/681085

Change 681085 merged by Jbond:

[operations/puppet@production] P:tlsproxy::envoy: Add support to opt out of FLoC

https://gerrit.wikimedia.org/r/681085

Change 681130 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] hiera - idp: opt out of FLoC

https://gerrit.wikimedia.org/r/681130

Change 681130 merged by Jbond:

[operations/puppet@production] hiera - idp: opt out of FLoC

https://gerrit.wikimedia.org/r/681130