Technical investigation into an experiment for using client hints in CheckUser [8H]
Closed, Resolved (Public)

Description

Introduction

Google Chrome has outlined plans to deprecate user agent (UA) strings to increase privacy. This will have implications for CheckUser, which currently stores UA strings to help fight vandalism. Using the alternative, client hints, to gather this data will have implications for Wikimedia's privacy practices. See T242825 for full details.

Investigation

The introduction of client hints in Chrome 84 is intended to allow for experimentation and feedback.

As an experiment, we'd like to try gathering client hint data temporarily on a small scale, so that checkusers can provide feedback on its usefulness.

This task is for a technical investigation into temporarily gathering client hint data for page creations, and displaying the data in Special:Investigate alongside the normal UA string.

The focus on page creations is for two reasons:

  • Page creations show up in the Special:Investigate Compare and Timeline tabs (along with edits)
  • There are relatively few page creations compared to edits, so the data would be asked for on fewer requests

This should be considered a temporary experiment because:

  • Client hint data might become unavailable or be refused
  • The structure of client hint data might change
  • It is unclear when the UA string will be deprecated
  • We don't yet know how sites' gathering of client data will be surfaced, and how that will be viewed from a privacy perspective

Event Timeline

ARamirez_WMF renamed this task from Technical investigation into an experiment for using client hints in CheckUser to Technical investigation into an experiment for using client hints in CheckUser [8H]. Jul 23 2020, 3:59 PM

@Niharika How does focusing on these three areas sound?

  • A technical outline of how we could do this experiment
  • An estimate of the proportion of cu_changes rows that would be affected (out of all rows, and out of page creation rows)
  • A list of questions to ask checkusers about this experiment

This sounds good. I would also add another thing: how does the client-hint data differ from the user-agent data?
This will:

  1. Help us make sure we aren't missing anything important by capturing client-hints for certain requests instead of user-agents
  2. Help us better convey the change to users when we make design changes to show client-hints in the UI in place of user-agents

Change 617116 had a related patch set uploaded (by Tchanders; owner: Tchanders):
[mediawiki/extensions/CheckUser@master] WIP Demonstration of how to store client hints for experiment

https://gerrit.wikimedia.org/r/617116

Technical outline

The patch above proposes how we could implement this experiment. From the commit message:

A response header asks for client hint information for edit requests on non-existent titles (i.e. page creation).
The client hint information is stored instead of the user agent string where available, if the type of change is RC_NEW.
A feature flag controls where the experiment is enabled.

A user who sends client hints would now have different data in cuc_agent for their page creations. This would allow comparisons of the user agent string and client hints, but might affect comparisons for sockpuppet detection in CheckUser. Showing both sets of data would turn this into a much bigger project: we'd need new designs and a schema change.
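
To make the flow in the commit message concrete, here is a minimal sketch in Python (not the PHP from the patch, and not what the patch actually does line by line). The hint names in REQUESTED_HINTS, the function names, the lower-cased header keys and the JSON serialisation are all assumptions for illustration. The Accept-CH token names have also changed between drafts of the spec, so they would need checking.

```python
import json
from typing import Dict, Mapping

RC_NEW = 1  # MediaWiki's recent-changes type for page creations

# Assumption: placeholder hint names; the exact Accept-CH tokens depend on the draft.
REQUESTED_HINTS = ["Sec-CH-UA-Model", "Sec-CH-UA-Platform", "Sec-CH-UA-Platform-Version"]


def response_headers(title_exists: bool, is_edit_request: bool,
                     experiment_enabled: bool) -> Dict[str, str]:
    """Ask for client hints only on edit requests for non-existent titles."""
    if experiment_enabled and is_edit_request and not title_exists:
        return {"Accept-CH": ", ".join(REQUESTED_HINTS)}
    return {}


def agent_to_store(rc_type: int, request_headers: Mapping[str, str],
                   experiment_enabled: bool) -> str:
    """Pick the value for cuc_agent: client hints for RC_NEW if any, else the UA string.

    Assumes header names have already been lower-cased by the caller.
    """
    user_agent = request_headers.get("user-agent", "")
    if not (experiment_enabled and rc_type == RC_NEW):
        return user_agent
    # Keep only client hint headers that carry a non-empty value.
    hints = {name: value for name, value in request_headers.items()
             if name.startswith("sec-ch-ua") and value}
    # Assumption: hints are serialised as JSON so they fit in the existing column.
    return json.dumps(hints, sort_keys=True) if hints else user_agent
```

Falling back to the UA string when no hints arrive keeps cuc_agent populated for browsers that ignore the request, which is the "where available" part of the commit message.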

How many rows could be affected?

Chrome 84 will only send client hint data if the user has enabled Experimental Web Platform features (disabled by default). The number of users who both have this enabled and are ever checked by CheckUser might be very small. If we were to run this experiment now, it might even be too small to learn anything useful.

This experiment wouldn't apply to pages created via the API, or by any other route that doesn't involve going to an edit page.

In terms of the frequency of page creations in general, the type breakdown on enwiki for the last 90 days was roughly:

Type      Frequency
RC_EDIT   82%
RC_NEW    4%
RC_LOG    14%

Differences between client hint data and user agent string

The information available via client hints is described here: https://wicg.github.io/ua-client-hints/#http-ua-hints

Testing locally on Chrome 84, @dbarratt and I each noticed that Chrome was sending an empty string for some of these values, without us having done anything special.

A few more points from the engineering meeting

Chrome 84 is already sending incorrect/missing client hint data, e.g.:

  • sec-ch-ua: "\\Not\"A;Brand";v="99", "Chromium";v="84", "Google Chrome";v="84"
  • sec-ch-ua-model: ""
  • sec-ch-ua-platform-version: ""
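
If we stored or displayed these values, we would need to cope with the escaped brand string and the empty values. Here is a rough Python sketch, assuming the sec-ch-ua value shown above is the raw header value and that a quoted empty string should be treated as "no data". The function names are made up for illustration, and the regex is a simplification rather than a full structured-headers parser.

```python
import re
from typing import List, Optional, Tuple

# One list member: a quoted brand, then a v parameter holding a quoted version.
# Inside the quotes, a backslash escapes the next character.
_ITEM = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v="((?:[^"\\]|\\.)*)"')


def _unescape(text: str) -> str:
    """Drop the backslash from each backslash-escaped character."""
    return re.sub(r'\\(.)', r'\1', text)


def parse_sec_ch_ua(value: str) -> List[Tuple[str, str]]:
    """Return (brand, version) pairs from a sec-ch-ua header value."""
    return [(_unescape(brand), _unescape(version))
            for brand, version in _ITEM.findall(value)]


def hint_or_none(value: Optional[str]) -> Optional[str]:
    """Treat a missing header, an empty value, or a quoted empty string as no data."""
    if value is None:
        return None
    stripped = value.strip()
    if stripped in ("", '""'):
        return None
    # Remove one pair of surrounding quotes if present.
    if len(stripped) >= 2 and stripped[0] == '"' and stripped[-1] == '"':
        return stripped[1:-1]
    return stripped
```

With the header above, parse_sec_ch_ua gives three pairs, the first being the odd-looking brand with version 99, and hint_or_none maps the sec-ch-ua-model and sec-ch-ua-platform-version values to None.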

Client hints are disabled by default on Chrome 84. There seems to be a plan to enable them by default on Chrome 85: https://chromestatus.com/feature/5995832180473856. I checked Chrome 85 beta, and they haven't been enabled there yet.

The cuc_type breakdown differs a bit on different wikis (wikidatawiki also had a very small number of RC_FLOW changes):

Type      enwiki   commonswiki   wikidatawiki
RC_EDIT   82%      82%           91.5%
RC_NEW    4%       2%            6%
RC_LOG    14%      16%           2.5%
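
For the "how many rows" estimate, the RC_NEW shares above give an upper bound: the experiment cannot touch more than that share of cu_changes rows on each wiki. The real figure is that share multiplied by the fraction of page creations that actually arrive with client hints, which we don't know. A small Python sketch of the arithmetic follows; hint_fraction is a hypothetical parameter, not a measured value.

```python
# RC_NEW share of rows per wiki, taken from the table above.
RC_NEW_SHARE = {"enwiki": 0.04, "commonswiki": 0.02, "wikidatawiki": 0.06}


def affected_share(rc_new_share: float, hint_fraction: float) -> float:
    """Share of all rows that would hold client hints instead of a UA string.

    hint_fraction is the (unknown, hypothetical) fraction of page-creation rows
    whose request came via an edit page from a browser that sent client hints.
    """
    if not 0.0 <= hint_fraction <= 1.0:
        raise ValueError("hint_fraction must be between 0 and 1")
    return rc_new_share * hint_fraction


# Upper bound per wiki: every page creation sends client hints.
upper_bounds = {wiki: affected_share(share, 1.0) for wiki, share in RC_NEW_SHARE.items()}
```

Out of page creation rows only, the affected proportion is simply hint_fraction itself.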

Change 617116 abandoned by Tchanders:

[mediawiki/extensions/CheckUser@master] WIP Demonstration of how to store client hints for experiment

Reason:

https://gerrit.wikimedia.org/r/617116