Preliminary analysis shows a small reduction in actor_signature entropy as a result of User-Agent deprecation. However, a small change in entropy does not necessarily imply a small change in top-line metrics. We use actor_signature in our pageview and unique devices pipelines. It is computed from the first 200 characters of the User-Agent string, the IP, and other details about each request. A large change in User-Agent entropy may cause only a small change in actor_signature entropy, yet still have an outsized effect on something like automata detection.
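To make the entropy point concrete, here is a minimal sketch in Python. It is not the pipeline's code: the request data is made up, the signature is approximated as a plain (ip, user_agent) pair, and coarsen() is a hypothetical stand-in for User-Agent reduction that keeps only the major version.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def coarsen(user_agent):
    # Hypothetical stand-in for User-Agent reduction:
    # drop the minor and patch version, keep the major version.
    return user_agent.rsplit(".", 2)[0]


# Made-up (ip, user_agent) pairs, for illustration only.
requests = [
    ("10.0.0.1", "Mozilla/5.0 Firefox/115.0.1"),
    ("10.0.0.2", "Mozilla/5.0 Firefox/115.0.2"),
    ("10.0.0.3", "Mozilla/5.0 Chrome/120.0.1"),
    ("10.0.0.4", "Mozilla/5.0 Chrome/120.0.2"),
]

# Entropy of the User-Agent alone, before and after coarsening.
ua_before = shannon_entropy(ua for _, ua in requests)
ua_after = shannon_entropy(coarsen(ua) for _, ua in requests)

# Entropy of a toy signature (ip + User-Agent), before and after coarsening.
sig_before = shannon_entropy((ip, ua) for ip, ua in requests)
sig_after = shannon_entropy((ip, coarsen(ua)) for ip, ua in requests)

print(f"User-Agent entropy: {ua_before:.2f} -> {ua_after:.2f} bits")
print(f"signature entropy:  {sig_before:.2f} -> {sig_after:.2f} bits")
```

In this toy data the User-Agent entropy halves while the signature entropy does not move, because the IP still separates the four requests. Real traffic will behave differently, so signature entropy alone is a weak guide to downstream impact.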
One way to model the impact of the changes is to compare the existing output of our pipelines with their output under a simulated User-Agent deprecation. Currently we compute the signature like this:
get_actor_signature(ip, user_agent, accept_language, uri_host, uri_query, x_analytics_map) AS actor_signature
And we could instead do something like this:
get_actor_signature(
    ip,
    concat(
        user_agent_map['os_family'], '-',
        user_agent_map['browser_family'], '-',
        user_agent_map['browser_major'], '-',
        user_agent_map['wmf_app_version']
    ),
    accept_language, uri_host, uri_query, x_analytics_map
) AS actor_signature_after_change
We can then feed actor_signature_after_change into the rest of our pipelines to estimate the impact on top-line metrics.
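The comparison could be prototyped on a small sample before touching the real pipelines. The Python sketch below is only an illustration under stated assumptions: toy_signature() is a hypothetical stand-in for get_actor_signature (it hashes just three of the fields), the request rows are made up, and the threshold of 5 requests per actor is an arbitrary placeholder, not the rule our automata detection actually uses.

```python
import hashlib
from collections import Counter


def toy_signature(ip, user_agent, accept_language):
    """Hypothetical stand-in for get_actor_signature: hash a few request fields."""
    raw = "|".join([ip, user_agent[:200], accept_language])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def coarse_user_agent(ua_map):
    """Mirror the concat(...) above: join four user_agent_map fields with '-'."""
    keys = ("os_family", "browser_family", "browser_major", "wmf_app_version")
    return "-".join(ua_map.get(k, "-") for k in keys)


def summarize(signatures, max_requests_per_actor):
    """Count distinct actors and actors above a (hypothetical) request threshold."""
    per_actor = Counter(signatures)
    flagged = sum(1 for n in per_actor.values() if n > max_requests_per_actor)
    return {"distinct_actors": len(per_actor), "flagged_actors": flagged}


# Made-up sample: two devices behind one IP, differing only in patch version.
full_user_agents = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.6099.109 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.6099.129 Safari/537.36",
)
requests = [
    {
        "ip": "10.0.0.1",
        "user_agent": ua,
        "accept_language": "en-US",
        "user_agent_map": {
            "os_family": "Windows",
            "browser_family": "Chrome",
            "browser_major": "120",
            "wmf_app_version": "-",
        },
    }
    for ua in full_user_agents
    for _ in range(3)  # three requests per device
]

THRESHOLD = 5  # arbitrary placeholder, not the real automata rule

before = summarize(
    (toy_signature(r["ip"], r["user_agent"], r["accept_language"]) for r in requests),
    THRESHOLD,
)
after = summarize(
    (
        toy_signature(r["ip"], coarse_user_agent(r["user_agent_map"]), r["accept_language"])
        for r in requests
    ),
    THRESHOLD,
)

print("before:", before)
print("after: ", after)
```

In this sample the two devices collapse into one actor after the change, and that merged actor crosses the placeholder threshold even though neither device did on its own. This is the kind of effect on automata detection that an entropy figure would not reveal.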
Remaining tasks
- project plan, including:
  - resourcing plan based on skills required and capacity building
  - bounded scope
  - outline of cross-team work for instrumentation
  - timeline