
<Org-Wide Impact> Google Chrome User-Agent Deprecation Impact
Closed, Resolved, Public

Description

Request Status: Ad-Hoc Request
Request Type: External Dependency Change

Request Title: Google Chrome User-Agent Deprecation

Request Documentation

Document Type | Required? | Document/Link
Related PHAB Tickets | Yes | T242825: Deal with Google Chrome User-Agent deprecation
Product One Pager | No | <add link here>
Product Requirements Document (PRD) | No | <add link here>
Product Roadmap | No | <add link here>
Product Planning/Business Case | No | <add link here>
Product Brief | No | <add link here>
Other Links | No | https://github.com/WICG/ua-client-hints

Related Objects

Status | Subtype | Assigned
Resolved | | DAbad
Resolved | | kostajh
Resolved | | Tchanders
Declined | | None
Duplicate | | None
Declined | | None
Resolved | | brooke
Resolved | | kostajh
Declined | | None
Resolved | | Seddon
Resolved | | kostajh
Resolved | | EChetty
Resolved | | JAllemandou
Open | | None
Resolved | | Volker_E
Resolved | | Catrope
Resolved | | JAllemandou
Resolved | | phuedx
Resolved | BUG REPORT | phuedx
Resolved | | mforns

Event Timeline


This project impacts any team that collects browser/device data or works with such data. Anti-Harassment Tools is one of the teams impacted; there are likely many more, including Research, Data-Engineering, Product-Analytics, etc.

Google is deprecating the User-Agent, which is one attribute of user requests that we use. We would need to migrate away from relying on it.

DAbad renamed this task from Google Chrome User-Agent Deprecation to <Org-Wide Impact> Google Chrome User-Agent Deprecation Impact. Dec 16 2021, 4:35 PM
DAbad claimed this task.
DAbad triaged this task as High priority.
DAbad updated the task description.
DAbad set Due Date to Feb 28 2022, 5:00 AM.

2021-12-16 - Analytics Discussion

  • Attendees: Olja, Emil, Desiree, Danny
  • We expect that all of analytics will be impacted in some way or another
  • To assess impact, work will be broken down as follows:
    • Metrics Platform: (1) reviews what the new method is and how it will work; (2) assesses impact on instrumentation
    • Data Engineering: identifies what could potentially be impacted within the Analytics stack
DAbad changed the task status from Open to In Progress. Dec 16 2021, 4:46 PM
DAbad moved this task from Backlog to Investigate on the Foundational Technology Requests board.

Here are some search results for user agent code in MediaWiki that could possibly impact frontend features. Skins and Extensions seem to touch a few teams, and could use an audit. Thanks!

MW impact:

  • Spoke with Cindy Cicalese: no obvious impact. There is one place in the core session code that reads the user agent, but it does not appear to base any logic on the result. Check with Gergo Tisza.
  • Gergo Tisza's analysis: he does not think the auth framework uses user agents in any way (though it logs them, and CheckUsers and the Security team might use that data when investigating login- or signup-related abuse). In MediaWiki in general, the user agent is used for legacy browser detection in various places. The most prominent is probably ResourceLoader, which uses the UA to decide whether to load JS at all for a given request and whether to use local storage for the module cache.

Analytics Impact:

Currently being assessed by the Data Engineering and Metrics Platform teams. Solutions currently being evaluated include:

  • Instrumenting pageviews in a similar way to how we currently instrument Virtual Page Views
  • Replacing the User-Agent with the data we get from User-Agent Client Hints

The primary difficulty with using User-Agent Client Hints is that some requests will not include them, meaning we would need to do some pre-flight checks before sending the events off (introducing latency). On the other hand, if we instrument pageviews client-side, browsers that don't support JS would not be able to send back data (we still need to determine the proportion of our users on browsers without JS support). We are currently evaluating which option would cause the fewest downstream effects.
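As a rough illustration of the pre-flight/post trade-off: under the Client Hints model, a server lists the hints it wants in an `Accept-CH` response header, and Chromium attaches high-entropy hints only on later requests to that origin (Chromium's `Critical-CH` header can force a retry instead, which is where a latency penalty would come from). The minimal sketch below, using a hypothetical helper `missing_hints`, shows how a collector might flag requests that arrive without hints it asked for. It is an illustration under those assumptions, not how the Wikimedia pipeline works.

```python
from typing import Dict, Iterable, Set


def missing_hints(request_headers: Dict[str, str], requested: Iterable[str]) -> Set[str]:
    """Return the requested client hints (lower-cased) that this request lacks.

    HTTP header names are case-insensitive, so both sides are lower-cased.
    """
    present = {name.lower() for name in request_headers}
    return {hint.lower() for hint in requested} - present


# A typical first request from Chromium: only the default low-entropy hints.
first_request = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
    "Sec-CH-UA": '"Chromium";v="118", "Google Chrome";v="118", "Not=A?Brand";v="99"',
    "Sec-CH-UA-Mobile": "?0",
    "Sec-CH-UA-Platform": '"Windows"',
}
wanted = ["Sec-CH-UA-Model", "Sec-CH-UA-Platform-Version", "Sec-CH-UA-Mobile"]
print(sorted(missing_hints(first_request, wanted)))
# ['sec-ch-ua-model', 'sec-ch-ua-platform-version']
```

Every request flagged this way is a row where hint-based fields would be empty under the "post" option.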

We are currently working on collecting Client Hints data to establish the relationship between the User-Agent and User-Agent Client Hints. This will let us determine how the two relate, whether any data loss will occur, and whether any field/value changes appear in the Client Hints data. Ideally this involves only a simple change to the Varnish configuration to pass the new headers through to DE for analysis.
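One simple way to start establishing that relationship is a per-request consistency check, for example whether the Chrome major version in the User-Agent string matches the one reported in `Sec-CH-UA`. A minimal sketch; the function names are hypothetical, the default brand `"Google Chrome"` is an assumption (other Chromium browsers such as Edge advertise different brands while still carrying a `Chrome/` token in their UA), and the regexes are deliberately simplistic:

```python
import re
from typing import Optional

# "Chrome/118.0.0.0" -> major version 118. Edge, Opera, etc. also carry this token.
UA_CHROME = re.compile(r"Chrome/(\d+)")
# One `"brand";v="version"` member of the Sec-CH-UA list. Brands are quoted and
# may contain ';' or ',' (e.g. the GREASE brand "Not;A Brand").
CH_BRAND = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v\s*=\s*"((?:[^"\\]|\\.)*)"')


def ua_major(user_agent: str) -> Optional[int]:
    """Chrome major version from a User-Agent string, or None if absent."""
    match = UA_CHROME.search(user_agent)
    return int(match.group(1)) if match else None


def hint_major(sec_ch_ua: str, brand: str = "Google Chrome") -> Optional[int]:
    """Major version advertised for `brand` in a Sec-CH-UA value, or None."""
    for name, version in CH_BRAND.findall(sec_ch_ua):
        if name == brand:
            return int(version.split(".")[0])
    return None


def versions_agree(user_agent: str, sec_ch_ua: str) -> bool:
    """True when both sources report the same Chrome major version."""
    ua = ua_major(user_agent)
    return ua is not None and ua == hint_major(sec_ch_ua)
```

Aggregating `versions_agree` over a sample of requests gives a first-order agreement rate between the two sources.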

  • January 19, 2022 Steering Committee:
    • Update on progress
    • Impact: the Metrics Platform team will be shifting focus to this effort
    • Olja D.: the client hints implementation has two options: pre-flight (latency penalty) or post (loss of data on the first request)
    • Mark B.: would likely impact SLOs
    • Kate C.: perhaps Performance should be included
    • Greg G.: Fundraising and Advancement generally want to be informed

Action Items:

  • Emil & Olja to submit to Tech Forum

January 25, 2022 Stewards/Tech Team Meetup: Anti-Harassment Tools and T&S
Q&A on userAgent Changes:

Is there a timeline?

  • The goal is to have experiments set up, running, and pulling data by the end of this week. This is an evolving situation that will require feedback from the community and internal dependencies.

Are mobile device models going to be requested?

  • It depends on whether we require high-entropy headers, but yes, device models are being included in the evaluations: whether we need to request additional information from mobile devices, and how to prioritize (data integrity, performance, etc.).
    • High-entropy headers are headers we would need to explicitly request; low-entropy headers are sent by default.
    • In India, there is a huge pool of users in small IP ranges, so mobile device models are going to make the difference in saying person A is not person B. That information (i.e., something like a last name) would be very useful for CheckUser.
      • CheckUsers frequently use update history to judge whether it is the same user. This information is helpful to CheckUser.

At a high level, what information comes from the low-entropy headers?

  • Broadly: browser information and its current version, the OS, and whether it is on mobile and/or other attached models. Some information is available in HTML as well.
    • That sounds like a lot of the information from the userAgent. The concern is what we would lose to high-entropy headers, the performance cost, and whether the payoff is worth it.

How are we engaging with the community?

  • Engaging with the Advocacy team and Technical Engagement to develop comms about what is impacted among the things the community maintains (actively being worked on). They are proactively trying to give people a heads-up, looking at repos, etc.
  • Ideally we will minimize the impact of the change by logging in similar ways to today. We are looking at when and how we can make the changes so that they affect as few people as possible. Plans are still being worked out, but we have started engaging folks.

@DAbad for posterity, can you please define MU, MR and what "IE last name" is (to avoid confusing with a human user's last name)?

This was used as an example (two humans with the same given name)

Edited the note to add some clarity

A quick update here.

We have 3 new fields in the wmf.webrequest table: ch_ua, ch_ua_mobile and ch_ua_platform, which come from the low-entropy client hints. Some preliminary analysis suggests that these will not be able to replicate the information we get from the User-Agent, and we will need to continue experimenting with what data we can get from the client. This will be confirmed by the end of this week.
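For reference, the three low-entropy values are HTTP Structured Field values: `Sec-CH-UA` is a list of quoted brand/version pairs (including a deliberately meaningless "GREASE" brand), `Sec-CH-UA-Mobile` is a boolean `?1`/`?0`, and `Sec-CH-UA-Platform` is a quoted string. The sketch below decodes them, assuming the `ch_ua*` columns hold the raw header values; the function names are hypothetical. It also makes the gap visible: nothing here carries minor browser versions, OS versions, or device models.

```python
import re
from typing import List, Optional, Tuple

# `"brand";v="version"` members; brands may contain ';' or ',' inside the quotes.
BRAND_RE = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_ch_ua(ch_ua: str) -> List[Tuple[str, str]]:
    """Return (brand, significant_version) pairs from a Sec-CH-UA value."""
    return BRAND_RE.findall(ch_ua)


def parse_ch_ua_mobile(ch_ua_mobile: str) -> Optional[bool]:
    """Decode a structured-field boolean: ?1 -> True, ?0 -> False, else None."""
    return {"?1": True, "?0": False}.get(ch_ua_mobile.strip())


def parse_ch_ua_platform(ch_ua_platform: str) -> str:
    """Strip the surrounding quotes from a Sec-CH-UA-Platform value."""
    return ch_ua_platform.strip().strip('"')
```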

The next step will be to put together a plan to start capturing high-entropy hints to determine whether they can replicate the information we get from the current User-Agent. This solution (if selected as the way forward) will result in some data loss, which will need to be evaluated during the experimentation process.

If you all need a CU to test/provide feedback, I would be happy to volunteer!

Another update here.

We have put together a patch to collect the high-entropy hints, specifically Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Full-Version, Sec-CH-UA-Platform, Sec-CH-UA, Sec-CH-UA-Arch, Sec-CH-UA-Platform-Version, Sec-CH-UA-Mobile, and Sec-CH-UA-Model (in addition to the low-entropy hints), and add them to the webrequest table. Once it is deployed, we will evaluate the solution to determine whether we can reach data equivalency and what the data loss might be.
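Opting in to high-entropy hints happens server-side via the `Accept-CH` response header; listing the same hints in `Critical-CH` additionally asks Chromium to retry the request when they were missing, which corresponds to the pre-flight option and its latency cost. A sketch of building these headers; `client_hint_headers` is a hypothetical helper, not the actual Varnish patch. Sec-CH-UA, Sec-CH-UA-Mobile and Sec-CH-UA-Platform are low-entropy and arrive without being requested, so they are omitted here.

```python
from typing import Dict, Iterable

# High-entropy hints from the list above; browsers send these only after opt-in.
HIGH_ENTROPY_HINTS = [
    "Sec-CH-UA-Arch",
    "Sec-CH-UA-Bitness",
    "Sec-CH-UA-Full-Version",
    "Sec-CH-UA-Full-Version-List",
    "Sec-CH-UA-Model",
    "Sec-CH-UA-Platform-Version",
]


def client_hint_headers(hints: Iterable[str], critical: bool = False) -> Dict[str, str]:
    """Build response headers requesting the given client hints.

    With critical=True, Critical-CH asks the browser to retry the request with
    the hints attached if they were absent (one extra round trip).
    """
    value = ", ".join(dict.fromkeys(hints))  # de-duplicate, preserving order
    headers = {"Accept-CH": value}
    if critical:
        headers["Critical-CH"] = value
    return headers


print(client_hint_headers(HIGH_ENTROPY_HINTS)["Accept-CH"])
```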

Quick update:

After some delays with testing and resourcing, the patch is ready to be deployed. Once it has landed, we can report back on our findings.

Quick update:

The patch was finally deployed, and we started collecting data in the wmf.webrequest table as of April 5th.
The goal now is to perform and validate some analysis over a week's worth of data to estimate data loss and scope the required data changes.
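A first-order way to estimate data loss is to measure, among requests whose User-Agent looks Chromium-based, the share that actually arrived with client hints. A toy sketch over (user_agent, ch_ua) pairs with a hypothetical `hint_coverage` function; the `Chrome/` filter is an assumption (it skips Chrome on iOS, which uses a `CriOS/` token, and it includes older Chrome versions that predate UA client hints), and a real analysis would run in Hive or Spark over the full table.

```python
from typing import Iterable, Optional, Tuple


def hint_coverage(rows: Iterable[Tuple[str, Optional[str]]]) -> Optional[float]:
    """Share of Chromium-looking requests that carried a non-empty ch_ua.

    rows are (user_agent, ch_ua) pairs; returns None if no row looks like Chromium.
    """
    total = with_hints = 0
    for user_agent, ch_ua in rows:
        if "Chrome/" not in user_agent:
            continue  # crude filter: only UAs carrying a Chrome/ token
        total += 1
        if ch_ua:
            with_hints += 1
    return with_hints / total if total else None
```

`1 - hint_coverage(rows)` then approximates the share of Chromium requests for which hint-based fields would be empty.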

Following a discussion that happened here: https://phabricator.wikimedia.org/T257893#8573050 - This work has been paused until further notice.

Checking in on the status of this issue. @Mayakp.wiki detected a large spike in pageviews that were being tagged as automated but look pretty clearly like human traffic (see T310846#8809323). The cause seems to be that Chrome's more generic user-agent is finally rolling out in a substantial way (timeline), and it is breaking at least the bot detection pipelines in pretty significant ways. It seems the UA hints work was dropped because it wasn't clear that we should be using them or that they would be of much benefit. It is likely worth revisiting this conversation or considering alternatives, though.
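The reduced User-Agent freezes most of the string: Chrome reports `Chrome/<major>.0.0.0`, desktop platform tokens are fixed (for example `Windows NT 10.0; Win64; x64`), and on Android the OS version and device model are replaced by `Android 10; K`. A heuristic sketch for flagging reduced UAs when measuring the rollout; the function names are hypothetical and the patterns only cover the cases named here.

```python
import re

# Reduced Chrome UAs report only the major version; the other components are zeroed.
REDUCED_VERSION = re.compile(r"Chrome/\d+\.0\.0\.0\b")
# On Android the real OS version and device model are replaced by a fixed token.
REDUCED_ANDROID = re.compile(r"\(Linux; Android 10; K\)")


def looks_reduced(user_agent: str) -> bool:
    """Heuristic: True if the Chrome version has the frozen .0.0.0 suffix."""
    return bool(REDUCED_VERSION.search(user_agent))


def android_model_hidden(user_agent: str) -> bool:
    """True if the Android platform token is the frozen placeholder."""
    return bool(REDUCED_ANDROID.search(user_agent))
```

As more traffic shares the same few reduced strings, any bot heuristic keyed on rare or unusual UA strings loses signal, which is consistent with the misclassification described above.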

@IsaacJ indeed, I believe the bot detection was one consequence we were not expecting; I think this raises the Analytics impact significantly.

UPDATE: A quick initial check showed us that while user agent string entropy is dropping fast, it does not significantly affect our custom actor signature entropy, as most of that is based on IP. We are still investigating.
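To make the entropy comparison concrete: Shannon entropy is H = Σ p·log2(1/p) over the distinct observed values. When many clients share one reduced UA string, H(UA) falls, but a signature that also includes the IP can stay high. A sketch, where the signature is a hypothetical (ip, user_agent) pair rather than the actual actor-signature definition, and the IPs come from documentation ranges:

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(values: Iterable[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    # p * log2(1/p) summed over distinct values; written this way to avoid -0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"
# Hypothetical signatures: two clients behind different IPs, same reduced UA.
requests = [("198.51.100.1", ua), ("198.51.100.1", ua), ("203.0.113.7", ua), ("203.0.113.7", ua)]

print(shannon_entropy(u for _, u in requests))  # 0.0 bits: the UA alone cannot separate them
print(shannon_entropy(requests))                # 1.0 bits: the IP still distinguishes them
```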

@DAbad: Resetting Due Date set for this open task as it passed a while ago.

@DAbad should we merge this task into T242825: Deal with Google Chrome User-Agent deprecation? AIUI they have the same scope, except that this one is tagged with Foundational Technology Requests. There is some conversation happening here that is not happening in T242825; it'd be nice to reduce fragmentation if it's not necessary.

I propose to close this and T242825: Deal with Google Chrome User-Agent deprecation. Related tasks can be tracked in Google-Chrome-User-Agent-Deprecation.

Marking this as resolved.