
CheckUser: Store x-is-browser, x-ja3n and x-ja4h CDN header values
Closed, Resolved; Public

Description

Summary

The Wikimedia CDN edge layer sets x-is-browser, x-ja3n, and x-ja4h request headers, which include a score indicating how likely it is that a request comes from a browser rather than a script. Values above 80 suggest a browser; values below 20 suggest a script. These values should be stored by CheckUser. In another task, we can figure out how to surface this information to CheckUsers. See CDN/Backend_api for details.
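
As an illustration of the thresholds described above, here is a minimal sketch of how a stored score might be interpreted. It assumes the score arrives as a numeric string (in x-is-browser, going by the header name); the function name, the return labels, and the treatment of values between 20 and 80 are hypothetical and not part of CheckUser.

```python
import math
from typing import Optional

# Thresholds taken from the task description.
BROWSER_THRESHOLD = 80
SCRIPT_THRESHOLD = 20


def classify_is_browser(raw: Optional[str]) -> str:
    """Map a raw score string to 'browser', 'script', 'uncertain' or 'unknown'.

    Hypothetical helper: the header format is assumed to be a plain number.
    """
    if raw is None:
        return "unknown"  # header was not set on this request
    try:
        score = float(raw.strip())
    except ValueError:
        return "unknown"  # not a number, so no interpretation is attempted
    if math.isnan(score):
        return "unknown"
    if score > BROWSER_THRESHOLD:
        return "browser"
    if score < SCRIPT_THRESHOLD:
        return "script"
    return "uncertain"  # the description does not say how to read 20..80
```

Storing the raw value, as the acceptance criteria ask, keeps an interpretation like this adjustable later without rewriting stored rows.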

Technical notes

Three storage approaches are considered:

  • New columns on CU tables (e.g. cuc_is_browser, cuc_ja3n, cuc_ja4h): Simple and fast to query. However, these columns may not store the raw values, and the tables would be denormalised if they did.
  • Reusing cu_useragent_clienthints tables: The anti-spoofing guard in UserAgentClientHintsManager::insertClientHintValues rejects writes when mappings already exist for a reference ID, which would conflict with the JS API write path for browser-provided Client Hints. To work around this, we would need to update the guard to ignore these data points when checking whether Client Hints had already been submitted. Additionally, we would be adding non-Client Hints data to a table marked as storing Client Hints.
  • New dedicated table (e.g. cu_request_headers): A key/value table with a mapping table, following the same pattern as cu_useragent_clienthints but for server-side CDN headers, with no anti-spoofing constraint. One schema migration covers x-is-browser, x-ja3n, x-ja4h, and any future CDN headers. However, adding a new table to WMF wikis is likely a problem for #DBAs, and the format of the table is likely to be the same as cu_useragent_clienthints within reason.
    • However, maybe this could be possible in the x1 cluster?
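
The key/value table plus mapping table pattern from the third option can be sketched as follows, using an in-memory SQLite database purely for illustration. The table name cu_request_headers comes from the list above; the mapping table name, all column names, and the helper functions are hypothetical and do not reflect an actual CheckUser schema.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a key/value table and a mapping table (hypothetical columns)."""
    conn.executescript(
        """
        CREATE TABLE cu_request_headers (
            crh_id INTEGER PRIMARY KEY,
            crh_name TEXT NOT NULL,
            crh_value TEXT NOT NULL,
            UNIQUE (crh_name, crh_value)
        );
        CREATE TABLE cu_request_headers_map (
            crhm_reference_id INTEGER NOT NULL,
            crhm_header_id INTEGER NOT NULL
                REFERENCES cu_request_headers (crh_id),
            PRIMARY KEY (crhm_reference_id, crhm_header_id)
        );
        """
    )


def store_headers(conn: sqlite3.Connection, reference_id: int, headers: dict) -> None:
    """Store each name/value pair once and map it to the given reference ID."""
    for name, value in headers.items():
        # Each distinct (name, value) pair gets a single row.
        conn.execute(
            "INSERT OR IGNORE INTO cu_request_headers (crh_name, crh_value) "
            "VALUES (?, ?)",
            (name, value),
        )
        (header_id,) = conn.execute(
            "SELECT crh_id FROM cu_request_headers "
            "WHERE crh_name = ? AND crh_value = ?",
            (name, value),
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO cu_request_headers_map VALUES (?, ?)",
            (reference_id, header_id),
        )


def headers_for(conn: sqlite3.Connection, reference_id: int) -> dict:
    """Return the stored header values for one reference ID."""
    rows = conn.execute(
        "SELECT h.crh_name, h.crh_value "
        "FROM cu_request_headers_map m "
        "JOIN cu_request_headers h ON h.crh_id = m.crhm_header_id "
        "WHERE m.crhm_reference_id = ?",
        (reference_id,),
    ).fetchall()
    return dict(rows)
```

In this sketch, two recorded actions that share the same x-ja3n value point at one row in the key/value table, and a future CDN header would be a new name in that table rather than a new column.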

Acceptance criteria

  • The raw x-is-browser, x-ja3n and x-ja4h header values are stored when CheckUser records actions

Event Timeline

kostajh renamed this task from CheckUser: Display x-is-browser CDN header value in result rows to CheckUser: Store x-is-browser, x-ja3n and x-ja4h CDN header values. Mar 4 2026, 4:31 PM
kostajh updated the task description.
kostajh added a project: DBA.

No concerns on the DB side of things. I'd only suggest that if you're not planning to search for ja3n or any of the three, then put them in a "cuc_extra_data" JSON blob column to reduce the maintenance overhead. But if you want to allow CUs to look for specific ja3n or ja4h values, then each needs a dedicated column.
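
To illustrate the trade-off in the comment above, here is a rough sketch of the JSON blob approach. The column name cuc_extra_data comes from the comment; the JSON key names and both helper functions are hypothetical.

```python
import json
from typing import Iterable, List, Optional, Tuple


def pack_extra_data(
    is_browser: Optional[str], ja3n: Optional[str], ja4h: Optional[str]
) -> str:
    """Serialise the header values into one JSON string, omitting unset ones."""
    data = {"is_browser": is_browser, "ja3n": ja3n, "ja4h": ja4h}
    return json.dumps({k: v for k, v in data.items() if v is not None}, sort_keys=True)


def rows_matching_ja3n(rows: Iterable[Tuple[int, str]], ja3n: str) -> List[int]:
    """Find row IDs by ja3n value by decoding every blob in turn.

    With a blob column, a lookup like this cannot use an ordinary column
    index, which is why searching by value calls for dedicated columns.
    """
    return [row_id for row_id, blob in rows if json.loads(blob).get("ja3n") == ja3n]
```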

Change #1265400 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] [WIP] Collect X-Is-Browser, ja3n, and ja4h as Client Hints

https://gerrit.wikimedia.org/r/1265400

Change #1265579 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] UserAgentClientHintsManager: Don't get DB connections in constructor

https://gerrit.wikimedia.org/r/1265579

Change #1265597 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Fix UserAgentClientHintsHandler::insertClientHintValues fatal params

https://gerrit.wikimedia.org/r/1265597

Change #1265597 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Fix UserAgentClientHintsHandler::insertClientHintValues fatal params

https://gerrit.wikimedia.org/r/1265597

Change #1265579 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] UserAgentClientHintsManager: Don't get DB connections in constructor

https://gerrit.wikimedia.org/r/1265579

Change #1265400 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Collect x-is-browser, ja3n, and ja4h as Client Hints

https://gerrit.wikimedia.org/r/1265400

Change #1268611 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] ClientHints: Don't collect header only on null edit

https://gerrit.wikimedia.org/r/1268611

Change #1268612 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@wmf/1.46.0-wmf.23] ClientHints: Don't collect header only on null edit

https://gerrit.wikimedia.org/r/1268612

Change #1268611 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] ClientHints: Don't collect header only on null edit

https://gerrit.wikimedia.org/r/1268611

Change #1268612 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@wmf/1.46.0-wmf.23] ClientHints: Don't collect header only on null edit

https://gerrit.wikimedia.org/r/1268612

Mentioned in SAL (#wikimedia-operations) [2026-04-07T18:01:26Z] <dreamyjazz@deploy1003> Started scap sync-world: Backport for [[gerrit:1268612|ClientHints: Don't collect header only on null edit (T418989)]]

Mentioned in SAL (#wikimedia-operations) [2026-04-07T18:05:00Z] <dreamyjazz@deploy1003> dreamyjazz: Backport for [[gerrit:1268612|ClientHints: Don't collect header only on null edit (T418989)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2026-04-07T18:13:41Z] <dreamyjazz@deploy1003> Finished scap sync-world: Backport for [[gerrit:1268612|ClientHints: Don't collect header only on null edit (T418989)]] (duration: 12m 14s)