Summary
The Wikimedia CDN edge layer sets x-is-browser, x-ja3n, and x-ja4h request headers. The x-is-browser header contains a score indicating how likely a request is to come from a browser rather than a script: values above 80 suggest a browser, and values below 20 suggest a script. These values should be stored by CheckUser. In a separate task, we can work out how to surface this information to CheckUsers. See CDN/Backend_api for details.
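As a rough illustration of how the score thresholds above might be read, here is a minimal Python sketch. The function name, the label strings, and the assumption that the header carries a plain numeric string are all hypothetical; the task only states the above-80 and below-20 thresholds, so anything in between (including exactly 80 or 20) is treated here as inconclusive.

```python
import math
from typing import Optional


def classify_is_browser(raw: Optional[str]) -> str:
    """Map a raw x-is-browser header value to a coarse label.

    Assumption: the header holds a numeric score as text. The labels
    returned here are illustrative and not part of CheckUser.
    """
    if raw is None:
        return "missing"
    try:
        score = float(raw.strip())
    except ValueError:
        return "unparseable"
    if not math.isfinite(score):
        # Reject "nan" and "inf", which float() would otherwise accept.
        return "unparseable"
    if score > 80:
        return "likely-browser"
    if score < 20:
        return "likely-script"
    return "inconclusive"
```

Note that this task is about storing the raw value; an interpretation step like this would belong to the later task on surfacing the data.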
Technical notes
Three storage approaches are considered:
- New columns on CU tables (e.g. cuc_is_browser, cuc_ja3n, cuc_ja4h): Simple and fast to query. However, these columns may not be able to store the raw header values, and the tables would be de-normalised if they did.
- Reusing cu_useragent_clienthints tables: The anti-spoofing guard in UserAgentClientHintsManager::insertClientHintValues rejects writes when mappings already exist for a reference ID, which would conflict with the JS API write path for browser-provided Client Hints. To work around this, we would need to update the guard to ignore these data points when checking whether Client Hints have already been submitted. Additionally, we would be adding non-Client Hints data to a table marked as storing Client Hints.
- New dedicated table (e.g. cu_request_headers): A key/value table with a mapping table, following the same pattern as cu_useragent_clienthints but for server-side CDN headers, with no anti-spoofing constraint. One schema migration would cover x-is-browser, x-ja3n, x-ja4h, and any future CDN headers. However, adding a new table to WMF wikis is likely to be a concern for #DBAs, and the table format would be largely the same as that of cu_useragent_clienthints.
  - However, could the new table live on the x1 cluster instead?
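To make the dedicated-table option more concrete, the following Python sketch models a key/value table plus a mapping table using an in-memory SQLite database. This is only an illustration under stated assumptions: cu_request_headers is the example name from the list above, while cu_request_headers_map, every column name, and the helper functions are hypothetical and do not reflect an agreed schema or the real MediaWiki database layer.

```python
import sqlite3

# Hypothetical schema: one row per distinct (name, value) pair, plus a
# mapping table linking those rows to a CheckUser event by reference ID/type.
SCHEMA = """
CREATE TABLE cu_request_headers (
    crh_id INTEGER PRIMARY KEY,
    crh_name TEXT NOT NULL,
    crh_value TEXT NOT NULL,
    UNIQUE (crh_name, crh_value)
);
CREATE TABLE cu_request_headers_map (
    crhm_crh_id INTEGER NOT NULL REFERENCES cu_request_headers (crh_id),
    crhm_reference_id INTEGER NOT NULL,
    crhm_reference_type INTEGER NOT NULL,
    PRIMARY KEY (crhm_crh_id, crhm_reference_id, crhm_reference_type)
);
"""


def store_headers(conn, reference_id, reference_type, headers):
    """Store raw header values, reusing rows for repeated (name, value) pairs."""
    for name, value in headers.items():
        conn.execute(
            "INSERT OR IGNORE INTO cu_request_headers (crh_name, crh_value) "
            "VALUES (?, ?)",
            (name, value),
        )
        (row_id,) = conn.execute(
            "SELECT crh_id FROM cu_request_headers "
            "WHERE crh_name = ? AND crh_value = ?",
            (name, value),
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO cu_request_headers_map VALUES (?, ?, ?)",
            (row_id, reference_id, reference_type),
        )
    conn.commit()


def headers_for(conn, reference_id, reference_type):
    """Return the stored headers for one event as a name-to-value dict."""
    rows = conn.execute(
        "SELECT crh_name, crh_value FROM cu_request_headers "
        "JOIN cu_request_headers_map ON crhm_crh_id = crh_id "
        "WHERE crhm_reference_id = ? AND crhm_reference_type = ?",
        (reference_id, reference_type),
    )
    return dict(rows.fetchall())
```

In this layout, two events that carry identical header values share the same rows in the key/value table and differ only in the mapping table, which is the main space saving of the pattern. Supporting a further CDN header would need no schema change, only a new name in the key column.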
Acceptance criteria
- The raw x-is-browser, x-ja3n and x-ja4h header values are stored when CheckUser records actions
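As a sketch of what "raw" could mean for the criterion above, the snippet below picks the three headers out of a request's header mapping without parsing or altering their values. It is written in Python purely for illustration; the function and constant names are hypothetical, and the only behaviour assumed beyond the task text is that HTTP header names are matched case-insensitively.

```python
from typing import Dict, Mapping

# The three CDN-provided headers named in this task.
CDN_HEADERS = ("x-is-browser", "x-ja3n", "x-ja4h")


def extract_cdn_headers(request_headers: Mapping[str, str]) -> Dict[str, str]:
    """Return the raw values of the CDN headers present on a request.

    Header names are compared case-insensitively; values are passed
    through untouched so that what gets stored is the raw value.
    """
    lowered = {name.lower(): value for name, value in request_headers.items()}
    return {name: lowered[name] for name in CDN_HEADERS if name in lowered}
```

Headers that are absent from the request are simply omitted from the result, so no placeholder value would be stored for them.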