As part T257893: [EPIC] Support User-Agent Client Hints header in CheckUser, CheckUser needs tables to store Client Hint data. The implementation proposed in https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/931875 adds two new tables, namely:
- cu_useragent_clienthints - A table that stores name and value combinations of Client Hint data.
- cu_useragent_clienthints_map - A many-to-many table between cu_useragent_clienthints and one of cu_changes, cu_private_event or cu_log_event.
Answers to the questions at https://wikitech.wikimedia.org/wiki/Creating_new_tables:
- Should this table be replicated to wiki replicas (does it not contain private data)? No, as private data is stored.
- Will you be doing cross-joins with the wiki metadata? There will be no CROSS JOINs in the SQL sense. If this means any JOIN then, JOINs for these tables will occur to the tables cu_useragent_clienthints, cu_useragent_clienthints, cu_changes, cu_private_event and cu_log_event. When generating results for Special:CheckUser and Special:Investigate a JOIN would be made to the logging and comment table.
- Size of the table (number of rows expected):
- cu_useragent_clienthints - Minimal as the table stores unique combinations of client hint name and value, with most being shared by browsers. Probably around 1000 rows. Local testing using current versions of Chrome and Edge gives 14 rows, but each version number of a browser adds a new row.
- cu_useragent_clienthints_map - Very large number of rows expected, but entries are to be removed when the entry they reference is deleted in cu_changes and the column size and count is kept deliberately small to address this. Local testing indicates around 10 rows per entry in cu_changes, cu_private_event or cu_log_event. As such, the row count at most should be roughly 10 times the number of rows in cu_changes on wikidata.
- Expected growth per year (number of rows): Initially there will be a large increase, but data will be deleted after 3 months as is done for cu_changes so no expected growth after 3 months of full deployment.
- Expected amount of queries, both writes and reads (per minute, per hour...per day, any of those are ok): At minimum writes will occur to at least one of the tables every time a entry is stored in cu_changes when the browser is Chromium based (i.e. no Firefox or Safari). Initially this would be at least for every successful edit, but is likely to expand to log actions including logging in and out.
- Examples of queries that will be using the table: (` character removed from queries for code highlighting reasons)
- cu_useragent_clienthints
- SELECT cu_useragent_clienthints_id FROM cu_useragent_clienthints WHERE uach_name = 'architecture' AND uach_value = 'x86' LIMIT 1
- INSERT INTO cu_useragent_clienthints (uach_name,uach_value) VALUES ('fullVersionList','Google Chrome 114.0.5735.199')
- cu_useragent_clienthints_map
- SELECT COUNT(*) AS rowcount FROM (SELECT 1 FROM cu_useragent_clienthints_map WHERE uachm_id_type = 0 AND uachm_reference_id = 353 AND (uachm_id IS NOT NULL) ) tmp_count
- INSERT INTO cu_useragent_clienthints_map (uachm_uach_id,uachm_id_type,uachm_reference_id) VALUES (109,0,'353')
- cu_useragent_clienthints
- The release plan for the feature (are there specific wikis you'd like to test first etc): T341110: Deploy client hints functionality (testwiki, four production wikis and then all wikis)