Page MenuHomePhabricator

Changes to the cuc_agent column in the cu_changes table
Closed, ResolvedPublic

Description

To normalise the cuc_agent column in the cu_changes table, the column will be removed and replaced with cuc_agent_id. This will contain an ID that references the cu_useragent table. The row in that table will contain the value previously held in cuc_agent as cuua_text.

This change is also happening to the cu_log_event and cu_private_event tables but, based on T327447, it looks like the Data-Engineering team does not use these tables.

There is no particular timeline for this change, as this is not a high priority change for the Trust and Safety Product Team.

The new cu_useragent table and cuc_agent_id column will be stable to read from once T361199 is complete. In T361205, the CheckUser code will stop writing to cuc_agent and then later in T361208 the column will be removed.

Event Timeline

Dreamy_Jazz updated the task description. (Show Details)
Dreamy_Jazz updated the task description. (Show Details)

Just to note that we are proceeding with this change now and it will be soon be possible to read from the new schema on WMF wikis

It is now possible to use the cuc_agent_id column and cu_useragent table to read the User Agent from the cu_changes table. We will soon stop writing to cuc_agent in T361206: Stop writing old for user agent schema migration on WMF wikis probably in a few weeks time.

GGoncalves-WMF renamed this task from FYI: Changes to the cuc_agent column in the cu_changes table to Changes to the cuc_agent column in the cu_changes table.Jan 29 2026, 11:08 AM

Update (from Slack): cuc_agent and cu_changes are no longer bring read in MediaWiki, and writing will tentatively stop on the week of Feb 9.

We seem to have 3 dependencies on that field (one is the replica's CREATE TABLE statement).

I'll quote PSI on the migration path:

The migration should involve a LEFT JOIN to cu_useragent on cuua_id = cuc_agent_id along with replacing cuc_agent references with cuua_text. Note that the left join is because purging of unused rows has the very rare chance for race conditions (i.e. the row is unused in cu_useragent but then as it's being purged a new cu_changes row is added that uses it). (...) In the MediaWiki interfaces we treat a missing row as the user having an empty User-Agent header, so you may want to convert cuua_text as null to an empty string

Change #1236347 had a related patch set uploaded (by Snwachukwu; author: Snwachukwu):

[analytics/refinery@master] Migrate cu_changes table to use cuua_text in new cu_usergent table.

https://gerrit.wikimedia.org/r/1236347

Change #1236347 merged by Snwachukwu:

[analytics/refinery@master] Migrate cu_changes table to use cuua_text in new cu_usergent table.

https://gerrit.wikimedia.org/r/1236347

@Snwachukwu is this task complete? We would like to progressively stop writing to cuc_agent this week

The patch is ready but not yet deployed. I would be deployed with our deployment train in Tuesday (tomorrow).

ALthough our sqoop jobs are run monthly, at the beginning, so we expect next run to be in march. Thus stopping write to cuc_agent shouldn't affect our jobs.

The patch is ready but not yet deployed. I would be deployed with our deployment train in Tuesday (tomorrow).

ALthough our sqoop jobs are run monthly, at the beginning, so we expect next run to be in march. Thus stopping write to cuc_agent shouldn't affect our jobs.

Thanks! I'm going to proceed to stop writing to cuc_agent based on this (starting with group0 wikis)