Page MenuHomePhabricator

Create separate tables for log events in CheckUser
Open, HighPublicFeature

Description

To implement T253796, table(s) should be created that allow storing log-specific information in a seperate table. When CheckUser queries are made, the tables will be joined so that results can continue to be shown.

This involves creating two new tables:

  • cu_log_event
  • cu_private_event

The cu_log_event table will contain entries that at the moment go into cu_changes that also:

  • Are log actions
  • Have a associated log ID in the logging table

The cu_private_event will contain entries that at the moment go into cu_changes that also:

  • Are log actions
  • Do not have an associated log ID (and thus are very unlikely to be shown publicly or be linked to)

The two tables are needed because when we have a log ID to reference storing the log parameters and other data that is not needed for indexing purposes can be gained using a join on the log ID to the logging table. Entries which do not have a associated log ID would then have to store this data, but to ensure that a new table is not polymorphic two tables are needed so that some entries have this column not be used.

Examples:

  • Log for a page move would go into the cu_log_event
  • Edit to a page would go into cu_changes
  • Login event would go into cu_private_event
  • Log events previously stored in cu_changes would be moved to cu_private_event

Doing this allows the solving of several security tickets and solving / making substantial progress towards solving:

This method was suggested by Ladsgroup to reduce the polymorphic nature of cu_changes over just adding cuc_log_id as suggested in the parent task. This would solve the parent task by removing log entries from cu_changes.

Todo (mostly in order but some can be done before others as desired):

  • Create the new tables
    • Create the schemas for these tables
    • Deploy the new tables to WMF production
  • Update code that writes and deletes to cu_changes to also use the new tables based on a schema comptability config (write new support)
    • The many methods that insert, delete and change cu_changes in Hooks.php - gerrit 876360
    • PopulateCheckUserTable maintenance script - gerrit 877182
    • PurgeOldData maintenance script - gerrit 877182
  • Add combined read and write new support - T341586
  • Add read new support so that the code that queries CheckUser reads from all three tables:
    • CheckUserQueryBuilder - gerrit 881420
    • LogFormatter-like class for cu_private_event rows (T341688)
    • CheckUser
      • Get edits pager
      • Get IPs pager
      • Get users pager
    • CheckUser API (T341827)
    • Investigate (T329189)
    • Temporary account IP reveal
  • Create a maintenance script to move log entries from cu_changes to cu_private_event, while keeping the moved entries in cu_changes with cuc_only_for_read_old set to 1 - gerrit 879919 (merged) + gerrit 886482
  • Enable write new (T330158)
    • Change the value in extension.json
    • Do this on WMF wikis
      • Double check the new tables are not included any DB dumps - Ladsgroup probably the one to contact for this
  • Move log entries in cu_changes to cu_private_event (first maintenance script)
    • Add this to update.php while ensuring that it is run before any column removal but after the creation of cu_private_event and cu_log_event
    • Have this maintenance script run on WMF wikis - No longer needed as data on WMF wikis has been purged that needed the move
  • Enable read new
    • Change the value in extension.json
    • Do this on WMF wikis
  • Stop writing old
    • In extension.json
    • Have this changed on WMF wikis
  • Remove the write and read old specific code once changes are stable enough
  • Delete cu_changes rows with cuc_only_for_read_old set to 1
    • Create a maintenance script for this
    • Add this maintenance script to update.php while ensuring it it run before column removal but after the moving of log entries
  • Remove columns related only log types in cu_changes - can be done in a follow up task
    • Remove these columns from WMF production - requires maintenance script to move log entries be run if not more than 3 months since write new was enabled.

Details

SubjectRepoBranchLines +/-
mediawiki/extensions/AbuseFiltermaster+2 -0
mediawiki/extensions/AbuseFiltermaster+90 -16
mediawiki/extensions/CheckUsermaster+3 -3
mediawiki/extensions/CheckUsermaster+228 -0
mediawiki/extensions/Flowmaster+8 -0
operations/mediawiki-configmaster+5 -0
mediawiki/extensions/CheckUsermaster+2 -2
mediawiki/extensions/CheckUsermaster+350 -1
mediawiki/extensions/CheckUsermaster+3 -2
mediawiki/extensions/CheckUsermaster+213 -28
mediawiki/extensions/CheckUsermaster+577 -215
mediawiki/extensions/CheckUsermaster+619 -28
mediawiki/extensions/AbuseFiltermaster+4 -1
mediawiki/extensions/CheckUsermaster+428 -48
mediawiki/extensions/CheckUsermaster+512 -0
Show related patches Customize query in gerrit

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.
StatusSubtypeAssignedTask
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
OpenNone
ResolvedTchanders
OpenNone
Resolved tstarling
OpenBUG REPORTNone
OpenFeatureDreamy_Jazz
ResolvedLadsgroup
ResolvedMilimetric
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
OpenDreamy_Jazz
OpenDreamy_Jazz
OpenDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedPRODUCTION ERRORpmiazga
OpenNone
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
OpenNone
DeclinedDreamy_Jazz

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

(probably don't need the actor and comment migration task as a parent as that is nearly solved + this task isn't really about that migration).

Change 886479 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] [WIP] Add MoveLogEntriesBackToCuChanges maintenance script

https://gerrit.wikimedia.org/r/886479

Change 886482 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Add support for writing both new and old for event table migration

https://gerrit.wikimedia.org/r/886482

Change 886479 abandoned by Dreamy Jazz:

[mediawiki/extensions/CheckUser@master] [WIP] Add MoveLogEntriesBackToCuChanges maintenance script

Reason:

This script won't be needed now due to Iaf203693f632f9c4610a587d1db88934bbd37118

https://gerrit.wikimedia.org/r/886479

Change 886485 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Fix action name for password reset email event

https://gerrit.wikimedia.org/r/886485

Aklapper renamed this task from Create seperate tables for log events in CheckUser to Create separate tables for log events in CheckUser.Feb 6 2023, 1:22 AM

Change 886485 merged by jenkins-bot:

[mediawiki/extensions/CheckUser@master] Fix action name for password reset email event

https://gerrit.wikimedia.org/r/886485

Change 910832 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/AbuseFilter@master] Add missing CheckUser hooks to extension.json which have handlers

https://gerrit.wikimedia.org/r/910832

Change 910832 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Add missing CheckUser hooks to extension.json which have handlers

https://gerrit.wikimedia.org/r/910832

@Milimetric: Keeping data engineering in the loop like for T233004. This task will likely affect your analysis in a larger way than T233004 is as it will split data stored in cu_changes into three tables (cu_changes, cu_private_event and cu_log_event).

Thank you, I don't think this affects us right now. But I appreciate the ping. I think the new User Agent changes are going to affect us though, so I appreciate you cc-ing me.

Change 886482 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Add support for writing both new and old for event table migration

https://gerrit.wikimedia.org/r/886482