Page MenuHomePhabricator

Create separate tables for log events in CheckUser
Open, HighPublicFeature

Description

To implement T253796, table(s) should be created that allow storing log-specific information in a seperate table. When CheckUser queries are made, the tables will be joined so that results can continue to be shown.

This involves creating two new tables:

  • cu_log_event
  • cu_private_event

The cu_log_event table will contain entries that at the moment go into cu_changes that also:

  • Are log actions
  • Have a associated log ID in the logging table

The cu_private_event will contain entries that at the moment go into cu_changes that also:

  • Are log actions
  • Do not have an associated log ID (and thus are very unlikely to be shown publicly or be linked to)

The two tables are needed because when we have a log ID to reference storing the log parameters and other data that is not needed for indexing purposes can be gained using a join on the log ID to the logging table. Entries which do not have a associated log ID would then have to store this data, but to ensure that a new table is not polymorphic two tables are needed so that some entries have this column not be used.

Examples:

  • Log for a page move would go into the cu_log_event
  • Edit to a page would go into cu_changes
  • Login event would go into cu_private_event
  • Log events previously stored in cu_changes would be moved to cu_private_event

Doing this allows the solving of several security tickets and solving / making substantial progress towards solving:

This method was suggested by Ladsgroup to reduce the polymorphic nature of cu_changes over just adding cuc_log_id as suggested in the parent task. This would solve the parent task by removing log entries from cu_changes.

Todo (mostly in order but some can be done before others as desired):

  • Create the new tables
    • Create the schemas for these tables
    • Deploy the new tables to WMF production
  • Update code that writes and deletes to cu_changes to also use the new tables based on a schema comptability config (write new support)
    • The many methods that insert, delete and change cu_changes in Hooks.php - gerrit 876360
    • PopulateCheckUserTable maintenance script - gerrit 877182
    • PurgeOldData maintenance script - gerrit 877182
  • Add combined read and write new support - T341586
  • Add read new support so that the code that queries CheckUser reads from all three tables:
    • CheckUserQueryBuilder - gerrit 881420
    • LogFormatter-like class for cu_private_event rows (T341688)
    • CheckUser
      • Get edits pager
      • Get IPs pager
      • Get users pager
    • CheckUser API (T341827)
    • Investigate (T329189)
    • Temporary account IP reveal
  • Create a maintenance script to move log entries from cu_changes to cu_private_event, while keeping the moved entries in cu_changes with cuc_only_for_read_old set to 1 - gerrit 879919 (merged) + gerrit 886482
  • Enable write new (T330158)
    • Change the value in extension.json
    • Do this on WMF wikis
      • Double check the new tables are not included any DB dumps - Ladsgroup probably the one to contact for this
  • Move log entries in cu_changes to cu_private_event (first maintenance script)
    • Add this to update.php while ensuring that it is run before any column removal but after the creation of cu_private_event and cu_log_event
    • Have this maintenance script run on WMF wikis - No longer needed as data on WMF wikis has been purged that needed the move
  • Enable read new
    • Change the value in extension.json
    • Do this on WMF wikis
  • Stop writing old
    • In extension.json - T366505
    • Have this changed on WMF wikis - T360685
  • Delete cu_changes rows with cuc_only_for_read_old set to 1 - T341830
    • Create a maintenance script for this
    • Add this maintenance script to update.php while ensuring it it run before column removal but after the moving of log entries
    • Run this maintenance script on WMF wikis
      • Run optimise cu_changes on specific wikis, such as loginwiki due to the large drop in the number of rows(?). On enwiki it should be a ~20% drop and on loginwiki it should be a 99.9% drop in row count
  • Remove wgEventTablesMigrationStage config, inferring the value as SCHEMA_COMPAT_NEW in places that it was used
  • Remove columns related only log types in cu_changes along with cuc_only_for_read_old
    • Remove these columns from WMF production - requires maintenance script to move log entries be run if not more than 3 months since write new was enabled.

Details

SubjectRepoBranchLines +/-
mediawiki/extensions/AbuseFiltermaster+2 -0
mediawiki/extensions/AbuseFiltermaster+90 -16
mediawiki/extensions/CheckUsermaster+3 -3
mediawiki/extensions/CheckUsermaster+228 -0
mediawiki/extensions/Flowmaster+8 -0
operations/mediawiki-configmaster+5 -0
mediawiki/extensions/CheckUsermaster+2 -2
mediawiki/extensions/CheckUsermaster+350 -1
mediawiki/extensions/CheckUsermaster+3 -2
mediawiki/extensions/CheckUsermaster+213 -28
mediawiki/extensions/CheckUsermaster+577 -215
mediawiki/extensions/CheckUsermaster+619 -28
mediawiki/extensions/AbuseFiltermaster+4 -1
mediawiki/extensions/CheckUsermaster+428 -48
mediawiki/extensions/CheckUsermaster+512 -0
Show related patches Customize query in gerrit

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.
StatusSubtypeAssignedTask
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
OpenNone
ResolvedTchanders
OpenNone
Resolvedtstarling
OpenBUG REPORTNone
OpenNone
OpenNone
OpenFeatureDreamy_Jazz
ResolvedMilimetric
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedDreamy_Jazz
ResolvedPRODUCTION ERRORpmiazga
ResolvedDreamy_Jazz
DeclinedNone
DeclinedDreamy_Jazz
OpenNone
OpenDreamy_Jazz
OpenNone
OpenNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 910832 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/AbuseFilter@master] Add missing CheckUser hooks to extension.json which have handlers

https://gerrit.wikimedia.org/r/910832

Change 910832 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] Add missing CheckUser hooks to extension.json which have handlers

https://gerrit.wikimedia.org/r/910832

@Milimetric: Keeping data engineering in the loop like for T233004. This task will likely affect your analysis in a larger way than T233004 is as it will split data stored in cu_changes into three tables (cu_changes, cu_private_event and cu_log_event).

Thank you, I don't think this affects us right now. But I appreciate the ping. I think the new User Agent changes are going to affect us though, so I appreciate you cc-ing me.

Change 886482 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/CheckUser@master] Add support for writing both new and old for event table migration

https://gerrit.wikimedia.org/r/886482