
Create cu_log_event and cu_private_event on WMF wikis
Closed, ResolvedPublic

Description

  1. CREATEs to run: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/866684/10/schema/mysql/patch-cu_log_event-def.sql and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CheckUser/+/866684/10/schema/mysql/patch-cu_private_event-def.sql (a rough sketch of the table shape follows this list)
  2. Where to run those changes: all.dblist
  3. When to run those changes: This blocks patches for several security tickets (e.g. T316414); however, there is no need to rush.
  4. If the schema change is backwards compatible: Yes.
  5. If the schema change has been tested already on some of the test/beta wikis: The beta cluster does not have CheckUser installed, so it could not be tested there.
  6. If the data should be made available on the labs replicas and/or dumps: No, the data is private.
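
For orientation, a heavily abbreviated sketch of the rough shape of one of the new tables. This is illustrative only: the linked patch files are authoritative, and the column list and types below are assumptions (cu_private_event is analogous, with a cupe_ prefix and log-style fields in place of a logging join):

-- Illustrative sketch only; the real DDL is in the linked patch files.
CREATE TABLE cu_log_event (
  cule_id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,
  cule_log_id INT UNSIGNED NOT NULL,     -- joins to logging.log_id
  cule_actor BIGINT UNSIGNED NOT NULL,   -- actor who performed the logged action
  cule_timestamp BINARY(14) NOT NULL,    -- drives the 3-month purge
  cule_ip VARCHAR(255) DEFAULT '',       -- request IP, as in cu_changes
  cule_ip_hex VARCHAR(255) DEFAULT '',
  cule_agent VARBINARY(255) DEFAULT '',  -- user agent
  PRIMARY KEY (cule_id)
);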

Table creation specific Q&As:

  1. Should this table be replicated to wiki replicas?: No, as it contains private data.
  2. Will you be doing cross-joins with the wiki metadata?: Yes, entries in cu_log_event will be joined to the logging table.
  3. Size of the table (number of rows expected): Unsure of exact numbers, as I do not have raw access to cu_changes on WMF wikis. The two tables will have fewer entries than cu_changes currently has, as they are being added to split what cu_changes stores across multiple tables.
  4. Expected growth per year (number of rows): Like cu_changes, the data will be purged after a set expiry time (currently 3 months). Once the tables have existed for 3 months, or once a maintenance script is run to fill them, they should not grow or shrink by any large amount (a sketch of the purge shape follows the description).
  5. Expected amount of queries, both writes and reads (per minute, per hour...per day, any of those are ok): As I do not have access to this information, I cannot give specifics for this. However, what I can say is:
    1. Reads: Very similar to cu_changes, as all three tables will in most cases be read together.
    2. Writes: Fewer than the writes to cu_changes; the volume depends on how many log events are being triggered.
  6. Examples of queries that will be using the table: TBC - need to work on a patch that makes use of the new system. Queries will either be run separately or combined via a UNION, and there may be cases where only some of the three tables are queried (a combined sketch follows this list):
    1. SELECT ... FROM [cu_changes, cu_private_event] WHERE ... LIMIT ...
    2. SELECT ... FROM cu_log_event INNER JOIN logging ON cue_log_id = log_id WHERE ... LIMIT ...
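
A hedged sketch of the combined UNION form from item 6; the column names follow CheckUser's cuc_/cupe_/cule_ prefix conventions but are assumptions here, not the final query:

SELECT cuc_timestamp AS ts, cuc_ip, cuc_agent
  FROM cu_changes
  WHERE cuc_ip_hex = '7F000001'  -- 127.0.0.1
UNION ALL
SELECT cupe_timestamp, cupe_ip, cupe_agent
  FROM cu_private_event
  WHERE cupe_ip_hex = '7F000001'
UNION ALL
SELECT cule_timestamp, cule_ip, cule_agent
  FROM cu_log_event
  INNER JOIN logging ON cule_log_id = log_id
  WHERE cule_ip_hex = '7F000001'
ORDER BY ts DESC
LIMIT 500;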

Ladsgroup suggested creating new tables rather than adding a new column to cu_changes. The change has been merged, but I am open to discussing improvements if they are needed.
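
As a rough illustration of the purge shape mentioned in question 4 above: the real purging is done by the extension's own code, the cutoff comes from $wgCUDMaxAge, and the timestamp column names are assumptions following the cu_changes naming conventions.

-- Hedged sketch: delete rows older than the expiry cutoff, in batches.
DELETE FROM cu_log_event WHERE cule_timestamp < '20230101000000' LIMIT 500;
DELETE FROM cu_private_event WHERE cupe_timestamp < '20230101000000' LIMIT 500;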

Event Timeline

Dreamy_Jazz updated the task description.
Marostegui moved this task from Triage to Blocked external/Not db team on the DBA board.
Marostegui subscribed.

Thanks for the task.
DBAs do not create the tables. That's self-service: https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change

Can you please clarify this bit:

> Should this table be replicated to wiki replicas?: Yes, like cu_changes is currently. Replicas should keep this table and its data private.

The cu_changes table isn't replicated to our wikireplicas. It is a private table: https://github.com/wikimedia/puppet/blob/production/manifests/realm.pp#L192

> Thanks for the task.
> DBAs do not create the tables. That's self-service: https://wikitech.wikimedia.org/wiki/Schema_changes#What_is_not_a_schema_change

Thanks for the explanation. I didn't realise there was a separate page about creating tables until after I had created the task. I added the extra questions, but did not read on to the deployment section, as I assumed that was not for me.

> Can you please clarify this bit:
>
>> Should this table be replicated to wiki replicas?: Yes, like cu_changes is currently. Replicas should keep this table and its data private.
>
> The cu_changes table isn't replicated to our wikireplicas. It is a private table: https://github.com/wikimedia/puppet/blob/production/manifests/realm.pp#L192

Hmm. Maybe I'm misunderstanding what "wikireplicas" are in this situation. I know from the PHP code that CheckUser uses a replica DB to read rows from cu_changes via:

$this->mDb = $loadBalancer->getConnection( DB_REPLICA );
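// (Here DB_REPLICA is a replica within MediaWiki's own load balancer, i.e.
// the production database cluster, not the sanitized cloud Wiki Replicas.)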

I think there might be a misunderstanding about what "wiki replicas" are.
This might help: https://wikitech.wikimedia.org/wiki/Wiki_Replicas

I am going to go ahead and prepare a patch to add these tables to the private list (like cu_changes).

Change 874781 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] realm.pp: Add two new tables to private

https://gerrit.wikimedia.org/r/874781

Change 874782 had a related patch set uploaded (by Dreamy Jazz; author: Dreamy Jazz):

[mediawiki/extensions/WikimediaMaintenance@master] createExtensionTables: Add extension CheckUser

https://gerrit.wikimedia.org/r/874782

Please do not create the tables until we've given the green light. We need to merge our patches and restart a few services.

Okay. I was following the steps in the "Preparation" section.

Change 874781 merged by Marostegui:

[operations/puppet@production] realm.pp: Add two new tables to private

https://gerrit.wikimedia.org/r/874781

Mentioned in SAL (#wikimedia-operations) [2023-01-03T10:50:43Z] <marostegui> Restart codfw sanitarium masters T326105

Mentioned in SAL (#wikimedia-operations) [2023-01-03T10:53:53Z] <marostegui> Restart eqiad sanitarium T326105

I have restarted both pairs of sanitarium hosts, and the replication filter is now in place. You can create the tables at your convenience.

Change 874782 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMaintenance@master] createExtensionTables: Add extension CheckUser

https://gerrit.wikimedia.org/r/874782

> I have restarted both pairs of sanitarium hosts, and the replication filter is now in place. You can create the tables at your convenience.

Thanks. I will need to find someone to do this.

>> I have restarted both pairs of sanitarium hosts, and the replication filter is now in place. You can create the tables at your convenience.
>
> Thanks. I will need to find someone to do this.

You can do it during one of the deployment windows.

Thanks @Ladsgroup. I won't be around for the deployment windows today and most of tomorrow, in case you need my input.

I've added creating these tables to today's late backport window.

Since the files are not yet deployed to production (the change is in wmf.18), deployers will have a hard time getting it done, so I'm on it. Hopefully it will be done in ten minutes or so, once I figure out a good way to do it without breaking stuff :P

Ladsgroup claimed this task.
Ladsgroup edited projects, added DBA; removed Data-Persistence (work done).

It's done. Before starting to write to the tables, I suggest double-checking that they don't end up in the cloud replicas. Even once you start, let's do one wiki first, double-check for leakage a couple of times (this is quite sensitive), and then move forward to the rest of the wikis.
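
As a rough sketch of that leak check, assuming access to a wiki replica (for example via Quarry), queries like the following should return empty result sets if the replication filter is working:

SHOW TABLES LIKE 'cu_log_event';
SHOW TABLES LIKE 'cu_private_event';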

Thanks!

I'll follow that plan, and I'll set a migration config in the change so that it can be rolled out per wiki.