Page MenuHomePhabricator

Schema changes for `cu_changes` and `cu_log` table
Open, Stalled, NormalPublic

Description

Current CheckUser extension does not use actor table and comment table. And cu_logid column is missing ( required for migrate to use LogFormatter class ).

I'd like to propose following changes to CheckUser database.

ALTER TABLE /*_*/cu_changes
	ADD COLUMN cuc_logid int unsigned NOT NULL DEFAULT 0 AFTER cuc_private,
	ADD COLUMN cuc_actor bigint unsigned NOT NULL DEFAULT 0 AFTER cuc_user_text,
	ADD COLUMN cuc_comment_id bigint unsigned NOT NULL DEFAULT 0 AFTER cuc_comment;

ALTER TABLE /*_*/cu_log
	ADD COLUMN cul_actor bigint unsigned NOT NULL DEFAULT 0 AFTER cul_user_text,
	ADD COLUMN cul_reason_id bigint unsigned NOT NULL DEFAULT 0 AFTER cul_reason;
  1. ALTERs to run: above
  2. Where to run those changes: all.dblist
  3. When to run those changes: No time constraint.
  4. If the schema change is backwards compatible: Yes. The new columns and tables won't be used until a feature flag is enabled.
  5. If the schema change has been tested already on some of the test/beta wikis: beta cluster don't have checkuser ; testwiki : not yet
  6. if the data should be made available on the labs replicas and/or dumps: No. data is private. DO NOT PUBLISH.

Related tasks:

Details

Related Gerrit Patches:

Event Timeline

Rxy created this task.Sep 16 2019, 12:48 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 16 2019, 12:48 PM
Rxy updated the task description. (Show Details)Sep 16 2019, 12:51 PM

Do you foresee the need of having an index (or replacing the current ones) with those new columns?
Also subscribing @Anomie here as he's been working on the migrations mentioned.

Do you have a gerrit link with the patch where all these columns are included to link it on this task? We do not execute schema changes until the gerrit patch is merged. Also listing the gerrit link on the task makes it easier to see it, now and specially in the future (just in case).

Marostegui triaged this task as Normal priority.Sep 16 2019, 12:53 PM
Marostegui moved this task from Triage to In progress on the DBA board.
Reedy added a subscriber: Reedy.Sep 16 2019, 1:00 PM

OOI, how deterministic is two columns using AFTER cuc_private in the same schema change? Does it insert one after cuc_private, then then insert the second, pushing the first column further away from cuc_private?

Is that the intended format?

Rxy updated the task description. (Show Details)Sep 16 2019, 1:02 PM

OOI, how deterministic is two columns using AFTER cuc_private in the same schema change? Does it insert one after cuc_private, then then insert the second, pushing the first column further away from cuc_private?
Is that the intended format?

is copy and paste miss :

I updated description
ADD COLUMN cuc_comment_id bigint unsigned NOT NULL DEFAULT 0 AFTER cuc_private;
to
ADD COLUMN cuc_comment_id bigint unsigned NOT NULL DEFAULT 0 AFTER cuc_comment;

There is actually not cuc_private column from what I can see on enwiki.
Am I missing something?

root@cumin1001:~# mysql.py -hdb1089 enwiki -e "show create table cu_changes" | grep -i private
root@cumin1001:~#
Rxy added a comment.Sep 16 2019, 1:09 PM

There is actually not cuc_private column from what I can see on enwiki.
Am I missing something?

root@cumin1001:~# mysql.py -hdb1089 enwiki -e "show create table cu_changes" | grep -i private
root@cumin1001:~#

O_o I guess last db schema change patch does not applied WMF production environment IMHO...

rECHU /archives/patch-cu_changes_privatedata.sql

O_o I guess last db schema change patch does not applied WMF production environment IMHO...
rECHU /archives/patch-cu_changes_privatedata.sql

Apparently not: https://phabricator.wikimedia.org/rECHUede4937ffbd887530c01785a95be349236990ecb there is no ticket for that from what I can see.
And to prevent issues like that, that's why we request all the schema changes to follow this template: https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change although at the time things were different :-)
Hence my request on this ticket to also link a gerrit patch :-)

@Rxy If that column is merged but not in production, I guess it is no used as the change is from 2012, and that code can be cleaned up?

Reedy added a comment.Sep 16 2019, 1:16 PM

I guess it's not a feature we use with $wgCUPublicKey being "" by default.

So for the wikis without the column, we're not trying to read it, hence it not being noticed in production

Have to wonder if we'll ever want to do so ;)

Most of the newer wikis should have the column though

Most of the newer wikis should have the column though

Correct, the new ones do have it

root@cumin1001:~# mysql.py -hdb1075 atjwiki -e "show create table cu_changes\G" | grep -i private
  `cuc_private` mediumblob,
root@cumin1001:~# mysql.py -hdb1109 wikidatawiki -e "show create table cu_changes\G" | grep -i private
  `cuc_private` mediumblob,

The usual mess with half applied schema changes :(

Reedy added a comment.Sep 16 2019, 1:21 PM

The usual mess with half applied schema changes :(

Well, not quite. It's just any wikis that were created with the cu_changes.sql file *after* it was merged... And in the last 7 years, will be quite a few wikis at this point

It doesn't help that https://github.com/wikimedia/mediawiki-extensions-CheckUser/commit/ede4937ffbd887530c01785a95be349236990ecb has no reference to a task or anything, so it's not immediately obvious why or for whom this change was made. So I'm slightly reluctant to just strip it out without finding who/what/where/why etc, but at the same time, it's extra work (though, slightly less so if you're already making alters to those tables) for the DBAs, which may end up being reverted at some point

I can easily do an ADD COLUMN IF NOT EXISTS or DROP COLUMN IF EXISTS and create/drop that cuc_private column as part of these schema changes if we find out whether we really need it or not.

Do you foresee the need of having an index (or replacing the current ones) with those new columns?
Also subscribing @Anomie here as he's been working on the migrations mentioned.

There seems to be a fair bit of difference between the extension's schemas and the production databases. It might be useful to do a systematic comparison.

As far as the proposed change goes,

  • Looking at the existing indexes in the extension, I'd guess they'd want indexes on (cuc_actor, cuc_ip, cuc_timestamp) and (cul_actor, cul_timestamp).
    • I see on enwiki there's also an index KEY `cuc_user_time` (`cuc_user`,`cuc_timestamp`), which would want a corresponding index on (cuc_actor, cuc_timestamp).
  • They'd presumably also want to alter cul_user, cul_user_text, and cul_reason to have defaults (0, '', and '', respectively).
    • I note this is already the case on enwiki, but may not be everywhere.

Also, from the description,

  1. If the schema change has been tested already on some of the test/beta wikis: It appears to have been auto-deployed to Beta. Things seem to be working in some quick testing, and if anyone has complained about things breaking I haven't heard it.

Looks like someone copied from T174569 and didn't update everything. There's no merged patch here to have been auto-deployed or tested, nor does the enwiki DB on Beta Cluster show the changes proposed here.

Rxy updated the task description. (Show Details)Sep 16 2019, 2:11 PM
Rxy updated the task description. (Show Details)
Reedy updated the task description. (Show Details)Sep 16 2019, 7:17 PM
Reedy changed the task status from Open to Stalled.Sep 16 2019, 7:21 PM

Needs gerrit patch creating and merging before DBA's will action...

Change 537256 had a related patch set uploaded (by Rxy; owner: Rxy):
[mediawiki/extensions/CheckUser@master] Schema change: Add columns cuc_log_id and ActorMigrate, CommentStore related

https://gerrit.wikimedia.org/r/537256

fdans moved this task from Incoming to Data Quality on the Analytics board.Sep 30 2019, 4:09 PM