Page MenuHomePhabricator

Remove the cuc_ip, cule_ip, and cupe_ip columns from the cu_changes, cu_log_event, and cu_private_event tables respectively as duplicated to the IP hex columns
Closed, ResolvedPublic

Description

The CheckUser extension currently stores both the IP address and the hexadecimal representation of the IP address in separate columns. Based on https://codesearch.wmcloud.org/search/?q=cuc_ip&files=&excludeFiles=&repos=, it seems that there are no situations where the cuc_ip_hex column could not be used instead after passing it through IPUtils::formatHex. Removing the ip_hex column cannot be done as it used to filter for results by IP addresses. That means removing the ip column is the better option to remove this duplication.

To do this, the actor_ip_time indexes will need to be updated to use the ip_hex column instead of the ip column. This should be okay, but may produce an increase in the size of the index depending on how much a column being NULL could affect the index size. However, I would expect that this would lead to a overall decrease in the size for the result tables.

Example

For example, the cu_changes currently looks like:

  ...
  cuc_ip VARCHAR(255) DEFAULT '',
  cuc_ip_hex VARCHAR(255) DEFAULT NULL,
  ...
  INDEX cuc_ip_hex_time (cuc_ip_hex, cuc_timestamp),
  INDEX cuc_xff_hex_time (cuc_xff_hex, cuc_timestamp),
  INDEX cuc_timestamp (cuc_timestamp),
  INDEX cuc_actor_ip_time (cuc_actor, cuc_ip, cuc_timestamp),
  PRIMARY KEY(cuc_id)
) /*$wgDBTableOptions*/;

This ticket would propose that it be changed to the following:

  ...
  cuc_ip_hex VARCHAR(255) DEFAULT NULL,
  ...
  INDEX cuc_ip_hex_time (cuc_ip_hex, cuc_timestamp),
  INDEX cuc_xff_hex_time (cuc_xff_hex, cuc_timestamp),
  INDEX cuc_timestamp (cuc_timestamp),
  INDEX cuc_actor_ip_hex_time (cuc_actor, cuc_ip_hex, cuc_timestamp),
  PRIMARY KEY(cuc_id)
) /*$wgDBTableOptions*/;

This change would also be applied to the cu_log_event and cu_private_event tables.

Related Objects

Event Timeline

Added DBA for initial feedback. This is not likely to be planned in the short term. We would likely wait until both T361139 and T324907 are completed to reduce the issues caused by parallel database changes.

Dreamy_Jazz updated the task description. (Show Details)
Dreamy_Jazz updated the task description. (Show Details)
ABran-WMF triaged this task as Medium priority.
ABran-WMF moved this task from Triage to Ready on the DBA board.
Ladsgroup edited projects, added Data-Persistence (work done); removed DBA.
Ladsgroup subscribed.

I reviewed this, no objection from our side. I'd guess it was added for readability which is not a good reason. Will appreciate making these tables slightly smaller.