Page MenuHomePhabricator

Draft a proposal for multiblocks table schema(s) and get consensus
Open, NormalPublic2 Story Points

Description

Schema proposal based on T194697#4490343

CREATE TABLE block_target (
  bt_id int NOT NULL PRIMARY KEY AUTO_INCREMENT,
  bt_address tinyblob NOT NULL,
  bt_user int unsigned NOT NULL default 0,
  bt_auto bool NOT NULL default 0,
  bt_range_start tinyblob NOT NULL,
  bt_range_end tinyblob NOT NULL,
  bt_count int NOT NULL default 0
);

CREATE UNIQUE INDEX bt_address ON block_target (bt_address(255), bt_user, bt_auto);
CREATE INDEX bt_user ON block_target (bt_user);
CREATE INDEX bt_range ON block_target (bt_range_start(8), bt_range_end(8));

CREATE TABLE block_entry (
  be_id int NOT NULL PRIMARY KEY AUTO_INCREMENT,
  be_target int NOT NULL,
  be_by int unsigned NOT NULL default 0,
  be_by_text varchar(255) binary NOT NULL default '',
  be_by_actor bigint unsigned NOT NULL DEFAULT 0,
  be_reason varbinary(767) NOT NULL default '',
  be_reason_id bigint unsigned NOT NULL DEFAULT 0,
  be_timestamp binary(14) NOT NULL default '',
  be_anon_only bool NOT NULL default 0,
  be_create_account bool NOT NULL default 1,
  be_enable_autoblock bool NOT NULL default '1',
  be_expiry varbinary(14) NOT NULL default '',
  be_deleted bool NOT NULL default 0,
  be_block_email bool NOT NULL default 0,
  be_allow_usertalk bool NOT NULL default 0,
  be_parent_id int default NULL,
  be_sitewide bool NOT NULL default 1
);

CREATE INDEX be_timestamp ON block_entry (be_timestamp);
CREATE INDEX be_expiry ON block_entry (be_expiry);
CREATE INDEX be_parent_id ON block_entry (be_parent_id);

CREATE TABLE block_entry_restrictions (
  ber_be_id int NOT NULL,
  ber_type tinyint(1) NOT NULL,
  ber_value int NOT NULL,
  PRIMARY KEY (ber_be_id, ber_type, ber_value)
);

CREATE INDEX ber_type_value ON block_entry_restrictions (ber_type, ber_value);

This schema is effectively splitting the existing ipblocks into two tables block_target and block_entry and renaming ipblocks_restrictions to block_entry_restrictions. This will allow multiple block entries for the same target to coexist.

Event Timeline

dbarratt created this task.Aug 20 2018, 7:56 PM
dbarratt triaged this task as Normal priority.
Aklapper renamed this task from Draft a proposal for mutliblocks table schema(s) and get consensus to Draft a proposal for multiblocks table schema(s) and get consensus.Aug 20 2018, 10:01 PM
dbarratt updated the task description. (Show Details)Aug 20 2018, 10:44 PM

I merged the schema from T197144 into this proposal as that patch will have been merged by the time these changes are made.

dbarratt updated the task description. (Show Details)Aug 20 2018, 10:47 PM
dbarratt updated the task description. (Show Details)Aug 22 2018, 7:29 PM
dbarratt updated the task description. (Show Details)
dbarratt updated the task description. (Show Details)Aug 22 2018, 7:32 PM
dbarratt updated the task description. (Show Details)
dmaza updated the task description. (Show Details)Aug 22 2018, 7:33 PM
TBolliger set the point value for this task to 2.Aug 24 2018, 6:08 PM

There was some discussion about potential scaling of this table which warrants further research.

The Partial Blocks MVP will have an artificial limit of 10 pages per block which was arrived at by @TBolliger in an effort to limit the complexity of some UI elements including the block log. But, in the real-world, @SPoore tells us that users are blocked from broad topics like "Deserts" or "Outer Space." In those instances, the number of pages being blocked could be very large.

What we want to try to determine is something like:

  • How many blocks are in effect at any given time? This helps us get a sense of Block usage overall
  • How many of those sitewide blocks might become partial blocks? In other words, if given the new feature, how many admins would choose it over the sitewide block?
  • How many pages, on average, would a partial block generally contain? Since the feature is not yet available, this will largely be a guess based on anecdotal information.

If we can derive some numbers and apply a fudge/growth factor, we could potentially model the size of all these tables with a variable degree of accuracy.

I expect Partial blocks to replace some sitewide blocks and page protections, but I personally hypothesize that most will come from situations where neither occurs. We can still determine a reasonable number.

In T190328 we found 66k blocks/week. Length: 12% indefinite, 45% 6 months+, 32% 1 week to 6 months. Reason: 83% of all blocks on RUWP are proxy or vandalism, with 81% of all blocks on ENWP being proxy, vandalism, or spam. Rudimentary logic could predict ~10-20% of current sitewide blocks may benefit from a partial block. A (really) rough ballpark could be 6 - 13k partial blocks/week across all wikis.

As for the number of pages a partial block could contain — can we do some dumb calculations? e.g. 25% of blocks are for 1 page, 25% are for 10, 25% are for 100, 20% are for 1,000, and 5% are for 100,000? We assume most partial blocks will be of indefinite length.

Does this seem reasonable? Is this helpful?

TBolliger closed this task as Declined.Jan 30 2019, 10:04 PM

The development of this project would be a massive time and resource investment for minimal impact. Boldly closing.

tstarling reopened this task as Open.Feb 6 2019, 10:14 PM
tstarling removed a project: Anti-Harassment.

Reopening and untagging Anti-Harrassment per T202673#4933221 : a task should be left open if it is a good idea but there is no resourcing.