Page MenuHomePhabricator

Migrate ipblocks to actor table
Closed, DeclinedPublic

Description

Problem
The ipblocks table does not use the actor table. Like everywhere in MediaWiki, IP Address should be centralized onto the actor table.

This is somewhat challenging because the table not only stores users and IP addresses, but also IP ranges which are not in the actor table.

Proposed Solution
Deprecate (and later remove) the current target fields:

+----------------------+---------------------+------+-----+----------------+----------------+
| Field                | Type                | Null | Key | Default        | Extra          |
+----------------------+---------------------+------+-----+----------------+----------------+
| ipb_address          | tinyblob            | NO   | MUL | NULL           |                |
| ipb_user             | int(10) unsigned    | NO   | MUL | 0              |                |
+----------------------+---------------------+------+-----+----------------+----------------+

And add two fields, one as a foreign key to the actor table and another as an int for the range (if the block is a range):

+----------------------+---------------------+------+-----+----------------+----------------+
| Field                | Type                | Null | Key | Default        | Extra          |
+----------------------+---------------------+------+-----+----------------+----------------+
| ipb_actor            | bigint(20) unsigned | NO   | MUL | 0              |                |
| ipb_range            | int(3)              | YES  | MUL | NULL           |                |
+----------------------+---------------------+------+-----+----------------+----------------+

When a user submits a range block, an actor will be created for the IP address (if there isn't one already) and the int after the / will be inserted into the ipb_range field. These two fields together will have a UNIQUE key on them to ensure that only a single target has a block.

Event Timeline

The rationale as described doesn't seem sufficient, is there another reason for doing this? I think the definition of an actor as a user that has edited should be preserved. Range blocks are blocks for which the target is not an actor. Maybe single-IP blocks are usually blocks of actors, but not always, which seems to suggest that linking to the actor table would be awkward.

The rationale as described doesn't seem sufficient, is there another reason for doing this?

Perhaps I'm misunderstanding what the purpose of the actor table is. I was under the impression that it was a way to store IP addresses without duplication?

I think the definition of an actor as a user that has edited should be preserved.

I'm not sure I understand why that semantic meaning is important. Also, it's not necessarily true. From what I understand, an actor could be created from any action that would invoke a log entry. Regardless, if all revisions from an actor are deleted, should the actor also be deleted?

Range blocks are blocks for which the target is not an actor. Maybe single-IP blocks are usually blocks of actors, but not always, which seems to suggest that linking to the actor table would be awkward.

They are also actors that do not yet exist (and will be prevented from existing). I'm not sure I understand why it's awkward? I suppose I don't subscribe to a semantic meaning to the data stored in actor. Is this documented somewhere that it must represent this? If so, why?

tstarling added a subscriber: Anomie.

The rationale as described doesn't seem sufficient, is there another reason for doing this?

Perhaps I'm misunderstanding what the purpose of the actor table is. I was under the impression that it was a way to store IP addresses without duplication?

No, actors are a new abstraction in core combining the two kinds of user (anonymous and logged in) who can perform actions on the site. As before, both kinds of user are encapsulated by the User class, but now we have a table which represents these User instances by a unique ID. The actor table represents this actor concept, it is not a generic string compression table.

I think the definition of an actor as a user that has edited should be preserved.

I'm not sure I understand why that semantic meaning is important. Also, it's not necessarily true. From what I understand, an actor could be created from any action that would invoke a log entry. Regardless, if all revisions from an actor are deleted, should the actor also be deleted?

Revisions are never truly deleted.

They are also actors that do not yet exist (and will be prevented from existing). I'm not sure I understand why it's awkward? I suppose I don't subscribe to a semantic meaning to the data stored in actor. Is this documented somewhere that it must represent this? If so, why?

Conceptual clarity is an essential architectural principle. The meaning of rows in the actor table is documented in the relevant comment in tables.sql: "The "actor" table associates user names or IP addresses with integers for the benefit of other tables that need to refer to either logged-in or logged-out users." The key word which you are missing is "users". A block target is not (necessarily) a user.

I gather by your response that there is no other rationale. I discussed this task with @Anomie and he also could not see a good reason for implementing this. So, I'm declining it.

No. @Anomie introduced the actor table and I reviewed that code, so we are its maintainers.

dbarratt removed a project: Platform Engineering.

No. @Anomie introduced the actor table and I reviewed that code, so we are its maintainers.

Thank you for your contribution. I'm re-opening this and removing Platform Engineering.

I was under the impression that core did not have maintainers? Is that not correct?

@dbarratt: What is the reason that this task got reopened? Also, see https://www.mediawiki.org/wiki/Developers/Maintainers for maintainer info.

@dbarratt: What is the reason that this task got reopened? Also, see https://www.mediawiki.org/wiki/Developers/Maintainers for maintainer info.

I wasn't aware of that, thanks for the link!

@Aklapper Question for you... do you know where (document?) we give maintainers the authority to gatekeep changes? Or is this just implied in maintainer? Thanks!

@dbarratt, anyone can decline a task. I'm not going to engage in lawyering. I mentioned our involvement with the development of this code in the hopes that that might help you understand why I think we know it better than you do. If you want this to be merged, you will have to convince @Anomie and I, since a -2 from either of us would certainly prevent the relevant code from being merged. Leaving the task open might give you the mistaken impression that the code would be accepted if you were to write it. All I need in order to reconsider my position is a solid technical rationale. You are mistaken in believing that the actor table is a general store of IP addresses and that "everywhere in MediaWiki, IP Address should be centralized onto the actor table" as you put it in the task description. That's why I don't consider that to be a sufficient reason for making the proposed change, and why I think there needs to be an additional rationale.

If you make this an RFC, then it can be reopened for consideration by TechCom. In my opinion, it's highly likely that TechCom would decline the RFC unless you are able to provide a rationale. I don't know why you would want to go through that procedure when there is apparently no product or initiative being blocked by this.

@tstarling I appreciate your time, knowledge, and explanation. I do not believe a justification has to be made for this change as it, however, if you take a look at the task this one is mentioned on T227733 it might make more sense where this is coming from (although, you may not agree with the thought process, which is fine).

You're more than welcome to disagree with a task, and I am thankful that you're willing to express such disagreement. I also hope you understand that I was not satisfied with your answer to my question, on why maintaining the semantic meaning was important. Your answer felt circular. And I would really appreciate a better answer. I understand that you disagree that the semantic meaning should be changed, but I'm not sure why it's important to maintain it's current meaning.

I'm re-opening the task, because this task is part of a bigger puzzle (of thoughts). I don't think your reasoning is good enough to find a new piece to fit that puzzle. I hope we can have a productive discussion on finding solutions that we can compromise on.

@dbarratt - Another reason it may not make sense to store the ipblock IP addresses in the actor table is that, at some point, the actor table may no longer be used to store IP addresses due to IP masking. However, we will always need the ability to store IP addresses in the ipblocks table, regardless of IP masking. Thus the semantic difference could end up being a practical difference at some point in the future.

Anomie triaged this task as Lowest priority.Jul 18 2019, 3:14 PM
Anomie moved this task from Inbox to Tracking/Watching on the Platform Engineering board.

I agree with Tim that this should be declined as it doesn't fit the model that defines the actor table. Targets of range blocks aren't actors. I also disagree with the rationale of keeping this clearly rejected task open because it's part of some "bigger puzzle". But I'm not going to war over it.

From glancing at T227733, it seems you're thinking of using the actor table to implement the "masking".[1] For actual actors that's probably a decent idea,[2] but block targets aren't necessarily actors. I'm not sure IP block targets actually need to be masked in the first place, for the very reason that they're not actually users (instead we might need to add the ability to block masks directly as a separate "kind" of block).

[1]: Every time I see "masking IPs" I tend to confuse it with CIDR subnet masks.
[2]: Although I'd put the mask into actor_name and have the IP in a separate column, rather than vice versa as proposed on that task. For query performance we likely need to keep actor_name as the primary string identifier for the actor (whether registered or anon).

kaldari changed the visibility from "Public (No Login Required)" to "Subscribers".Jul 18 2019, 5:16 PM

OOO. Thanks @kaldari & @Anomie, that makes a lot more sense.

So... if blocks are on "actors" (users or IPs) and "groups of actors" (IP ranges) maybe it might be best to represent it that way? Maybe something like this?

+----------------------+---------------------+------+-----+----------------+----------------+
| Field                | Type                | Null | Key | Default        | Extra          |
+----------------------+---------------------+------+-----+----------------+----------------+
| ipb_actor            | bigint(20) unsigned | YES  | MUL | NULL           |                |
| ipb_range            | tinyblob            | YES  | MUL | NULL           |                |
+----------------------+---------------------+------+-----+----------------+----------------+

Where ipb_actor would be a reference to the actor and ipb_range would be the full range (like it is today)? i.e. 127.0.0.1/16 ? This way, a block can either be against an actor or a range?

@kaldari changed the visibility from "Public (No Login Required)" to "Subscribers".

Why did you do that? I don't see anything private here.

kaldari changed the visibility from "Subscribers" to "Public (No Login Required)".Aug 1 2019, 8:15 PM

Technical rationales have been offered by several people arguing to decline this task, so I'm boldly going to do it.