Page MenuHomePhabricator

Decide how to define action block IDs for extension actions
Open, Needs TriagePublic

Description

This task follows up on some discussions on https://gerrit.wikimedia.org/r/c/mediawiki/core/+/681005 about how to avoid conflicts in the integer values for blockable actions added by extensions (see comments on patch).

Problem

TL;DR We need a way of assigning integer IDs to blockable actions (as defined by the BlockActionInfo service), in such a way that the IDs for actions added by extensions don't collide.

How partial blocks are stored

Partial blocks against actions will be stored in the existing ipblocks_restrictions table, which has columns for the type of restriction (representing types page, namespace or action) and the value of the restriction (representing the particular page, namespace or action). The table looks like this:

MariaDB [mediawiki]> DESCRIBE ipblocks_restrictions;
+-----------+------------+------+-----+---------+-------+
| Field     | Type       | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+-------+
| ir_ipb_id | int(11)    | NO   | PRI | NULL    |       |
| ir_type   | tinyint(4) | NO   | PRI | NULL    |       |
| ir_value  | int(11)    | NO   | PRI | NULL    |       |
+-----------+------------+------+-----+---------+-------+
Actions aren't well-defined

Pages and namespaces have IDs, which are stored in ipblocks_restrictions.ir_value. Actions are very unstructured in MediaWiki. Some actions are respresented by a subclass of Action (recognised by the string returned by Action::getName() - e.g. 'history'); some are defined in the list PermissionManager::coreRights - e.g. 'sendemail'; some are simply strings that are passed around - e.g. 'create'. Actions don't have IDs, and there's no a definitive set of actions defined in one place.

Perhaps the most useful definition of an action, for the sake of partial action blocks, is:

a string that can get passed to PermissionManager::checkUserBlock as the $action argument

We need to define blockable actions

The BlockActionInfo service will define a set of blockable actions. These will be recognised by the string value of the action, but must be stored with an integer ID in ipblocks_restrictions. The BlockActionInfo service will map the string values to their IDs for blockable actions defined in MediaWiki core.

The problem is that extensions need to be able to define blockable actions too. So how do we ensure that their integer IDs don't collide?

Possible solutions

Table for mapping actions to IDs

Have a table that maps string actions to IDs. This is proposed in T194529#6210790 and discussed in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/681005/3#message-899556e17feef8a8d0e8f40f8b5e4a7945099f96

As proposed above, the extensions need only register their string actions (which already must not collide), and core could handle assigning IDs. This is similar to what we do with content models and slot roles (using the content_models and slot_roles tables) and there is existing infrastructure to make it easy, specifically the NameTableStore and NameTableStoreFactory classes.

A clear advantage of this method is that IDs will never collide, without extensions needing to do anything special. I think this would clearly be the way to go if we were designing actions from scratch. But I think given the complexity of our Production setup, (multiple wikis, global features, etc) it could add some difficulties. None are insurmountable, but I think they could add maintenance burden.

  • Extension action IDs would get out of sync between different wikis, since there would be no enforcement to keep them consistent, and different wikis have different extensions installed. This would make it more difficult to compare blocks on different wikis: we'd need to be careful to join on the remote wiki's action block table whenever looking up a remote block, and moving data between wikis would need to be handled carefully.
  • If GlobalBlocking were ever updated to work with partial blocks we'd need to store the action string rather than the ID. (There would be a similar problem for pages of course, but global blocks against pages doesn't make much sense since pages differ across wikis. However, we could implement global blocks against namespaces quite easily, since the IDs are consistent.) It's conceivable that a global block against e.g. 'upload' could be a useful thing to implement someday.
  • The usual problem with storing extension data in a core table: if an extension were uninstalled, there could be nonsensical rows left behind in the block actions table unless we deleted them. (There could also be nonsensical blocks left behind, but this is a problem with either solution. It's also an existing problem with namespaces: T280798)
Hard-code the IDs and avoid conflicts via on-wiki documentation

This is how namespace IDs work: https://www.mediawiki.org/wiki/Extension_default_namespaces

Hard-coding is often a less satisfying solution, and having to manage the IDs on a wiki introduces some overhead which would be avoided by using the other solution. It also further entangles the code with the wiki.

The main advantage would be that it avoids the above complexity by keeping IDs consistent across wikis. In any relevant tables it would be clear what the ID meant. Managing the IDs would probably be a bit simpler than for namespaces, since the number of actions that can be partially blocked will likely be smaller.


We could go ahead with the second option while we're just testing with a handful of actions on beta, but we should decide which solution to implement before enabling on production sites.

Event Timeline

Aware that this proposal is essentially adding on another standard to our pile of standards, but could we add another column to the table like ir_action? And make ir_value something fixed or allowed to be null? ir_action would be the action string and we could check for it instead of ir_value whenever we have an action restriction (page and namespace restrictions would remain the same). This way we wouldn't need to manage ids or anything.

Hard-code the IDs and avoid conflicts via on-wiki documentation

If we do go with this approach, we should make a separate task for documenting this, which could cover:

  • providing a wiki page for extensions to document their blockable actions and IDs
  • mentioning the wiki page in the documentation for GetAllBlockActionsHook

Was about to file a ticket for this, but found this task. Added some notes

Thanks @DannyS712. From a cursory glance, NameTableStore looks promising, though I'm not familiar with it yet.

We'll go with option 2 ("Hard-code the IDs and avoid conflicts via on-wiki documentation") for now so we can move ahead with testing straight away, but it should be very possible to switch away from this to option 1 ("Table for mapping actions to IDs") in the future. I think the simplicity and flexibility of it option 2 is helpful while we're developing this - as is the ease with which we could switch to option 1.

Let's keep this task/discussion open, and In the meantime a proof-of-concept patch for option 1 might help with the discussion, if anyone has the inclination and capacity to make one.

Hello,

The cs.wikipedia deployment for Action Blocks was originally meant to happen today (Aug 11, 2022). Few days ago, I was informed by @Niharika that an extension was requested, until Aug 30, 2022 (so +14 days), to work on this task. Based on the task's description, it looks that finishing this task will likely require a database table in Wikimedia production (assuming the objective will be to "implement the decision", not only to "make the decision").

If that's the case, please note that creating a DB table is a process which requires coordination with other WMF teams (Data Persistence at least, others might be required based on cloud replication requirements). For more details, please see the procedure at Wikitech. Considering the need for coordination, I'm afraid this might actually take more than the requested 2 more weeks.

Tchanders added a subscriber: Urbanecm.

The cs.wikipedia deployment for Action Blocks was originally meant to happen today (Aug 11, 2022). Few days ago, I was informed by @Niharika that an extension was requested, until Aug 30, 2022 (so +14 days), to work on this task.

This is the first I've heard of either of these deadlines - thanks for highlighting this @Urbanecm_WMF.

There may be a few ways we can make the deadline. It may be more feasible to keep track on wiki, in which case there's nothing to change (though obviously I won't let a deadline influence our judgement here). Or it may be that we can enable this then fix it later - will require some data cleanup, but might be worth it. Or we might not make the deadline.

Either way this task is prioritised now, and I'll post some thoughts soon.

The cs.wikipedia deployment for Action Blocks was originally meant to happen today (Aug 11, 2022). Few days ago, I was informed by @Niharika that an extension was requested, until Aug 30, 2022 (so +14 days), to work on this task.

This is the first I've heard of either of these deadlines - thanks for highlighting this @Urbanecm_WMF.

No problem!

There may be a few ways we can make the deadline. It may be more feasible to keep track on wiki, in which case there's nothing to change (though obviously I won't let a deadline influence our judgement here). Or it may be that we can enable this then fix it later - will require some data cleanup, but might be worth it. Or we might not make the deadline.

I think that the clean-up will be needed in any way, as Action Blocks are already deployed to test.wikipedia. Testwiki (despite being here for tests) is a production cluster wiki, and if cleanup would be needed after deploying to cs.wikipedia, I think a similar clean-up will be needed for test.wiki. I'm not sure about the details of the cleanup, but the effort necessary to do it for a single wiki might be similar to effort needed for two wikis (in which case, deploy first, then fix this task might be the best decision).

Either way this task is prioritised now, and I'll post some thoughts soon.

Thanks! Let me know if I can help in some way.

Summary

The problem is laid out in the task description, along with some amount of investigation. I won't repeat either here.

What I've investigated here is further to what's in the task description. Includes a new solution, which I think is better than either.

As for deploying, doing anything but the status quo will take time to properly plan and implement, so we could deploy to cswiki first.

What can we learn from how namespaces are managed?

There have been many discussions about storing namespaces in a structured way, which generally summarise to:

  • This would be better
  • There's so much tech debt that it's difficult/risky (e.g. patches getting reverted)
  • Problems arising from not storing them have been solved by other improvements to MediaWiki
  • No-one has the resourcing to do something so risky for so little gain
  • Tasks: T487, T9803, T33063, T226667

Lesson
I think the main thing we can learn from this is that there's obviously a precedent for working via the wikis without really creating unsurmountable problems, but that it's unpopular for obvious reasons and can become irreversable if taken too far. There's no clear reason for namaging namespaces on the wikis, other than tech debt and lack of resourcing.

What can we learn from how content models etc are managed?

Content models (each page has a content type, e.g. wikitext, JSON, etc):

  • Clearly defined:
    • Defined using FooContent and FooContentHandler classes; extensions can register more
    • Identified by a string
    • The config $wgContentHandlers avoids naming clashes

Slot roles (Each page can be associated with multiple slots for content playing different roles, e.g. template code, template documentation):

  • Clearly defined:
    • SlotRoleRegistry defines which slot roles are available on which page
    • SlotRoleHandler instances delare the existence/behaviour of slot roles (e.g. which content model they use)

Blockable actions/rights:

  • An action that is blockable is a string that can get passed to PermissionManager::checkUserBlock as the $action argument (from the task description)
  • Not well defined, as explained in the task description (and recapped here). They can be "Actions", "Rights" or strings.
  • Actions:
    • Something that can be done to a page, defined in $wgActions (see ActionFactory::coreActions)
    • Some are just strings, some are associated with classes (which also define a string)
    • Can be registered by extensions
  • Rights:
    • Permissions assigned to user groups, defined in $wgAvailableRights (see PermissionManager::coreRights)
    • Strings
    • Can be registered by extensions
  • Strings just passed around, e.g. 'create':
    • Not in $wgActions, because creating a page uses action=edit
    • PermissionManager::checkActionPermissions changes the action to 'create' if the action is 'edit' and the page is new
    • If the action is 'create', PermissionManager checks 'createtalk' and 'createpage' rights
    • 'create' is really a private implementation of the PermissionManager
    • 'create' is currently blockable as an action, becuase action blocks are checked from within the PermissionManager

Lesson
Something to learn here is that NameTableStore is currently used for clearly defined things - which blockable actions are not. However, "Actions" and "Rights" are each structured. Could we block these instead?

Solutions

My preferred solution

Solution:

  • Only allow action blocks to block $wgActions and $wgAvailableRights, and store these in a table mapping string to numerical ID (can use NameTableStore to prevent cross-wiki problems)

Notes:

  • Improves existing concepts of $wgActions and $wgAvailableRights rather than creating a new, hybrid concept of the "blockable action"
  • Some actions will be more complicated to block, e.g. 'createpage' and 'createtalk' are separate now (though this seems like a good thing, given that we're generally being asked for more control over what blocks block)

To do:

  • Figure out implementation details (table structures etc)
  • Figure out the UI for where actions and rights have the same name (e.g. always prefer right over action? always block them both (in which case no distinction needed in the UI)?)
Solutions I'm less enthusiastic about

Solution:

  • Define a new concept "blockable actions" that is stored in a normalized table

Notes:

  • It seems to impose structure on a system that's not very structured, but without making the underlying system more structured; arguably more mess and cognitive overhead

To do:

  • Figure out implementation details (probably would include a registry class, config to prevent clashes, access via NameTableStore)
  • Investigate how many actions are strings not defined anywhere else (like 'create') and how to reduce fragility of relying on these

Solution:

  • Continue "registering" on wiki

Notes:

  • Useful for allowing testing while we make these decisions, but awkward in the long term

To do:

  • Nothing
Tech debt we should probably address anyway
  • Migrate the ipblocks table fields ipb_create_account and ipb_block_email (maybe ipb_allow_usertalk?) to our solution

@Tchanders Can you file follow-up tasks from this ticket based on your recommendations? I will entirely defer to you on this as it feels like an engineering decision more than a product one. I think the two follow-ups from your investigation are:

My preferred solution

Solution:

  • Only allow action blocks to block $wgActions and $wgAvailableRights, and store these in a table mapping string to numerical ID (can use NameTableStore to prevent cross-wiki problems)

Notes:

  • Improves existing concepts of $wgActions and $wgAvailableRights rather than creating a new, hybrid concept of the "blockable action"
  • Some actions will be more complicated to block, e.g. 'createpage' and 'createtalk' are separate now (though this seems like a good thing, given that we're generally being asked for more control over what blocks block)

To do:

  • Figure out implementation details (table structures etc)
  • Figure out the UI for where actions and rights have the same name (e.g. always prefer right over action? always block them both (in which case no distinction needed in the UI)?)

and

Tech debt we should probably address anyway
  • Migrate the ipblocks table fields ipb_create_account and ipb_block_email (maybe ipb_allow_usertalk?) to our solution

Does that seem correct?