When working with data on blocked edit attempts for T366222, I found that for a fair amount of "local" blocks (block_scope = "local") the block_id column contains an array of block ids with a mix of local and global blocks. This makes working with the data more complicated because fetching additional metadata requires heuristics and multiple data sources (the local wiki for local blocks, Meta-wiki for global blocks).
More specifically, it appears that this only affects local IP blocks (block_scope = "local", block_type = "ip"). For the first week of June 2025, 99.3% of events of that type also have a comma in the block_id field, suggesting there are multiple block ids listed. None of the other types of events show this pattern.
Current state:
block_id can contain a mix of values depending on the block, ref the schema documentation
Proposed state:
- Add an identifier for the blocked edit attempt that enables multiple events to be logged if a given edit attempt is related to multiple blocks (so that it's easy to group them together as needed).
- block_id changes to contain either a keyword (for system blocks) or a single block id (either ipb_id or gb_id)
- block_scope, block_type, and block_expiry reflects the values of the associated block_id when possible (e.g. block_scope = "local" means block_id refers to ipb_id)