Expose new ipblocks_restrictions table to Wiki Replica users
Open, Needs TriagePublic

Description

With the introduction of partial blocks, ipblocks_restrictions table was added to the schema. This table does not contain any private data and it should be exposed to the replicas.

dmaza created this task.Mon, Nov 19, 11:06 AM
Restricted Application added subscribers: MGChecker, Aklapper. · View Herald TranscriptMon, Nov 19, 11:06 AM

@Marostegui what about the new column ipblocks.ipb_sitewide? what do we need to do to add that to the current view for ipblocks?

Check my answer at T209549#4757473 - same applies to it.

bd808 added a subscriber: bd808.

This table does not contain any private data and it should be exposed to the replicas.

@dmaza, can you be the person in charge of getting someone from the Security team to validate this claim on this ticket?

bd808 renamed this task from Get ipblocks_restrictions table view on labs to Expose new ipblocks_restrictions table to Wiki Replica users.
dmaza added a comment.Mon, Nov 19, 5:05 PM

This table does not contain any private data and it should be exposed to the replicas.

@dmaza, can you be the person in charge of getting someone from the Security team to validate this claim on this ticket?

Sure

@TBolliger, I don't know enough about this table to know whether it's good for it to be available on the cloud replicas. If it's ok, then once this task is done it would be relatively easy for us to sqoop the table and make it available in Hadoop as well.

However, if this table should NOT be publicly available in the cloud replicas, then we can sqoop it from our production replicas separately. Then it would be available only in Hadoop. We might not have time to do this before the end of this quarter, but we'll get it done afterwards. I'll tag T209549 as well to keep these discussions in sync.

@TBolliger, I don't know enough about this table to know whether it's good for it to be available on the cloud replicas. If it's ok, then once this task is done it would be relatively easy for us to sqoop the table and make it available in Hadoop as well.

However, if this table should NOT be publicly available in the cloud replicas, then we can sqoop it from our production replicas separately. Then it would be available only in Hadoop. We might not have time to do this before the end of this quarter, but we'll get it done afterwards. I'll tag T209549 as well to keep these discussions in sync.

Thank you for working with us on this, and thank you @dmaza for helping get this reviewed by Security. We're extremely confident that there is nothing private in this new column and table, as it is all logged publicly and displayed on Special:BlockList.

I thought it'd be useful to clarify a bit. Importantly, the point I'm making here about filtered-tables has no relation to the different discussion about sanitizing data in Hadoop, happening in T209031.

Relevant to this discussion, we prefer to give cloud replicas a chance first. Because this way everyone can have access to the same data, whether they work with the WMF directly or not. If that doesn't work, we'll import the data directly into Hadoop.

Reedy added a subscriber: Reedy.Tue, Nov 20, 6:46 AM

This table does not contain any private data and it should be exposed to the replicas.

@dmaza, can you be the person in charge of getting someone from the Security team to validate this claim on this ticket?

Confirmed. No private data or similar. One is a link to ipblocks, the other two are just simple numbers

--
-- Partial Block Restrictions
--
CREATE TABLE /*_*/ipblocks_restrictions (

  -- The ipb_id from ipblocks
  ir_ipb_id int NOT NULL,

  -- The restriction type id.
  ir_type tinyint(1) NOT NULL,

  -- The restriction id that corrposponds to the type. Typically a Page ID or a
  -- Namespace ID.
  ir_value int NOT NULL,

  PRIMARY KEY (ir_ipb_id, ir_type, ir_value)
) /*$wgDBTableOptions*/;

-- Index to query restrictions by the page or namespace.
CREATE INDEX /*i*/ir_type_value ON /*_*/ipblocks_restrictions (ir_type, ir_value);

To confirm, ipd_deleted blocks will never have restrictions?

Correct. They should be scrubbed when a user who is partially blocked (and therefore listed on ipblocks_restrictions) has a block escalated to sitewide and suppressed/oversighted/hidden.

Related task for the Anti-Harassment Tools team to fix soon: T210002

Great. Once this is available in the cloud replicas, give me a ping and I'll update the Hadoop import to include it.

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.Thu, Nov 29, 12:51 AM