
Add partial blocks to mediawiki history tables
Closed, Resolved | Public | 5 Estimated Story Points

Description

The Anti-Harassment Tools Team have developed the partial blocks feature, which allows for article- or namespace-specific blocks (T2674). The team is looking to understand the effectiveness of this feature, both by itself and in relation to sitewide blocks (T209403).

The MediaWiki history tables in the Data Lake (e.g. mediawiki_user_history) contain information about blocks in a format that is easy to query. It would be magnificent if these tables also exposed partial blocks in ways that are easy to query.

Event Timeline

fdans triaged this task as Medium priority. (Dec 17 2018, 5:29 PM)
fdans moved this task from Incoming to Smart Tools for Better Data on the Analytics board.
Milimetric raised the priority of this task from Medium to High. (Jan 9 2019, 5:19 PM)

Super, super thanks to @nettrom_WMF for flagging this issue so we can incorporate changes to mw history

mforns lowered the priority of this task from High to Medium. (Mar 7 2019, 6:11 PM)

Hi @nettrom_WMF,
The ipblocks_restrictions table has been sqooped onto the cluster since this month.
However, I think the logging table doesn't contain detailed historical information on partial blocks. This will prevent us from rebuilding historical (and therefore more interesting) information on partial blocks.
Can you talk to your team to see if more detailed logging could be developed?
Cheers
Joseph

Partial blocks were only enabled on Italian Wikipedia in January of this year, so the only historical data would be from the past 2 months.

However, I think the logging table doesn't contain detailed historical information on partial blocks. This will prevent us from rebuilding historical (and therefore more interesting) information on partial blocks.

All of the details of the restrictions should be in there as the restrictions appear on Special:Log when a partial block is created/updated.

Hi @JAllemandou,

However, I think the logging table doesn't contain detailed historical information on partial blocks. This will prevent us from rebuilding historical (and therefore more interesting) information on partial blocks.
Can you talk to your team to see if more detailed logging could be developed?

As @TBolliger and @dbarratt mention, Italian Wikipedia is the place to look. The feature was deployed there in January, and the details should be available in log_params. Here's a condensed version of my SQL query to get the partial blocks that have been set on itwiki:

SELECT *
FROM logging
WHERE log_timestamp >= '20190101000000'
AND log_type='block'
AND log_params LIKE '%"sitewide";b:0;%';
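
For readers less familiar with PHP-serialized values, here is a minimal Scala sketch of what the LIKE pattern above matches: the substring `"sitewide";b:0;`, i.e. the `sitewide` key followed by a boolean false. The helper name `isPartialBlock` and both sample strings are hypothetical, hand-written for illustration only; they are not copied from the logging table.

```scala
// Minimal sketch: mimics the SQL filter LIKE '%"sitewide";b:0;%' in plain Scala.
// `isPartialBlock` is a hypothetical helper, not part of MediaWiki or refinery.
def isPartialBlock(logParams: String): Boolean =
  logParams.contains("\"sitewide\";b:0;")

// Illustrative PHP-serialized values, hand-written for this example (assumptions).
val partial =
  """a:4:{s:11:"5::duration";s:8:"infinite";s:8:"6::flags";s:8:"nocreate";s:8:"sitewide";b:0;s:15:"7::restrictions";a:1:{s:5:"pages";a:1:{i:0;s:12:"Homo sapiens";}}}"""
val sitewide =
  """a:3:{s:11:"5::duration";s:8:"infinite";s:8:"6::flags";s:8:"nocreate";s:8:"sitewide";b:1;}"""

println(isPartialBlock(partial))
println(isPartialBlock(sitewide))
```

A substring test like this is only as robust as the LIKE pattern itself; a proper deserializer would be safer if the serialization format ever changes.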

Hi folks - Thanks again for the quick answers. My bad, I looked in the wrong place. I confirm the data is available.
A first step toward having it available in mediawiki-history is already in CR (https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/493012). This will (should) make the logParams available in the page-history table, with all its information.
I'll keep this ticket open to further improve how we handle detailed data.

Hi @nettrom_WMF - I have a test dataset for you that includes this data (example in scala-spark2):

val user_history = spark.read.parquet("/user/joal/wmf/data/wmf/mediawiki/user_history/snapshot=2019-03")
user_history
  .where("caused_by_event_type = 'alterblocks' and wiki_db = 'itwiki' and start_timestamp like '2019-01%' and source_log_params['sitewide'] = 'false' and source_log_params['7::restrictions'] is not null")
  .select("source_log_params")
  .show(100, false)

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[5::duration -> infinite, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Utente:Daimona Eaytoy))]                                                                                                         |
|[5::duration -> Sun, 19 Jan 2021 11:32:06 GMT, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> File:Ambrosiana Inter 1929.svg, 1 -> File:Stemma Inter 1945.svg))]                                          |
|[5::duration -> 5 minutes, 6::flags -> noautoblock, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Camellia sinensis ))]                                                                                                        |
|[5::duration -> 5 minutes, 6::flags -> noautoblock, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Camellia sinensis ))]                                                                                                        |
|[5::duration -> 1 week, 6::flags -> nocreate,noautoblock, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Sylvester Stallone))]                                                                                                  |
|[5::duration -> 1 day, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 ->  Sylvester Stallone))]                                                                                                              |
|[5::duration -> infinite, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Affari tuoi))]                                                                                                                   |
|[5::duration -> Sun, 21 Jul 2019 14:38:38 GMT, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Acerra, 1 -> Triangolo della morte Acerra-Nola-Marigliano, 2 -> Terra dei fuochi))]                         |
|[5::duration -> infinite, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Homo sapiens))]                                                                                                                  |
|[5::duration -> 2 days, 6::flags -> , sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Dicastero per il servizio dello sviluppo umano integrale))]                                                                                |
|[5::duration -> 1 week, 6::flags -> , sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Tonio Cartonio))]                                                                                                                          |
|[5::duration -> Sun, 24 Jan 2021 11:32:06 GMT, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> File:Ambrosiana Inter 1929.svg, 1 -> File:Stemma Inter 1945.svg, 2 -> File:Stemma Inter 1908 a Colori.svg))]|
|[5::duration -> infinite, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Utente:Daimona Eaytoy))]                                                                                                         |
|[5::duration -> 1 day, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 ->  Emiliano Sala))]                                                                                                                   |
|[5::duration -> 1 week, 6::flags -> , sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Tonio Cartonio))]                                                                                                                          |
|[5::duration -> 1 day, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 ->  Sylvester Stallone))]                                                                                                              |
|[5::duration -> 6 months, 6::flags -> , sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Acerra, 1 -> Triangolo della morte Acerra-Nola-Marigliano))]                                                                             |
|[5::duration -> 1 day, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 ->  Emiliano Sala))]                                                                                                                   |
|[5::duration -> 2 days, 6::flags -> , sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> Dicastero per il servizio dello sviluppo umano integrale))]                                                                                |
|[5::duration -> 1 year, 6::flags -> nocreate, sitewide -> false, 7::restrictions -> Map(pages -> Map(0 -> File:Ambrosiana Inter 1929.svg))]                                                                                                  |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
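
As one possible way to make rows like these easier to work with, here is a small plain-Scala sketch (no Spark needed) that flattens a few of them and counts how many block events mention each page. The `PartialBlock` case class, its field names and the `parseFlags` helper are assumptions made for illustration; they are not part of the mediawiki-history schema. The three sample rows are copied from the output above, with page titles trimmed.

```scala
// Split the comma-separated flags string as it appears in source_log_params.
// An empty string yields an empty set. (Hypothetical helper.)
def parseFlags(raw: String): Set[String] =
  raw.split(",").map(_.trim).filter(_.nonEmpty).toSet

// Hypothetical flattened view of one partial-block event; field names are assumptions.
case class PartialBlock(duration: String, flags: Set[String], pages: Seq[String])

// Three rows taken from the output above (page titles trimmed).
val blocks = Seq(
  PartialBlock("1 week", parseFlags("nocreate,noautoblock"), Seq("Sylvester Stallone")),
  PartialBlock("1 day", parseFlags("nocreate"), Seq("Sylvester Stallone")),
  PartialBlock("6 months", parseFlags(""), Seq("Acerra", "Triangolo della morte Acerra-Nola-Marigliano"))
)

// Count how many block events mention each page.
val blocksPerPage: Map[String, Int] =
  blocks.flatMap(_.pages).groupBy(identity).map { case (page, hits) => page -> hits.size }

blocksPerPage.foreach { case (page, n) => println(s"$page: $n") }
```

The same grouping could presumably be expressed directly on the Spark DataFrame once the structure of the restrictions field is settled.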

Please let me know if you think it's useful as a first step, and let's talk about how to make user-blocks easier to use/navigate in mediawiki-user-history :)

@JAllemandou : Thanks for your patience while this has been stuck in my backlog! I think this looks great as a first step and will enable me to answer the remaining questions around Partial Blocks that the Anti-Harassment Tools Team is interested in.

Once I've used this to answer those questions I'll have a better idea of what potential improvements can be made for working with this data. At that point, we can probably connect those with T213583 and talk about how this all fits together.

Thanks again, looking forward to working with this data!

Nuria set the point value for this task to 5.