Add timestamps of important revision events to mediawiki_history
Open, Medium, Public

Description

With the introduction of the "mw-reverted" tag in T254074, tags are now more often applied to a revision at some point after the edit was made. The time at which a tag was applied is not readily available in MediaWiki History. It might be possible to infer it by comparing the timestamp of an edit tagged with "mw-reverted" with that of the reverting edit, provided we can identify what the reverting edit is.
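
A minimal sketch of that inference, assuming the reverting revision is already known for each reverted edit (which is the hard part). The record layout and field names (`rev_id`, `timestamp`, `reverted_by`) are hypothetical and are not the mediawiki_history schema. The reverting edit's timestamp is only an approximation of when "mw-reverted" was applied.

```python
from datetime import datetime, timezone


def infer_reverted_tag_time(revisions):
    """Map each reverted rev_id to (approximate tag time, seconds from edit to tag).

    Assumption: the tag is applied at the time of the reverting edit.
    """
    by_id = {r["rev_id"]: r for r in revisions}
    inferred = {}
    for rev in revisions:
        reverting = by_id.get(rev.get("reverted_by"))
        if reverting is None:
            # Not reverted, or the reverting edit is not in our data.
            continue
        delay = (reverting["timestamp"] - rev["timestamp"]).total_seconds()
        inferred[rev["rev_id"]] = (reverting["timestamp"], delay)
    return inferred


# Hypothetical sample data: revision 1 is reverted by revision 2.
revisions = [
    {"rev_id": 1, "timestamp": datetime(2020, 10, 1, 12, 0, tzinfo=timezone.utc), "reverted_by": 2},
    {"rev_id": 2, "timestamp": datetime(2020, 10, 1, 12, 30, tzinfo=timezone.utc), "reverted_by": None},
]
tag_times = infer_reverted_tag_time(revisions)
```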

One source of information on these tag change events is event.mediawiki_revision_tags_change in the Data Lake. I'm unsure if MediaWiki logs these changes anywhere (e.g. the logging table).
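
As a rough illustration of what could be derived from such tag change events, the sketch below finds the first time each tag shows up on a revision. The event fields used here (`dt`, `rev_id`, `tags`, `prior_tags`) are simplified, hypothetical stand-ins, not the actual schema of event.mediawiki_revision_tags_change, and `dt` is assumed to be a uniformly formatted UTC ISO 8601 string so that string ordering matches time ordering.

```python
def first_tag_times(events):
    """Return {(rev_id, tag): dt of the earliest event in which the tag was newly added}."""
    first_seen = {}
    for ev in sorted(events, key=lambda e: e["dt"]):
        # Tags present now but absent from the prior state were added by this event.
        added = set(ev["tags"]) - set(ev.get("prior_tags", []))
        for tag in added:
            first_seen.setdefault((ev["rev_id"], tag), ev["dt"])
    return first_seen


# Hypothetical sample events for one revision.
events = [
    {"dt": "2020-10-02T00:00:00Z", "rev_id": 10, "tags": ["mobile edit", "mw-reverted"], "prior_tags": ["mobile edit"]},
    {"dt": "2020-10-01T08:00:00Z", "rev_id": 10, "tags": ["mobile edit"], "prior_tags": []},
]
tag_added_at = first_tag_times(events)
```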

Event Timeline

@Isaac : you wanted me to tag you when I filed the task for getting information about revision tag changes into MediaWiki history. Here's said tag. I don't remember what changes you were interested in, maybe they'll fit here too?

Thanks @nettrom_WMF for creating this ticket. I think I'm going to leave it just as Morten requested (which I agree would be useful) because I was misremembering what fields were in mediawiki_history and my ask is bigger than I had thought.

For context, here is what I was going to request. Page restrictions (wmf_raw.mediawiki_page_restrictions) are another important property associated with pages/revisions where it can be difficult but useful to identify when they were applied, so I was going to ask that timestamped information about page restrictions also be included in mediawiki_history. I had thought that page restrictions were already tracked in mediawiki_history, just without timestamps, but I now see this would be an entirely new field, so I'll make a separate task if I really need this rather than expand the scope of this task.

fdans triaged this task as Medium priority. Oct 26 2020, 4:18 PM
fdans moved this task from Incoming to Datasets on the Analytics board.

It would be great for counter-vandalism tools and bots if those tag changes could be published via the Recent Changes event stream (https://stream.wikimedia.org/v2/stream/recentchange) or another event stream.

Should I file a new request for this?

@Isaac FYI, there is an event.mediawiki_page_restrictions_change table in Hive.

@Ottomata thanks for the ping. Yeah, I'm aware of the table, but the challenge has always been whether you can reconstruct the page restrictions on a page at any given moment in the past, and that table unfortunately doesn't give any information about expiration of the blocks (as can be seen, e.g., in the Special/Log pages). I honestly haven't looked too deeply into it, so maybe there's another table that maintains that information or an event that fires when a restriction expires, but the few times I've looked into it briefly, it wasn't clear to me.
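
To make the gap concrete, here is a rough sketch of the point-in-time reconstruction that would become possible if each change record carried an expiry. Everything about the record shape (`type`, `level`, `applied`, `expiry`) is hypothetical. It is not the layout of event.mediawiki_page_restrictions_change, which as noted lacks the expiry.

```python
from datetime import datetime, timezone


def restrictions_at(changes, when):
    """Return {restriction_type: level} in force on a page at time `when`.

    Assumptions: level None means the restriction was removed, and
    expiry None means the restriction never expires.
    """
    latest = {}
    for ch in sorted(changes, key=lambda c: c["applied"]):
        if ch["applied"] > when:
            break
        # A later change to the same restriction type replaces the earlier one.
        latest[ch["type"]] = ch
    return {
        rtype: ch["level"]
        for rtype, ch in latest.items()
        if ch["level"] is not None and (ch["expiry"] is None or ch["expiry"] > when)
    }


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical history for a single page.
changes = [
    {"type": "edit", "level": "autoconfirmed", "applied": utc(2020, 1, 1), "expiry": utc(2020, 2, 1)},
    {"type": "move", "level": "sysop", "applied": utc(2020, 1, 10), "expiry": None},
]
during = restrictions_at(changes, utc(2020, 1, 15))
after_expiry = restrictions_at(changes, utc(2020, 3, 1))
```

Without the `expiry` value, the second lookup would wrongly report the edit restriction as still active.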

Oh interesting. Perhaps we should capture the expiry in that stream too!

Yeah, if it's straightforward, that'd be appreciated! I actually had a use-case for this yesterday where an affiliate was interested in how to calculate how many users on their wiki had been blocked for at least one week in a given year. I suggested they change their criterion to the number of blocks, but duration of block would have been a better metric (Cell 76 (three down from this header)).
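
For illustration, a sketch of how that duration-based metric could be computed if block records included an expiry. The record fields (`user`, `start`, `expiry`) are hypothetical, and the sketch makes simplifying assumptions: it counts a block by the year it started in, treats a missing expiry as indefinite, and ignores early unblocks and overlapping blocks.

```python
from datetime import datetime, timedelta, timezone


def users_blocked_at_least(blocks, year, minimum=timedelta(days=7)):
    """Count distinct users with a block that started in `year` and lasted at least `minimum`."""
    users = set()
    for b in blocks:
        if b["start"].year != year:
            continue
        # expiry None is treated as an indefinite block.
        if b["expiry"] is None or b["expiry"] - b["start"] >= minimum:
            users.add(b["user"])
    return len(users)


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical block records.
blocks = [
    {"user": "A", "start": utc(2020, 3, 1), "expiry": utc(2020, 3, 2)},   # one day
    {"user": "B", "start": utc(2020, 5, 1), "expiry": utc(2020, 5, 15)},  # two weeks
    {"user": "C", "start": utc(2020, 6, 1), "expiry": None},              # indefinite
    {"user": "B", "start": utc(2019, 1, 1), "expiry": utc(2019, 2, 1)},   # other year
]
count_2020 = users_blocked_at_least(blocks, 2020)
```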