
Store page-links-change data in a database table and make available through a Special page
Open, Needs Triage, Public

Description

While the EventStream is valuable in its current form, and works for the primary identified use cases, it only stores data going back 30 days, making it less useful for community anti-spam efforts and research purposes.

Ideally we'd have a database table in MediaWiki storing this data, searchable through a Special page, much as Special:LinkSearch searches the externallinks table.
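A minimal sketch of what such a table and its search could look like, using SQLite purely for illustration. The table name `link_change_log`, its columns, and the helper functions are hypothetical and not an existing MediaWiki schema. The `*` wildcard only loosely imitates Special:LinkSearch patterns. The key property is that rows are only ever appended, so both user stories below (link history, and accounts that added a link) can be answered from it.

```python
import sqlite3

# Hypothetical append-only table: one row per link addition or removal.
# Rows are only ever inserted, never updated or deleted.
SCHEMA = """
CREATE TABLE link_change_log (
    lcl_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    lcl_page      TEXT NOT NULL,  -- title of the edited page
    lcl_url       TEXT NOT NULL,  -- external link added or removed
    lcl_action    TEXT NOT NULL CHECK (lcl_action IN ('add', 'remove')),
    lcl_user      TEXT NOT NULL,  -- account that made the edit
    lcl_timestamp TEXT NOT NULL   -- ISO 8601 time of the edit
);
CREATE INDEX lcl_url_idx ON link_change_log (lcl_url);
"""


def record_change(conn, page, url, action, user, timestamp):
    """Append one link change to the log."""
    conn.execute(
        "INSERT INTO link_change_log "
        "(lcl_page, lcl_url, lcl_action, lcl_user, lcl_timestamp) "
        "VALUES (?, ?, ?, ?, ?)",
        (page, url, action, user, timestamp),
    )


def to_like(pattern):
    """Turn a pattern using '*' as wildcard into a SQL LIKE pattern.

    Literal backslashes, '%' and '_' are escaped first so that only '*'
    acts as a wildcard.
    """
    escaped = pattern.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return escaped.replace("*", "%")


def search_changes(conn, pattern):
    """All additions and removals of matching URLs, oldest first."""
    return conn.execute(
        "SELECT lcl_timestamp, lcl_action, lcl_page, lcl_user, lcl_url "
        "FROM link_change_log WHERE lcl_url LIKE ? ESCAPE '\\' "
        "ORDER BY lcl_timestamp, lcl_id",
        (to_like(pattern),),
    ).fetchall()


def accounts_adding(conn, pattern):
    """Distinct accounts that ever added a matching URL, sorted by name."""
    rows = conn.execute(
        "SELECT DISTINCT lcl_user FROM link_change_log "
        "WHERE lcl_action = 'add' AND lcl_url LIKE ? ESCAPE '\\' "
        "ORDER BY lcl_user",
        (to_like(pattern),),
    ).fetchall()
    return [user for (user,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    record_change(conn, "Foo", "http://spam.example/buy", "add", "SpamBot", "2020-01-01T00:00:00Z")
    record_change(conn, "Foo", "http://spam.example/buy", "remove", "Patroller", "2020-01-02T00:00:00Z")
    record_change(conn, "Bar", "http://spam.example/cheap", "add", "SpamBot2", "2020-03-01T00:00:00Z")
    print(accounts_adding(conn, "*spam.example*"))  # ['SpamBot', 'SpamBot2']
```

Even after the patroller removes the link from "Foo", the log still shows that SpamBot added it, which a current-state table could not.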

User stories

  • As a researcher, I want to search for the history of certain link patterns so I can understand how the encyclopedia has been built over time with respect to its citations.
  • As a Wikipedia editor, I want to view a list of accounts which have added a specific link, to uncover and block spammers.

Event Timeline

Samwalton9 updated the task description. Thu, Nov 12, 11:10 AM

The data in the mediawiki.page-links-change stream comes from exactly the same source as the data in the pagelinks and externallinks tables, so this data should already be stored in MW.

> this data should already be stored in MW.

Just to clarify, is data older than 30 days still stored, or would this data storage follow the availability of data on the stream itself?

The tables represent current state of the page, so the links are in the table for as long as they are present on the page. If the link is deleted from the page, it's deleted from the table.

Samwalton9 added a comment. Edited Mon, Nov 16, 4:52 PM

Right, so there's no database tracking the individual link additions and removals? That's what I'm interested in with this task: a permanent store of the data coming through the event stream.
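To make the distinction in this exchange concrete, here is a toy model in Python. `CurrentStateTable` mimics the behaviour described for the links tables (a row exists only while the link is on the page), while `ChangeLog` is the kind of permanent store being requested. Both classes and the `LinkChange` record are hypothetical illustrations with simplified fields, not MediaWiki code or the stream's actual event schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LinkChange:
    """One link addition or removal (hypothetical, simplified record)."""
    page: str
    url: str
    action: str  # "add" or "remove"
    user: str
    timestamp: str


@dataclass
class CurrentStateTable:
    """Keeps a (page, url) row only while the link is present on the page."""
    rows: set = field(default_factory=set)

    def apply(self, change: LinkChange) -> None:
        key = (change.page, change.url)
        if change.action == "add":
            self.rows.add(key)
        else:
            # Removing the link deletes the row, so its history is lost.
            self.rows.discard(key)


@dataclass
class ChangeLog:
    """Append-only store: every addition and removal is kept permanently."""
    entries: list = field(default_factory=list)

    def apply(self, change: LinkChange) -> None:
        self.entries.append(change)

    def who_added(self, url: str) -> list:
        """Accounts that ever added this exact URL, sorted by name."""
        return sorted({c.user for c in self.entries
                       if c.url == url and c.action == "add"})


if __name__ == "__main__":
    events = [
        LinkChange("Foo", "http://spam.example/", "add", "SpamBot", "2020-11-01T10:00:00Z"),
        LinkChange("Foo", "http://spam.example/", "remove", "Patroller", "2020-11-02T09:00:00Z"),
    ]
    state, log = CurrentStateTable(), ChangeLog()
    for event in events:
        state.apply(event)
        log.apply(event)
    print(state.rows)                             # set()
    print(log.who_added("http://spam.example/"))  # ['SpamBot']
```

Replaying the same two events (SpamBot adds a link, a patroller removes it) leaves the current-state table empty, while the log can still answer who added the link.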