
[Investigation] What is required to ensure metadata from rev_deleted revisions is not kept in permalink table(s)
Closed, Resolved, Public

Description

This task covers investigating and documenting the work that would be required to ensure that metadata from revisions with rev_deleted set is NOT kept in the table(s) we're introducing as part of T296801 and T303295.

Open Question(s)

  • 1. What work would need to be done to ensure that metadata stored in rev_deleted revisions is NOT kept in the table(s) we're introducing as part of T296801 and T303295?
  • 2. Is the Editing Team positioned to do the work defined in "1."?
  • 3. If the answer to "2." is "No":
    • Which team(s) would be better positioned to do this work?
    • What work needs to be done to ensure the table(s) we're introducing as part of T296801 and T303295 are sufficiently protected? Which team(s) are best positioned to do this work? E.g. Editing? Data Persistence?

Done

  • Answers to all Open Questions are documented

Event Timeline

matmarex renamed this task from [Investigation] What is required to ensure metadata from rev_deleted_revisions is not kept in permalink table(s) to [Investigation] What is required to ensure metadata from rev_deleted revisions is not kept in permalink table(s). Apr 17 2023, 11:33 PM
matmarex updated the task description.
matmarex claimed this task.
matmarex subscribed.

Instead of trying to prevent the metadata from being stored, which would be tricky because the rev_deleted field can change (requiring the metadata to be created or deleted every time it does), we've decided to simply join against the revision table and check the value of rev_deleted when fetching the data on Special:GoToComment etc. This ensures that we don't reveal anything about revisions that the user is not allowed to view.

And what about public database replicas? Won’t those expose contents of deleted revisions?


The permalink tables are not exposed in the public replicas at the moment for this reason.

If someone really wanted them to be public, this could be implemented with a check against revision.rev_deleted in the view definition. Some tables already work this way; search https://gerrit.wikimedia.org/g/operations/puppet/+/19cae4ac7e333bc31c1ce5435cbab7582e306f80/modules/profile/templates/wmcs/db/wikireplicas/maintain-views.yaml for "rev_deleted" to see examples.
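A filtered view of this kind might look roughly like the sketch below. It is not taken from maintain-views.yaml: it runs on SQLite rather than the replicas' MariaDB, and `permalink_item` and `permalink_item_public` are hypothetical names. The inner join also drops rows whose revision no longer exists in `revision`, which in MediaWiki is what happens when a page is deleted (its revisions move to the `archive` table).

```python
import sqlite3

DELETED_TEXT = 1  # MediaWiki's RevisionRecord::DELETED_TEXT bit


def build_replica(conn: sqlite3.Connection) -> None:
    """Create a private permalink table and a filtered public view over it."""
    conn.executescript(
        """
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_deleted INTEGER NOT NULL DEFAULT 0
        );
        -- Hypothetical private permalink table (not the real schema).
        CREATE TABLE permalink_item (
            item_name TEXT NOT NULL,
            item_rev_id INTEGER NOT NULL
        );
        -- Public view: keep only rows whose revision still exists in
        -- `revision` (inner join) and does not have its text hidden (bit 1).
        CREATE VIEW permalink_item_public AS
        SELECT p.item_name, p.item_rev_id
        FROM permalink_item AS p
        JOIN revision AS r ON r.rev_id = p.item_rev_id
        WHERE (r.rev_deleted & 1) = 0;
        """
    )


def public_items(conn: sqlite3.Connection) -> list[str]:
    """List the item names a replica user would see through the view."""
    rows = conn.execute(
        "SELECT item_name FROM permalink_item_public ORDER BY item_name"
    ).fetchall()
    return [name for (name,) in rows]
```

With revision 1 visible, revision 2 text-hidden, and revision 3 missing from `revision` entirely, only the item stored against revision 1 is exposed. Since the filter lives in the view, un-hiding revision 2 makes its row visible again without any data being rewritten.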

This was previously noted in:

  • Should this table be replicated to wiki replicas (does it not contain private data)?
    • Yes, but since the data in the tables is generated from the contents of page revisions, it would need to be filtered in case the revisions or pages are deleted. I think this is possible to handle in our replication setup using views, and I could probably create the necessary patches.