
Create partial SQL dump of watchlist table
Open, Medium, Public

Description

Create a partial SQL dump of the watchlist table that includes:

  • wl_namespace
  • wl_title

Version: unspecified
Severity: enhancement

Details

Reference
bz49133

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:47 AM
bzimport set Reference to bz49133.
MZMcBride created this task. Jun 4 2013, 3:11 PM

I don't know that we can do this for privacy reasons. I'd like to have other folks weigh in on this request.

(In reply to comment #1)

> I don't know that we can do this for privacy reasons. I'd like to have other
> folks weigh in on this request.

Do you mean that even with just those columns one could reverse-engineer who the corresponding user is?

I mean it's not even available via the API. Maybe one can get a 'list of all watched pages' (need to check that).

(In reply to comment #3)

> I mean it's not even available via the API. Maybe one can get a 'list of all
> watched pages' (need to check that).

I doubt it; otherwise it would make little sense to restrict [[Special:UnwatchedPages]] to sysops. If that's your concern:

  1. it's not about privacy,
  2. I'm not sure it affects the anti-vandalism rationale, as extracting that data would still be a non-trivial effort, not worth the minor gain a vandal would get from it.
Nemo_bis raised the priority of this task from Low to Medium. Apr 9 2015, 7:08 AM
Nemo_bis set Security to None.
Restricted Application added a subscriber: Aklapper. Sep 28 2015, 11:41 AM
Nemo_bis rescinded a token. Jul 16 2016, 8:00 AM
Nemo_bis awarded a token.

Schema for the watchlist table currently: https://www.mediawiki.org/wiki/Manual:Watchlist_table

The only fields that might make sense to publish are wl_namespace and wl_title, as mentioned in the task description.

Can we make guesses about the user associated with a particular watchlist by looking at its entries and seeing who edited or created the articles? More generally, how might this data be used (or abused) if published? Adding @Reedy for the privacy/abuse angle; if you want to redirect me to someone else, feel free to do so and remove yourself.

I guess that someone could cause inappropriate information to be published in these dumps by watchlisting a non-existent page, so we should check that every title exists before it gets written out. See https://www.mediawiki.org/wiki/Help:Watching_pages#Watching_a_nonexistent_page
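To make the existence check concrete, here is a rough sketch, not a proposed implementation. It assumes the standard `watchlist` columns (`wl_user`, `wl_namespace`, `wl_title`) and the `page` table columns (`page_namespace`, `page_title`), and uses an in-memory SQLite database purely as a stand-in for the real database. The function names and the demo rows are hypothetical. The join drops watched titles that have no matching page, and `DISTINCT` collapses the per-user rows so only the two requested columns come out.

```python
import sqlite3


def dump_watched_titles(conn):
    """Return distinct (namespace, title) pairs that are watched AND exist as pages."""
    query = """
        SELECT DISTINCT wl_namespace, wl_title
        FROM watchlist
        JOIN page
          ON page_namespace = wl_namespace
         AND page_title = wl_title
        ORDER BY wl_namespace, wl_title
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Build a tiny in-memory database with made-up rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE watchlist (wl_user INTEGER, wl_namespace INTEGER, wl_title TEXT);
        CREATE TABLE page (page_id INTEGER PRIMARY KEY,
                           page_namespace INTEGER, page_title TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO watchlist VALUES (?, ?, ?)",
        [
            (1, 0, "Main_Page"),
            (2, 0, "Main_Page"),           # same page watched by a second user
            (1, 0, "Some_missing_title"),  # watched but never created
            (2, 1, "Main_Page"),           # talk namespace
        ],
    )
    conn.executemany(
        "INSERT INTO page (page_namespace, page_title) VALUES (?, ?)",
        [(0, "Main_Page"), (1, "Main_Page"), (0, "Unwatched_page")],
    )
    return conn


if __name__ == "__main__":
    for namespace, title in dump_watched_titles(build_demo_db()):
        print(f"{namespace}\t{title}")
```

With the demo rows, the watched-but-nonexistent title is left out and the page watched by two users appears once. Whether the remaining pairs are safe to publish is still the open privacy question above; the sketch only covers the existence filter.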