Page MenuHomePhabricator

Create method for accessing user watchlists in database queries
Closed, DeclinedPublic

Description

Users used my tool to check their 20,000+ article watchlists for disambiguation pages. With the removal of temporary tables this is no longer possible.

As a temporary fix for only English Wikipedia, I've switch to DPL bot's hourly updated results table.

Event Timeline

Dispenser created this task.
Dispenser lowered the priority of this task from High to Medium.Dec 15 2017, 6:04 AM

"accessing user watchlists in database queries" could you elaborate what do you want to query? watchlist is a private table, which is not offered to be queried- do you mean watchlist_count, meaning a duplicate of T59617 or something else?

I mean literally just that. You can try out the aforementioned gimped tool which only joins against dplbot 's pre-generated results table for enwiki, The tool calls watchlistraw OAuth or Watchlist Token to create a TEMPORARY TABLE with all the pages that you currently watch.

@bd808 Isn't dumping the *private watchlist* of a user into a public database something we would frown upon? Wouldn't that need proper informed consent? Couldn't that be considered a privacy violation?

I am not too worried because temporary tables are not shared between users- but I wonder if this no longer being possible is actually a feature and not a bug. My first instinct is that we could create public information tables in some way on wikireplicas; but we should ban private data there for the risk of leaks.

Note I know how oath works, it just looks like a bad idea to create tables on wikireplicas (temporary or not, before or after), without TLS support with the risk of being leaked to all users.

I have to agree with @jcrespo that the new state seems more like a feature than a bug. The use case for @Dispenser's tool is certainly real, but an implementation that requires collecting private data via an authenticated API request and copying it into Toolforge or another shared system where it could potentially be leaked is not something we should be promoting or enabling.

I wonder if there is a less efficient but possibly more secure implementation that could be done by keeping the watchlist contents on the client (fetched with a AJAX request to the Action API in an authenticated session) and then making batched requests to some other API endpoint which exposes the DPL data?

I am going to decline this based on our own feedback, but that doesn't mean we cannot continue discussing other methods of doing what you need. However, the initial response to the request is: "do not write private data to databases, and less to wikirreplicas". My advice would be to copy data that is public to toolsdb and do there the necessary joins, but if possible, always keeping user data on application memory, so it doesn't accidentally leak.