
Add support for private wikis to pageviews API
Closed, Declined · Public

Description

Problem
There isn't an easy way to get the page/siteviews of private wikis (like Office wiki). Tool-Pageviews provides an awesome way to do this with all of our public wikis, but not our private ones.

Solution
Allow private wikis to be added to the pageviews whitelist, but ignored in AQS. From there, we could create a mechanism to use the data with a tool that requires you to log in (with OAuth) to the private wiki. Alternatively, the data could be visualized as needed using SWAP.

Event Timeline

Are we okay with the idea of stuff running in labs accessing data from private wikis via OAuth?

> Are we okay with the idea of stuff running in labs accessing data from private wikis via OAuth?

I'm curious why we wouldn't be?

MusikAnimal added subscribers: Nuria, MusikAnimal.

I was thinking this wouldn't be for Tool-Pageviews, not just because it's private data, but also because that tool is hard-wired to use the public APIs, as opposed to fetching data through some other connection.

At any rate, first things first: we need to see if we can get this data into the pageviews pipeline. I believe this is as simple as adding office.wikimedia to the pageviews whitelist. However, AQS needs to know to ignore this particular project, and I don't know what kind of work would be involved there. @Nuria Do you know if this is possible?
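As a rough illustration of the split being proposed (project whitelisted into the pipeline, but excluded from the public API), here is a minimal Python sketch. It is not Refinery or AQS code: the set names, the row shape, and the `to_hive`/`to_aqs` functions are all hypothetical, chosen only to show the two-stage filter.

```python
from dataclasses import dataclass

# Hypothetical: projects allowed into the pageview pipeline (Hive).
PAGEVIEW_WHITELIST = {"en.wikipedia", "de.wikipedia", "office.wikimedia"}
# Hypothetical: private projects kept in Hive but never loaded into public AQS.
AQS_EXCLUDED = {"office.wikimedia"}


@dataclass(frozen=True)
class PageviewRow:
    project: str
    page_title: str
    views: int


def to_hive(rows):
    """Keep only rows whose project is on the whitelist."""
    return [r for r in rows if r.project in PAGEVIEW_WHITELIST]


def to_aqs(hive_rows):
    """Drop private projects before loading into the public API store."""
    return [r for r in hive_rows if r.project not in AQS_EXCLUDED]


rows = [
    PageviewRow("en.wikipedia", "Main_Page", 100),
    PageviewRow("office.wikimedia", "Some_page", 5),
    PageviewRow("unknown.example", "X", 1),
]
hive = to_hive(rows)
public = to_aqs(hive)
print([r.project for r in hive])    # ['en.wikipedia', 'office.wikimedia']
print([r.project for r in public])  # ['en.wikipedia']
```

The point of keeping the exclusion as a separate list is that the private project's data would still be queryable in Hive (e.g. from SWAP) while never reaching the public endpoints.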

My thought was that if we get this data into Hive, we can visualize it using SWAP. Interested parties would have to consult users who have production data access to get the charts, though. In short, I don't see this ever getting into Tool-Pageviews (unless the data is public).

MusikAnimal renamed this task from Add support for private wikis to Pageviews to Add support for private wikis to pageviews API. Jan 14 2020, 9:40 PM
MusikAnimal updated the task description.

If we feel a wiki's pageview data is of broad interest, that wiki probably should not be marked as private. The point of marking a wiki as private is to restrict data harvesting, because the wiki is restricted (techconduct), of temporary interest (wikimania2016), or a test instance (test wiki). I think we will probably decline this request.

Are we comfortable with every page title on private wikis potentially being made public, such that allowing any user to publicly find and/or enumerate them (as Tool-Pageviews allows) wouldn't result in any Vuln-Infoleak issues? I suspect there are at least a few pages on officewiki where this would not be the case.

> Are we comfortable with every page title on private wikis potentially being made public

No, we are not. Declining ticket on our end.