
Requesting Google Search Console Access for a Service Account
Closed, Resolved, Public

Description

Why? We're trying to dig deeper into some trends to understand why PageView trends are behaving in certain observed ways. Google Search Console has data that might hold clues to this.

How? I can get single dumps of an entire year (up to 16 months) and analyze them in a spreadsheet. But it would be more useful to get daily dumps along the lines of "which queries led to Wikipedia results being shown on Google Search", "what was the average result position for a Wikipedia result for each of those queries", "how many instances of that query on that day at a given position had a click on it", and so on. These queries are complex and will need more code than a spreadsheet allows.
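A minimal sketch of what one day's pull might look like, assuming the Search Analytics `query` method of the Search Console API. The field names (`startDate`, `endDate`, `dimensions`, `rowLimit`, `startRow`, and `keys`/`clicks`/`impressions`/`position` in the response) follow the public API reference; the helper names `build_daily_query` and `flatten_rows` and the choice of dimensions are hypothetical. Note that, as far as I know, the API reports an average position per row rather than clicks per exact position.

```python
from datetime import date


def build_daily_query(day: date, start_row: int = 0, row_limit: int = 25000) -> dict:
    """Build a Search Analytics request body covering a single day.

    The dimensions chosen here (query, page) are an assumption about
    what would be useful; the API also offers country, device and date.
    """
    return {
        "startDate": day.isoformat(),
        "endDate": day.isoformat(),
        "dimensions": ["query", "page"],
        "rowLimit": row_limit,   # results are paginated via startRow
        "startRow": start_row,
    }


def flatten_rows(response: dict, dimensions: list) -> list:
    """Turn API response rows into flat dicts keyed by dimension name."""
    records = []
    for row in response.get("rows", []):
        record = dict(zip(dimensions, row["keys"]))
        record["clicks"] = row["clicks"]
        record["impressions"] = row["impressions"]
        record["position"] = row["position"]  # average position for this row
        records.append(record)
    return records
```

The flat records could then be written to CSV or a database table for the kind of analysis a spreadsheet makes awkward.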

Privacy Concerns? The data is already heavily aggregated by Google. The data is not linked with Wikipedia data in any way; it is purely derived from user behaviour on Google's properties.

Details

  • I've created a service account called wmf-search-console-account@wmf-sc-experiments.iam.gserviceaccount.com.
  • This will need to be added as a "Restricted Viewer" to Search Console as per the definition here. I'd like this service account to be the "Restricted Viewer" for all the domains that my own principal (scherukuwada@wikimedia.org) is an Unverified Owner for. At a minimum, the en and en.m Wikipedias.
  • I intend to use the Google Console Project named "wmf-sc-experiments" created specifically for this purpose.
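For anyone following along, here is a rough sketch of how the service account would address those properties over REST. The read-only scope and the `webmasters/v3` endpoint shape come from the public API documentation; the exact property URLs in `SITES` are an assumption (the properties could equally be registered under a different prefix or as domain properties), and `search_analytics_endpoint` is a hypothetical helper.

```python
from urllib.parse import quote

# Read-only OAuth scope for the Search Console API.
READONLY_SCOPE = "https://www.googleapis.com/auth/webmasters.readonly"
API_ROOT = "https://www.googleapis.com/webmasters/v3"

# Assumed property identifiers; the real ones depend on how the
# properties are registered in Search Console.
SITES = [
    "https://en.wikipedia.org/",
    "https://en.m.wikipedia.org/",
]


def search_analytics_endpoint(site_url: str) -> str:
    """Return the Search Analytics query URL for one property.

    The property identifier goes into the path, so it has to be
    percent-encoded in full (including ':' and '/').
    """
    return f"{API_ROOT}/sites/{quote(site_url, safe='')}/searchAnalytics/query"


if __name__ == "__main__":
    for site in SITES:
        print(search_analytics_endpoint(site))
```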

Requested manager (@dr0ptp4kt) approval.

Event Timeline

Additional note: if we determine that this data is useful enough for us to import into our own data stores, the same service account can be used just as well to run periodic imports of the data. I'm asking around in the Analytics channel and will follow up as required.

@SCherukuwada this sounds reasonable, but since this is my first time approving Google Search Console access, I am going to discuss it with @Volans before approving. Also, as you mentioned, I will need the approval of @dr0ptp4kt as well.

@dr0ptp4kt is aware of the request; he's just somewhat swamped for now.
FWIW, I'm not in any sort of hurry. :-)

Approved, conditioned on the data remaining inaccessible except to those with an NDA, a need to know, and suitably strong authentication requirements where identities are under SOPs for access revocation at cessation of work arrangement (less wordy: Google Workspace 2FA for the org). Although the top 1,000 search query data are not surprising and have a high degree of overlap with actual site traffic, the resolution of the data can be more fine-grained than what we typically aggregate on a geo basis, especially once actual country-level filtering is applied or if clever inferences are drawn based on other dimensions and the nature of queries, hence this explicit conditioning.

If we want a general-purpose tool that is community-facing, or want the data ingested into the data lake (which would be neat!), all the usual steps for project planning, support, privacy and security review, etc. should be done.

@SCherukuwada I have added wmf-search-console-account@wmf-sc-experiments.iam.gserviceaccount.com to the following:

If you need me to add the account to others, an explicit list of sites would be helpful.

Thank you Jesse. This will be sufficient for now; I'll come back if I need more sites.

I'm documenting some more information around how I'll likely end up using this access just in case someone happens to read it.

Billing: Google Search Console APIs are free and have no impact on billing, though they do come with usage limits. This is documented here.

Compute: The amount of compute required would be negligible (it probably won't keep a CPU even 25% busy for maybe half an hour), even if we run an import job for every single day's data covering the maximum 16-month window at the highest level of granularity. I intend to run this on Mediawiki VPS at first and then see if someone wants me to move this to a more permanent home.
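To give a sense of the scale of a full backfill, here is a small sketch that enumerates the days in a 16-month window. It assumes the window is counted in calendar months back from a chosen end date; `months_back` and `backfill_days` are hypothetical helpers, and each day may need more than one request if its rows exceed the per-request row limit.

```python
import calendar
from datetime import date, timedelta


def months_back(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`.

    The day of month is clamped when the target month is shorter
    (for example, the 30th maps to the 28th in a non-leap February).
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(day.day, last_day))


def backfill_days(end: date, months: int = 16) -> list:
    """List every day from `months` months before `end` through `end`."""
    start = months_back(end, months)
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


if __name__ == "__main__":
    days = backfill_days(date.today())
    print(f"{len(days)} daily jobs from {days[0]} to {days[-1]}")
```

That works out to a little under 500 daily jobs for the whole window, which is consistent with the estimate above that the load is negligible.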