Per some recent discussions about our presence in search results and SEO in general, there is renewed interest in better understanding and monitoring the Google Search Console (GSC) data about our sites (example - special access required). Google only makes this data available for the last 90 days at any given point in time (16 months of data is gradually becoming available via the new GSC interface, but not yet through the API), and this limit prevents us from getting a better understanding of longer-term trends in the data and separating them from short-term changes.
This task is about setting up an automated mechanism to regularly download the data of interest via the Search Console API and store it on our own servers, in a form suitable for analysis (e.g. CSVs and/or Druid with a Turnilo/Superset visualization front-end). The parts of the data to prioritize for storage are listed below (see the sketch after the lists for what a daily pull could look like):
Store these daily numbers for each site:
- Clicks
- Impressions
- CTR
- (average) Position
Do NOT store for now:
- Top keywords list
Filtered/split by:
- Country
- Device type (Desktop/Mobile/Tablet)
- Search appearance: rich results vs. all results (note: GSC does not provide stats for non-rich results)
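For illustration, here is a minimal sketch of what a single daily pull could look like, using the Search Analytics endpoint of the API (webmasters v3) via google-api-python-client and writing one CSV per day. The property URL, date, credentials file and output path are placeholders, not decisions; authentication via a service-account key is one option among several, and the rich-results split would presumably need a separate query using the searchAppearance dimension (not shown here).

```python
"""Sketch of a daily GSC export: clicks/impressions/CTR/position split by
country and device, written to a CSV. Assumes a service-account JSON key
(credentials.json) with read access to the GSC property."""

import csv
from googleapiclient.discovery import build
from google.oauth2 import service_account

SITE_URL = 'https://en.wikipedia.org/'   # placeholder property
DATE = '2018-07-01'                      # the single day to fetch
OUTPUT_CSV = f'gsc_{DATE}.csv'           # placeholder output path
SCOPES = ['https://www.googleapis.com/auth/webmasters.readonly']

credentials = service_account.Credentials.from_service_account_file(
    'credentials.json', scopes=SCOPES)
service = build('webmasters', 'v3', credentials=credentials)

rows = []
start_row = 0
while True:
    # One day per request; country and device are the split dimensions.
    response = service.searchanalytics().query(siteUrl=SITE_URL, body={
        'startDate': DATE,
        'endDate': DATE,
        'dimensions': ['country', 'device'],
        'rowLimit': 25000,       # API maximum per request
        'startRow': start_row,   # paginate until no more rows come back
    }).execute()
    batch = response.get('rows', [])
    if not batch:
        break
    rows.extend(batch)
    start_row += len(batch)

with open(OUTPUT_CSV, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['date', 'country', 'device',
                     'clicks', 'impressions', 'ctr', 'position'])
    for row in rows:
        # row['keys'] holds the dimension values in the order requested above.
        country, device = row['keys']
        writer.writerow([DATE, country, device,
                         row['clicks'], row['impressions'],
                         row['ctr'], row['position']])
```

A script along these lines could be run per site and per day from a cron/systemd timer (or an Airflow/Oozie job), with the resulting CSVs either kept as-is or ingested into Druid for the Turnilo/Superset front-end.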
Note: back in 2015 there were related attempts by the then Discovery team (e.g. T116822, T101158); these were abandoned after it was decided that SEO was not in the team's core scope at the time.