Per some recent discussions around our presence in search results and #SEO in general, there is renewed interest in better understanding and monitoring the Google Search Console (GSC, formerly Webmaster Tools) data about our sites ([[https://www.google.com/webmasters/tools/search-analytics?hl=en&siteUrl=https://en.wikipedia.org#state=%5Bnull%2C%5B%5Bnull%2Cnull%2Cnull%2C90%5D%5D%2Cnull%2C%5B%5Bnull%2C6%2C%5B%22WEB%22%5D%5D%5D%2Cnull%2C%5B1%2C2%2C3%2C4%5D%2C1%2C0%2Cnull%2C%5B2%5D%5D |example]] - special access required). Google only makes this available for the last 90 days at any given point in time (16 months of data is slowly becoming available via the new GSC interface, although not yet through the API), and this limit prevents us from getting a better understanding of longer-term trends in this data (and separating them from short-term changes).
This task is about setting up an automated mechanism to regularly download the data of interest via the Search Console API and store it on our own servers in a form suitable for analysis (e.g. as a MySQL table, as CSVs, and/or in [[ https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid | Druid ]] with a [[ https://wikitech.wikimedia.org/wiki/Analytics/Systems/Turnilo | Turnilo ]]/[[ https://wikitech.wikimedia.org/wiki/Analytics/Systems/Superset | Superset ]] visualization front-end). It might be easiest to adapt an existing tool (e.g. https://searchwilderness.com/gwmt-data-python/#searchanalytics ).
Here is a first stab at specifying which parts of the data to prioritize for storing:
Store these daily numbers for each site:
- Clicks
- Impressions
- CTR
- (average) Position
Do NOT store //for now//:
- top keywords list
Filtered/split by:
- Site
- Country?
- Device type (Desktop/Mobile/Tablet)
- Search appearance: rich results vs. all results (**note**: GSC does not allow stats for non-rich results)
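
As a rough illustration of the download-and-store step specified above, here is a minimal Python sketch. It assumes the `searchanalytics.query` request/response shape of the Search Console API (`startDate`, `endDate`, `dimensions`, `rowLimit` in the request; `rows` with `keys`, `clicks`, `impressions`, `ctr`, `position` in the response). The helper names (`build_request`, `rows_to_csv`, `days_to_fetch`) and the CSV layout are hypothetical, not part of any existing tool. Search appearance is left out because it may need to be queried separately from the other dimensions.

```python
import csv
import io
from datetime import date, timedelta

# The four daily metrics from the list above, as named in API response rows.
METRICS = ("clicks", "impressions", "ctr", "position")
# Dimensions to split by (assumption: search appearance is handled separately).
DIMENSIONS = ("date", "country", "device")


def build_request(day, dimensions=DIMENSIONS, row_limit=25000):
    """Build a searchanalytics.query request body covering a single day."""
    return {
        "startDate": day.isoformat(),
        "endDate": day.isoformat(),
        "dimensions": list(dimensions),
        "rowLimit": row_limit,
    }


def rows_to_csv(site, response, dimensions=DIMENSIONS):
    """Flatten an API response into CSV text, one line per dimension combination."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(("site",) + tuple(dimensions) + METRICS)
    for row in response.get("rows", []):
        writer.writerow([site, *row["keys"], *(row[m] for m in METRICS)])
    return out.getvalue()


def days_to_fetch(today, last_stored, window=90):
    """List days after last_stored that are still inside the retention window.

    The current day is excluded, on the assumption that its data is incomplete.
    """
    earliest = today - timedelta(days=window)
    start = max(earliest, last_stored + timedelta(days=1))
    return [start + timedelta(days=i) for i in range((today - start).days)]
```

With an authorized client from `google-api-python-client`, each request body would presumably be sent via `service.searchanalytics().query(siteUrl=site, body=build_request(day)).execute()`, and the result passed to `rows_to_csv`. Running the job daily with `days_to_fetch` would backfill any missed days as long as they have not yet fallen out of the 90-day window.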
I think we may want to make the resulting datasets available publicly so that community members can monitor them for their wikis, but that's up for discussion.
**Note**: back in 2015 there were already related attempts by the then Discovery team (e.g. T116822, T101158, also going further towards making the data available in the form of public dashboards), which were however abandoned because it was decided that SEO was not in the team's core scope back then.