Update regular reporting to reflect break-out of "automated" traffic
Closed, Resolved (Public)

Description

Analytics Engineering has added a new agent type for pageview data, "automated", to filter out suspected bots that do not self-identify as spiders. As of May 1, this agent type is live in the APIs, Wikistats, Superset, Turnilo, etc. Link to documentation: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection

  • Decide how to report pageviews at a high level
  • Identify & update reports impacted:
    • Tuning Session data on pageviews
    • Key Product metrics for Product Leads
    • Reading Metrics dashboard in Superset
    • Daily Pageviews dashboard in Superset
  • Inform product teams (via analysts): discuss the approach with each product team and share our recommendation for how we plan to report metrics at a high level

Event Timeline

My recommendations:

  • Continue to include "automated" agents in our high-level reporting for "users" (exclude only self-identified "spiders"), particularly when we report on YoY numbers, until we have a full year's worth of data.
  • Start reporting (as a subpoint) the "user"-only numbers (but do not provide YoY numbers until those are available)
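
The recommendation above can be sketched as a small aggregation rule. This is only an illustration: the function name, the dictionary shape, and the counts are hypothetical and do not come from any existing report or query. Only the agent type names ("user", "automated", "spider") are taken from this task.

```python
from typing import Dict

# Agent types that count toward the headline number under the recommendation:
# "user" plus "automated"; self-identified "spider" traffic stays excluded.
HEADLINE_AGENTS = ("user", "automated")


def summarize_pageviews(counts_by_agent: Dict[str, int]) -> Dict[str, int]:
    """Return the headline total and the "user"-only subpoint.

    counts_by_agent maps an agent type to its pageview count for one period.
    Missing agent types are treated as zero.
    """
    headline = sum(counts_by_agent.get(agent, 0) for agent in HEADLINE_AGENTS)
    user_only = counts_by_agent.get("user", 0)
    return {"headline": headline, "user_only": user_only}


# Hypothetical counts, for illustration only.
example_counts = {"user": 1000, "automated": 50, "spider": 300}
summary = summarize_pageviews(example_counts)
print(summary)
```

Because the headline figure keeps "user" + "automated" together, it stays comparable with prior-year numbers in which both were counted as "user".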

Advantages:

  • YoY trends will be generally easier to identify, assuming that automated traffic is roughly consistent over time
  • We won't have to caveat YoY numbers every time we present them (because what is now "user" + "automated" was previously all clumped as "user")

Disadvantages:

  • Numbers will continue to be slightly inflated by bots (though we will have both numbers available and can speak to differences)
  • We'll have to explain differences between our reported numbers and the ones that external parties/Comms/etc might see in Wikistats if they just filter based on "Users"

Notes added to Key Product Metrics slides

On 29 April 2020, Analytics Engineering implemented new bot detection to identify previously unidentified bots. For consistency in reporting with prior years’ metrics, we will include “automated” agents in our high-level reporting through the end of FY20/21. See Wikitech for details about bot detection.

We'll continue to report "user"-only pageviews in the notes section through the end of FY20/21.

Reading Metrics dashboard in Superset updated to include "automated" and "user" agents for pageviews in the filter box and charts.