Report Total pageviews (new definition) in 2014 Oct-Dec
Closed, ResolvedPublic

Description

Generate number for this report: https://meta.wikimedia.org/w/index.php?title=File%3AWikimedia_Foundation_Quarterly_Report%2C_2014-15_Q2.pdf&page=4
Exclude crawlers

  • Historical data is available but unreliable (doesn't exclude crawler traffic)
  • New data available via Pentaho
  • Recommendation: use data in Pentaho as outlined by Oliver (trivial)

Event Timeline

Tbayer raised the priority of this task from to Needs Triage.
Tbayer updated the task description.
Restricted Application added a subscriber: Aklapper. Feb 6 2015, 9:24 PM
Tbayer set Security to None.
kevinator triaged this task as High priority. Feb 6 2015, 9:31 PM

I will assign this task to someone on the team following my triaging meeting with Dario on 2015-02-09.

It should be trivial:

SELECT LEFT(timestamp,7) AS yearmonth, SUM(pageviews)
FROM staging.pageviews04
WHERE is_automata = 0 AND is_spider = 0
AND LEFT(timestamp,7) IN ('2014-10','2014-11','2014-12')
GROUP BY yearmonth;

(I, however, leave for my first proper vacation in years...now. So,
someone else can do it. My point is that it's 30 seconds' work for
anyone with analytics-store access.)
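
For anyone without analytics-store access, here is a minimal sketch of the same per-month aggregation against an in-memory SQLite table. It is illustrative only: the rows are made up, the column layout of pageviews04 is assumed from the query above, the `monthly_pageviews` helper is hypothetical, and `substr(timestamp, 1, 7)` stands in for `LEFT(timestamp, 7)`, which SQLite does not provide.

```python
import sqlite3


def monthly_pageviews(rows, months):
    """Sum pageviews per YYYY-MM month, excluding spiders and automata.

    rows:   iterable of (timestamp, pageviews, is_spider, is_automata)
    months: iterable of 'YYYY-MM' strings to include
    """
    conn = sqlite3.connect(":memory:")
    # Assumed schema: only the columns the original query refers to.
    conn.execute(
        "CREATE TABLE pageviews04 ("
        "timestamp TEXT, pageviews INTEGER, "
        "is_spider INTEGER, is_automata INTEGER)"
    )
    conn.executemany("INSERT INTO pageviews04 VALUES (?, ?, ?, ?)", rows)

    months = list(months)
    placeholders = ",".join("?" for _ in months)
    query = (
        "SELECT substr(timestamp, 1, 7) AS yearmonth, SUM(pageviews) "
        "FROM pageviews04 "
        "WHERE is_automata = 0 AND is_spider = 0 "
        f"AND substr(timestamp, 1, 7) IN ({placeholders}) "
        "GROUP BY yearmonth ORDER BY yearmonth"
    )
    result = conn.execute(query, months).fetchall()
    conn.close()
    return result


# Made-up rows for illustration; these are not real pageview counts.
TOY_ROWS = [
    ("2014-10-01 00:00:00", 100, 0, 0),
    ("2014-10-02 00:00:00", 50, 1, 0),   # spider: excluded
    ("2014-11-15 00:00:00", 70, 0, 0),
    ("2014-11-16 00:00:00", 30, 0, 1),   # automata: excluded
    ("2014-12-31 00:00:00", 90, 0, 0),
    ("2014-12-31 01:00:00", 10, 0, 0),
    ("2015-01-01 00:00:00", 500, 0, 0),  # outside the quarter: excluded
]
Q2_MONTHS = ("2014-10", "2014-11", "2014-12")

if __name__ == "__main__":
    for yearmonth, total in monthly_pageviews(TOY_ROWS, Q2_MONTHS):
        print(yearmonth, total)
```

The GROUP BY is what yields one row per month; without it, the filter would still apply but the three months would collapse into a single total.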

Note: this data is not citable. The data is available in Pentaho, but it is not a public or reliable server to point people to.

@Tbayer: the new definition will show discrepancies with the old data. We should stick to the legacy data until we have a stable, citable source for the new data. This applies not just to pageviews but to any other data we are producing for the quarterly report.

Thanks Kevin - I'm of course aware that the new definition yields different numbers than the old definition; that was the reason why @Eloquence strongly suggested we should already highlight the new one in this scorecard.

And just to understand the issue here: Do you mean that there is concern that the new def pageview numbers are not considered reliable enough yet? (AIUI they are already exposed in some public per-project dashboards and have been used to produce reports to the Board, the FR team and the December metrics presentation.) If it's just about the ability to add a link to an existing dashboard or table with further information (as we used to do in the "Data and Trends" section of the monthly report), that's of secondary importance to the report's format. If there is no monthly total pageviews dashboard available yet, we can instead simply include a link to the pageview definition to clarify provenance.

kevinator renamed this task from Total pageviews (new definition) for Oct-Dec 2014 to Report Total pageviews (new definition) in 2014 Oct-Dec. Feb 11 2015, 7:43 PM
kevinator updated the task description.
Tbayer closed this task as Resolved. Feb 11 2015, 9:10 PM

Done using Pentaho (thanks to Dario's instructions).

FYI, here's what I found:
Total pageviews (no spiders, no automata) for Q2: 16.7B/month
Change from Q1: +6.1%
Change y-o-y: +0.2%

kevinator moved this task from Next Up to Done on the Analytics-Kanban board. Feb 11 2015, 9:50 PM