
Cross-validate estimates for pageview data loss derived from February data with new March data
Closed, Resolved (Public)

Description

Name for main point of contact and contact preference
Kate Zimmerman; post updates here, but a nudge on Slack is appreciated.

What teams or departments is this for?
Product Analytics & Fundraising Analytics

What are your goals? How will you use this data or analysis?
Cross-validate the pageview data loss estimates so we can communicate an appropriate confidence level for them.

What are the details of your request? Include relevant timelines or deadlines
Check the proportion of traffic handled by each node to verify our estimates of the pageview data loss.
We previously calculated the estimate using February data. Now that we have March data, we can check our assumption that the proportion of traffic handled by each node is roughly consistent over time.
The exploration should be limited to data before March 21, prior to the launch of the new drmrs data center, which necessarily changes the distribution of traffic across nodes.
@JAllemandou pulled the March data and provided its location in T306480.
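The check described above amounts to computing each node's share of pageviews per month and measuring how much those shares move between February and March. Below is a minimal sketch in Python, assuming the data arrives as hypothetical `(day, node, pageviews)` rows. The node names (`cp1001`, `cp1002`), the 2022 year in the cutoff date, and the helper names are illustrative assumptions, not the schema of the data referenced in T306480.

```python
from collections import defaultdict
from datetime import date

# Assumed cutoff: drop data from the drmrs launch onward.
# The task only says "March 21"; the year here is an assumption.
DRMRS_LAUNCH = date(2022, 3, 21)


def node_shares(rows, cutoff=DRMRS_LAUNCH):
    """Return {month: {node: share}} from hypothetical (day, node, pageviews) rows.

    Rows dated on or after `cutoff` are skipped so the new data center does
    not distort the distribution across nodes. Months are keyed by month
    number only, so this sketch assumes all rows come from a single year.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for day, node, views in rows:
        if day >= cutoff:
            continue
        totals[day.month][node] += views
    shares = {}
    for month, by_node in totals.items():
        month_total = sum(by_node.values())
        # Each node's fraction of that month's total pageviews.
        shares[month] = {node: v / month_total for node, v in by_node.items()}
    return shares


def max_share_shift(before, after):
    """Largest absolute change in any node's share between two periods.

    A node missing from one period is treated as having a 0.0 share there.
    """
    nodes = set(before) | set(after)
    return max(abs(before.get(n, 0.0) - after.get(n, 0.0)) for n in nodes)
```

For example, with made-up numbers:

```python
rows = [
    (date(2022, 2, 1), "cp1001", 600),
    (date(2022, 2, 1), "cp1002", 400),
    (date(2022, 3, 1), "cp1001", 550),
    (date(2022, 3, 1), "cp1002", 450),
    (date(2022, 3, 25), "cp1001", 9999),  # after the cutoff, ignored
]
shares = node_shares(rows)
print(round(max_share_shift(shares[2], shares[3]), 3))  # 0.05
```

Running the same comparison on a filtered subset (for example, only US-user enwiki rows) alongside the global data shows whether a shift that is negligible overall is concentrated in one segment.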

Is this request urgent or time sensitive?
Yes. We would like the check completed by April 28 so it can inform a presentation on metrics that matter for the May staff meeting (May 5).

Event Timeline

kzimmerman triaged this task as High priority.

We're seeing slight differences in the traffic handled by each node between February and March. Globally the differences are negligible, but they are prominent for traffic from US users to English Wikipedia (enwiki).
We will need more time to determine whether our assumption that the proportion of traffic handled by each node is roughly consistent over time holds.
@kzimmerman I am planning to meet with @mpopov on Monday, 05/02 to review my findings.

In review with @kzimmerman.
Next steps: meet with Data Engineering, Fundraising, and Product Analytics to understand the variability and how traffic across nodes behaves over time.

@kzimmerman: Hi, the Due Date set for this open task passed a while ago.
Could you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks!

The original check was performed in time to inform our analysis for tuning sessions and metrics presentations. Removing the due date per Andre's message.

Maya and I discussed next steps yesterday, and she's going to do a follow-up test-retest correlation check on the US data: T310016