[REQUEST] Pageview data follow-ups around data loss
Closed, Resolved, Public

Description

Name for main point of contact and contact preference
Kate Zimmerman; Phab for notes on the request but Slack for questions needing a quicker response

What teams or departments is this for?
Product & Fundraising

What are your goals? How will you use this data or analysis?

What are the details of your request? Include relevant timelines or deadlines

  • update the traffic_timeline_data with estimated pageviews for mobile_vs_desktop traffic at a global level (similar to what we provided on mobile_vs_desktop_US)
  • verify en6c Fundraising days (launch day through day 7)
  • provide estimates for traffic data on en6c Fundraising days, include YoY calculations comparing to the fundraising days in 2020 and 2019
  • for each country in the en6C group, Germany, and Japan: provide YoY calculations comparing to December 2020 and 2019 (this can be done using data already calculated in https://docs.google.com/spreadsheets/d/1J45El5yOq5EP4v_ae7qbZDhnDq-dqQ1JmcPawWKWMRk/edit)
  • segment US Data into external referrers (already done) vs. internal vs. Direct/unknown
  • provide additional segment for US mobile vs desktop divided by external vs. internal vs. direct/unknown (direct is null, empty or '-'; unknown is used where domain extraction failed)

Is this request urgent or time sensitive?
yes

Event Timeline

kzimmerman created this task.
kzimmerman renamed this task from [REQUEST] to [REQUEST] Pageview data follow-ups around data loss. Mar 16 2022, 7:08 PM
kzimmerman assigned this task to Mayakp.wiki.

update the traffic_timeline_data with estimated pageviews for mobile_vs_desktop traffic at a global level (similar to what we provided on mobile_vs_desktop_US)
work with @aminalhazwani to update the mobile vs desktop global graph in deck - https://docs.google.com/presentation/d/1EFg6oa5WNbrLVHWkQqI5vHLs4HRNMtHvBxoF2_8QULA/edit#slide=id.g119fc0c04b3_5_59

The chart has been updated : https://docs.google.com/presentation/d/1EFg6oa5WNbrLVHWkQqI5vHLs4HRNMtHvBxoF2_8QULA/edit#slide=id.g119fc0c04b3_5_59

for each country in the en6C group, Germany, and Japan: provide YoY calculations comparing to December 2020 and 2019 (this can be done using data already calculated in https://docs.google.com/spreadsheets/d/1J45El5yOq5EP4v_ae7qbZDhnDq-dqQ1JmcPawWKWMRk/edit)

Added new tab : "enc6 + DE + JP estimates for December" to the sheet https://docs.google.com/spreadsheets/d/1J45El5yOq5EP4v_ae7qbZDhnDq-dqQ1JmcPawWKWMRk/edit#gid=1458369410

segment US Data into external referrers (already done) vs. internal vs. Direct/unknown

Added internal-referrer and direct/unknown-referrer data to the 'United States Pageview Multipliers by Month' tab in the sheet https://docs.google.com/spreadsheets/d/1KBeF7_Xtly2uH3uj1IkIe5tHH-M8YD5vbzZw0pbSrbw/edit#gid=0

provide additional segment for US mobile vs desktop divided by external vs. internal vs. direct/unknown (direct is null, empty or '-'; unknown is used where domain extraction failed)

Added to the US Dimensions tab : https://docs.google.com/spreadsheets/d/14iCinouU4mDjI_JyXa64RDPKl9T3GVzBGOFgaUU3G78/edit#gid=358265928

We are currently blocked on Fundraising (@EYener) for data to calculate these numbers

verify en6c Fundraising days (launch day through day 7)
provide estimates for traffic data on en6c Fundraising days, include YoY calculations comparing to the fundraising days in 2020 and 2019

Hi @Mayakp.wiki, I can add some context here. Below are the campaign dates for the English fundraiser for the last 3 years, though we also want to highlight that it's not our highest priority going into next week to provide exact breakdowns for those days. As we've seen declines in all KPIs for all days of the campaign YoY(oY), the day of year / day of month / day of week between years would be interesting to see but not of urgent need.

2019: Dec. 2 (launch / day 1) - Dec. 9 (day 8)
2020: Nov. 30 (launch / day 1) - Dec. 7 (day 8)
2021: Nov. 30 (launch / day 1) - Dec. 7 (day 8)
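
To compare like-for-like campaign days across years (day 1 vs. day 1, and so on), the dates above can be expanded into a per-year calendar. This is only a sketch: the launch dates and the 8-day length come from the list above, while the names `campaign_calendar` and `yoy_change` are made up for illustration and are not part of any existing pipeline.

```python
from datetime import date, timedelta

# Launch dates (day 1) as listed above.
LAUNCH = {
    2019: date(2019, 12, 2),
    2020: date(2020, 11, 30),
    2021: date(2021, 11, 30),
}
CAMPAIGN_DAYS = 8  # launch day through day 8


def campaign_calendar(launch: date, days: int = CAMPAIGN_DAYS) -> dict:
    """Map campaign day number (1-based) to its calendar date."""
    return {n: launch + timedelta(days=n - 1) for n in range(1, days + 1)}


def yoy_change(current: float, baseline: float) -> float:
    """Relative change of `current` vs. `baseline` (e.g. -0.10 means a 10% decline)."""
    return (current - baseline) / baseline


calendars = {year: campaign_calendar(launch) for year, launch in LAUNCH.items()}

# Print the aligned dates with their weekday, since day-of-week differs between years.
for n in range(1, CAMPAIGN_DAYS + 1):
    row = ", ".join(
        f"{year}: {calendars[year][n]:%a %b %d}" for year in sorted(calendars)
    )
    print(f"day {n}: {row}")
```

With daily pageview estimates keyed by date, `yoy_change(pv_2021[calendars[2021][n]], pv_2019[calendars[2019][n]])` would give the day-N comparison; `pv_2021` and `pv_2019` are hypothetical lookups, not existing tables.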

segment US Data into external referrers (already done) vs. internal vs. Direct/unknown

also added to 'US Internal vs External vs Direct/none' tab

provide additional segment for US mobile vs desktop divided by external vs. internal vs. direct/unknown (direct is null, empty or '-'; unknown is used where domain extraction failed)

also added to 'US Desktop vs Mobile by Referrer' tab
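
For reference, the referrer segmentation described in the request could be sketched roughly as below. The direct rule (null, empty, or '-') and the unknown rule (domain extraction failed) follow the request text; the list of internal domain suffixes and the function name `referrer_class` are assumptions for illustration only, not the definition used for the spreadsheet.

```python
from typing import Optional
from urllib.parse import urlsplit

# ASSUMPTION: which hosts count as "internal" is not spelled out in this task;
# these suffixes are placeholders.
INTERNAL_SUFFIXES = ("wikipedia.org", "wikimedia.org")


def referrer_class(referrer: Optional[str]) -> str:
    """Classify a raw referrer string as direct, unknown, internal, or external."""
    # Direct: referrer is null, empty, or '-'.
    if referrer is None or referrer.strip() in ("", "-"):
        return "direct"

    # Unknown: a referrer is present but no domain can be extracted from it.
    try:
        host = urlsplit(referrer.strip()).hostname
    except ValueError:
        host = None
    if not host:
        return "unknown"

    host = host.lower()
    if any(host == s or host.endswith("." + s) for s in INTERNAL_SUFFIXES):
        return "internal"
    return "external"
```

The tabs above report direct and unknown together, so the two labels would be merged into one bucket after classification.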

verify en6c Fundraising days (launch day through day 7)
provide estimates for traffic data on en6c Fundraising days, include YoY calculations comparing to the fundraising days in 2020 and 2019

I have the estimates for traffic data on en6c Fundraising days, including YoY calculations comparing to the fundraising days in 2020 and 2019. I will review them with @kzimmerman and then share (or modify, if required).

Thank you so much @EYener for this info!

Thank you so much for your work, @Mayakp.wiki!

I think we've got good comparisons now for the countries, and we can rely on US numbers when they're filtered to external referrers, internal referrers, or desktop traffic.

The challenge that has come up as we're comparing to 2019: we saw bot traffic that inflated US numbers in 2019, specifically for pageviews with no referring information ("none" or "Direct"). See T239811.

  • The bot traffic inflated pageviews by roughly 400M a month.
  • In 2019 we were averaging nearly 16B GLOBAL pageviews a month, so the bot traffic inflated global pageviews by about 2.5%.
  • But in the US, we were averaging about 3.5B pageviews a month, so the bot traffic had a much higher impact (inflating pageviews by roughly 10%).
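
The percentages above can be reproduced with a quick calculation. This is a rough sketch using only the rounded monthly figures from the bullets; it assumes the full ~400M of bot pageviews landed in US traffic, and the function names are illustrative.

```python
# Rounded monthly figures from the bullets above.
BOT_PAGEVIEWS = 400e6      # estimated bot pageviews per month in 2019
GLOBAL_PAGEVIEWS = 16e9    # average reported global pageviews per month in 2019
US_PAGEVIEWS = 3.5e9       # average reported US pageviews per month in 2019


def bot_share(bot: float, reported: float) -> float:
    """Bot pageviews as a fraction of the reported (bot-inclusive) total."""
    return bot / reported


def without_bots(reported: float, bot: float) -> float:
    """Reported pageviews with the estimated bot traffic removed."""
    return reported - bot


print(f"global share: {bot_share(BOT_PAGEVIEWS, GLOBAL_PAGEVIEWS):.1%}")
print(f"US share:     {bot_share(BOT_PAGEVIEWS, US_PAGEVIEWS):.1%}")
print(f"US baseline without bots: {without_bots(US_PAGEVIEWS, BOT_PAGEVIEWS) / 1e9:.1f}B")
```

The global share works out to 2.5% and the US share to about 11%, in line with the rough 10% quoted above.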

The pageview history presentation originally covered data from 2016 through 2021. We were making comparisons between 2021 and 2016, and those comparisons were not impacted by the 2019 bot traffic.

However, now that we're trying to use 2019 as a baseline (to mirror Fundraising's comparisons), we need to appropriately factor out bot traffic where it impacts numbers (i.e. US mobile web pageviews with no referrers, and any rollup that includes that segment).

I took a VERY ROUGH pass at that in the en6C + JP + DE Pageviews Corrected for data loss spreadsheet for US none/unknown referrers and filled in preliminary numbers in the pageview history presentation.

Screen Shot 2022-03-24 at 9.09.55 PM.png (738×1 px, 165 KB)

@Mayakp.wiki For Monday 28 March, can you fill in the remaining 2021 vs 2019 comparisons that are laid out in slides 40, 41, and 42? This can be a rough estimation for Monday.

Mayakp.wiki added a subscriber: mpopov.

Updated slides 40, 41, and 42 using the new R function that @mpopov developed for calculating data loss. T304876

Note: US and Global 2019 pageview data has been adjusted to remove suspicious bot traffic.

Maya will be reviewing it with Kate before resolving it. Any follow-ups will be captured in new tasks.

Reviewed with Kate today. I'll recalculate the Uncorrected data column for slides 40, 41, and 42 (I included 2019 bot traffic in the Uncorrected data, but it should've been considered only for the Corrected data).

Done. I made changes to slides 40, 41, and 42.
Uncorrected data = including 2019 bot traffic, not adjusted for data loss
Corrected data = excluding 2019 bot traffic, with estimated lost pageviews added
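
As a sketch of how the two definitions translate into a 2021 vs. 2019 comparison: the class and function below are hypothetical and are not the R function from T304876, and the numbers in the example are placeholders rather than real pageview figures.

```python
from dataclasses import dataclass


@dataclass
class MonthlyPageviews:
    reported: float     # pageviews as logged
    bot: float = 0.0    # suspected bot pageviews included in `reported` (relevant for 2019)
    lost: float = 0.0   # estimated pageviews missing because of data loss

    def uncorrected(self) -> float:
        """Including bot traffic, not adjusted for data loss."""
        return self.reported

    def corrected(self) -> float:
        """Excluding bot traffic, with estimated lost pageviews added."""
        return self.reported - self.bot + self.lost


def yoy_change(current: float, baseline: float) -> float:
    """Relative change of `current` vs. `baseline`."""
    return (current - baseline) / baseline


# Placeholder values for illustration only.
pv_2019 = MonthlyPageviews(reported=100.0, bot=10.0)
pv_2021 = MonthlyPageviews(reported=80.0, lost=10.0)

print(f"uncorrected: {yoy_change(pv_2021.uncorrected(), pv_2019.uncorrected()):+.1%}")
print(f"corrected:   {yoy_change(pv_2021.corrected(), pv_2019.corrected()):+.1%}")
```

With these placeholder inputs the uncorrected comparison shows a decline while the corrected one is flat, which illustrates why the two columns can tell different stories.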

This is done and reviewed, thank you @Mayakp.wiki !

FR is working on next steps regarding the impact of data loss on fundraising impressions and fundraising countries during December. Joseph Mando is going to look into the impressions delta between our two biggest pipelines and can also look at loss estimates with the “fundraising filter” applied to the pageviews data.