Page MenuHomePhabricator

Spike: Determine whether it makes sense to implement IE7 correction for long-term trend charts
Open, MediumPublic

Description

This correction generally has a tiny impact on overall metrics. We're currently using it in our calculation of board metrics. Questions to address:

  • What is the extent of impact, both now and historically?
  • Does it make sense to continue this calculation in our board metrics?
  • Does it make sense to push for a wider implementation of this correction?
  • Augment this and this chart with the correction method already used to remove this spurious traffic from the recently reported monthly pageview numbers
  • publicly document the correction calculation

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 15 2017, 10:48 PM
Tbayer updated the task description. (Show Details)Nov 2 2017, 9:58 PM
Tbayer moved this task from Triage to Doing on the Product-Analytics board.Apr 26 2018, 8:23 PM

For the record, below is an example of the queries I have been using for this. This was based on the detailed analysis in https://phabricator.wikimedia.org/T157404 (for Pakistan - task set to private because the examination involved looking at some IP information), while including two other countries - Iran and Afghanistan - that showed a similarly anomalous pattern of IE7 views widely surpassing those from newer IE versions.

Note though that the recent ua-parser upgrade raised new questions about this: T193578#4238244

SELECT year, month, day, CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')) AS date,
SUM(IF(access_method = 'mobile app', view_count, null)) AS Apps,
SUM(IF(access_method = 'desktop', view_count, null)) AS Desktop,
SUM(IF(access_method = 'mobile web', view_count, null)) AS MobileWeb
FROM wmf.pageview_hourly
WHERE year > 0
AND agent_type='user'
AND NOT (country_code IN ('PK', 'IR', 'AF') -- https://phabricator.wikimedia.org/T157404#3194046
AND user_agent_map['browser_family'] = 'IE' AND user_agent_map['browser_major'] = 7)
GROUP BY year, month, day ORDER BY year, month, day LIMIT 1000;
Aklapper removed Tbayer as the assignee of this task.Apr 22 2019, 11:47 AM

Resetting task assignee as the user is not active here anymore.

kzimmerman renamed this task from Implement IE7 correction for long-term trend charts to Spike: Determine whether it makes sense to implement IE7 correction for long-term trend charts.Jun 28 2019, 6:22 PM
kzimmerman updated the task description. (Show Details)
kzimmerman triaged this task as Medium priority.
kzimmerman removed a project: Reading-analysis.
kzimmerman moved this task from Backlog to Next Up on the Product-Analytics board.

Assigning to @Mayakp.wiki to dig into the differences in the data with/without the IE7 correction. Maya to reach out to @MNeisler with questions about the queries.

Discussed with @MNeisler on the context of this issue. She will be posting the updated queries that are used for getting pageviews for board metrics.
Next steps: 1. Get access to restricted task T157404 to understand why the 3 countries have been excluded from the metrics.

  1. Compare values we get with the corrections (that were added to address bugs identified in T193578 and T157404) vs without the correction filters to see if the difference is significant or negligible.
  2. Assess whether it is worth keeping the corrections
  3. Run on Turnilo or Superset to confirm if the values are same as what we get for the board metrics.

@Mayakp.wiki
For reference, here are the two IE corrections currently applied when calculating the monthly pageviews for the board metrics.

IE7PKIRAF: January 1-May 19, 2018

SELECT year, month, day, CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')) AS date,
SUM(IF(access_method = 'mobile app', view_count, null)) AS Apps,
SUM(IF(access_method = 'desktop', view_count, null)) AS Desktop,
SUM(IF(access_method = 'mobile web', view_count, null)) AS MobileWeb,
SUM(view_count) as Total
FROM wmf.pageview_hourly
WHERE year = 2018 AND ((month < 5) OR (month = 5 AND day <=19))
AND agent_type='user'
AND NOT (country_code IN ('PK', 'IR', 'AF') -- https://phabricator.wikimedia.org/T157404#3194046
AND user_agent_map['browser_family'] = 'IE' AND user_agent_map['browser_major'] = 7)
GROUP BY year, month, day ORDER BY year, month, day LIMIT 1000

IEPKIRAF: since May 20, 2018

SELECT year, month, day, CONCAT(year,'-',LPAD(month,2,'0'),'-',LPAD(day,2,'0')) AS date,
SUM(IF(access_method = 'mobile app', view_count, null)) AS Apps,
SUM(IF(access_method = 'desktop', view_count, null)) AS Desktop,
SUM(IF(access_method = 'mobile web', view_count, null)) AS MobileWeb,
SUM(view_count) as Total
FROM wmf.pageview_hourly
WHERE ((year = 2018 AND month >= 5) OR (year >= 2019))
AND agent_type='user'
AND NOT (country_code IN ('PK', 'IR', 'AF') -- https://phabricator.wikimedia.org/T157404#3194046
AND user_agent_map['browser_family'] = 'IE') -- https://phabricator.wikimedia.org/T193578#4300284
GROUP BY year, month, day ORDER BY year, month, day LIMIT 1000;

Identified that adding these filters in the dashboards or monthly metrics is a pain point during my discussion with Connie and Megan, . Resolving this issue would greatly help everyone!