Update PageviewDefinition to only include /api/rest_v1/page/mobile-html requests with X-Analytics: pageview=1 in pageviews
Closed, Resolved (Public)

Description

The apps have been updated to send pageview=1 in the X-Analytics header with requests that are pageviews. To exclude requests that are made for offline viewing, the definition should be updated to only count requests with pageview=1 in the X-Analytics header.

iOS App Version with this change:
6.6.2 (released 7/13/2020)
Android App Version with this change:
2.7.50324 (with the header on all mobile-html requests)
TBD (with the header only on pageviews)
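
As a rough illustration of the rule described above, here is a minimal Python sketch that checks whether an X-Analytics header value carries pageview=1. It assumes the header is a semicolon-separated list of key=value pairs; the function names are hypothetical and this is not the refinery implementation.

```python
def parse_x_analytics(header: str) -> dict:
    """Split an X-Analytics value such as 'https=1;pageview=1' into a dict.

    Assumption: pairs are separated by ';' and keys from values by '='.
    Items without '=' or with an empty key are ignored.
    """
    pairs = {}
    for item in header.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep and key:
            pairs[key] = value
    return pairs


def has_pageview_marker(header: str) -> bool:
    """Return True only when the header contains the pair pageview=1."""
    return parse_x_analytics(header).get("pageview") == "1"
```

Under this sketch, a request saved for offline viewing (no pageview key in the header) would simply return False and not be counted.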

Event Timeline

Milimetric moved this task from Incoming to Data Quality on the Analytics board.
Milimetric subscribed.

We think it would be very error-prone to parse out the app version and split the code to handle different versions in different ways. It might be better to count only pageview=1 requests as pageviews from some agreed point onward. We should talk to figure out whether this would work for you and when to make the switch, and to discuss alternatives.

Also, who on your team can we work with to code review these changes and test the implications? We usually make the changes and then run them on a few hours of data to see what they mean.

Sounds good. @Charlotte, @JMinor, and @SNowick_WMF can help determine when we should switch to only counting pageview=1 requests. It might be feasible to make this switch fairly soon instead of trying to separate out by version. For code review, our tech leads @Tsevener (iOS) and @Dbrant (Android) can help.

Note: @Charlotte, @JMinor, and @SNowick_WMF agree with switching to only counting pageview=1 requests. Heads up @Tsevener (iOS) and @Dbrant (Android).

So that we are all on the same page: this will *reduce* the number of pageviews (see plot). For 2020/07/24 data for iOS, the reduction is about 8% on data currently marked as 'user'.

Screen Shot 2020-07-24 at 5.00.02 PM.png (388×822 px, 47 KB)

Confirmed, that's expected given that we're now excluding requests to save for offline viewing and requests from versions older than 6.6.2.

Change 616591 had a related patch set uploaded (by Nuria; owner: Nuria):
[analytics/refinery/source@master] Removing outdated IOS pageview code

https://gerrit.wikimedia.org/r/616591

Change 616629 had a related patch set uploaded (by Nuria; owner: Nuria):
[analytics/refinery/source@master] For Android and iOS we only count pageviews with x-Analytics marker

https://gerrit.wikimedia.org/r/616629

The current patch counts only requests with X-Analytics: pageview=1 for iOS and Android; the logic should be unchanged for KaiOS.
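
The branching described in this comment can be sketched roughly as follows. The platform labels and the `legacy_is_pageview` input are hypothetical stand-ins for whatever the real definition computes; this only illustrates the decision, not the actual code in analytics/refinery/source.

```python
# Hypothetical labels for the app platforms that must send the marker.
APP_PLATFORMS_REQUIRING_MARKER = {"ios", "android"}


def is_app_pageview(platform: str, has_marker: bool, legacy_is_pageview: bool) -> bool:
    """Decide whether an app request counts as a pageview.

    iOS and Android: only the X-Analytics pageview=1 marker counts.
    Any other platform (for example KaiOS): keep the existing decision.
    """
    if platform.lower() in APP_PLATFORMS_REQUIRING_MARKER:
        return has_marker
    return legacy_is_pageview
```

The point of taking `legacy_is_pageview` as an input is that the non-iOS, non-Android path is passed through untouched, which is what "unchanged" means here.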

Change 616591 merged by jenkins-bot:
[analytics/refinery/source@master] Remove outdated IOS pageview code

https://gerrit.wikimedia.org/r/616591

Ran the new code on 1 hour of data:

OLD code:

_c0         access_method   year   month   day   hour
12789534    desktop         2020   7       27    1
13547212    mobile web      2020   7       27    1
372909      mobile app      2020   7       27    1

NEW code:

_c0         access_method   year   month   day   hour
12789534    desktop         2020   7       27    1
13547212    mobile web      2020   7       27    1
333627      mobile app      2020   7       27    1

As you can see, pageviews for mobile app with the new code are 89% of what they were with the old code.
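
The 89% figure can be checked directly from the two mobile app counts reported for this hour. A small Python sketch of that arithmetic (the function name is just for illustration):

```python
def share_of_previous(new_count: int, old_count: int) -> float:
    """Return new_count as a fraction of old_count."""
    return new_count / old_count


old_mobile_app = 372909  # mobile app pageviews with the old code
new_mobile_app = 333627  # mobile app pageviews with the new code

ratio = share_of_previous(new_mobile_app, old_mobile_app)
reduction = 1 - ratio
print(f"new/old: {ratio:.1%}, reduction: {reduction:.1%}")
```

This gives a ratio of roughly 89%, that is, a reduction of a little over 10% for this particular hour.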

Change 616629 merged by jenkins-bot:
[analytics/refinery/source@master] For Android and iOS we only count pageviews with x-Analytics marker

https://gerrit.wikimedia.org/r/616629

The code is deployed and in effect. Every hour, about 5000 requests are still counted as mobile app pageviews despite not having the pageview marker in X-Analytics. I will investigate a bit more, because that suggests a bug in the pageview definition. These 5000 amount to about 1.5% of mobile app pageviews per hour, so it is a small share, but it still needs investigation.
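
One way to look for such requests is to filter for rows that are flagged as mobile app pageviews but carry no pageview=1 marker. The sketch below does this over in-memory rows; `is_pageview` and `access_method` are the fields used in the queries on this task, while `x_analytics_map` (the header already parsed into a map) and the sample rows are assumptions made up for illustration.

```python
def unmarked_app_pageviews(rows):
    """Return rows counted as mobile app pageviews that lack pageview=1."""
    return [
        row for row in rows
        if row["is_pageview"]
        and row["access_method"] == "mobile app"
        and row.get("x_analytics_map", {}).get("pageview") != "1"
    ]


# Made-up sample rows, not real webrequest data.
sample = [
    {"is_pageview": True, "access_method": "mobile app", "x_analytics_map": {"pageview": "1"}},
    {"is_pageview": True, "access_method": "mobile app", "x_analytics_map": {"https": "1"}},
    {"is_pageview": True, "access_method": "desktop", "x_analytics_map": {}},
    {"is_pageview": False, "access_method": "mobile app", "x_analytics_map": {}},
]
suspects = unmarked_app_pageviews(sample)
```

Only the second sample row is returned: it is flagged as a mobile app pageview but has no marker, which is the pattern that points to a bug in the definition.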

Change 618157 had a related patch set uploaded (by Nuria; owner: Nuria):
[analytics/refinery/source@master] Test case for pageviews marked as such that should not be so

https://gerrit.wikimedia.org/r/618157

Change 618635 had a related patch set uploaded (by Nuria; owner: Nuria):
[analytics/refinery/source@master] Requests with app user agents should not be evaluated as app pageviews

https://gerrit.wikimedia.org/r/618635

Change 618157 abandoned by Nuria:
[analytics/refinery/source@master] Test case for pageviews marked as such that should not be so

Reason:

https://gerrit.wikimedia.org/r/618157

Pageviews as of now for 1 hour:

select count(*), access_method, year, month, day, hour
from wmf.webrequest
where is_pageview and year=2020 and month=8 and day=6 and hour=2
group by access_method, year, month, day, hour;

_c0         access_method   year   month   day   hour
12300461    desktop         2020   8       6     2
293564      mobile app      2020   8       6     2
12440620    mobile web      2020   8       6     2

With the correction in change https://gerrit.wikimedia.org/r/618635, pageviews for mobile app are about 1% lower:

_c0         access_method   year   month   day   hour
12300461    desktop         2020   8       6     2
12440620    mobile web      2020   8       6     2
290383      mobile app      2020   8       6     2
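
To confirm that only the mobile app count moves with the correction, the two result sets for this hour can be compared per access method. A short Python sketch using the counts above (the helper name is hypothetical):

```python
before = {"desktop": 12300461, "mobile app": 293564, "mobile web": 12440620}
after = {"desktop": 12300461, "mobile web": 12440620, "mobile app": 290383}


def relative_change(before_counts, after_counts):
    """Return {access_method: (after - before) / before} for each method."""
    return {
        method: (after_counts[method] - before_counts[method]) / before_counts[method]
        for method in before_counts
    }


changes = relative_change(before, after)
for method, change in sorted(changes.items()):
    print(f"{method}: {change:+.2%}")
```

Desktop and mobile web come out unchanged, and mobile app drops by roughly 1%, consistent with the figure quoted above.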

Change 620293 had a related patch set uploaded (by Fdans; owner: Fdans):
[operations/puppet@production] modules/refine: bump jar version to fix pageview definition bug

https://gerrit.wikimedia.org/r/620293

Change 620293 merged by Elukey:
[operations/puppet@production] profile::analytics::refinery::job::refine: bump jar version

https://gerrit.wikimedia.org/r/620293