The webrequest x_analytics field now carries an ismobile key, so that the .m part of the URL can be removed in the near future.
This task is about analyzing and discussing the differences between the two ways of deriving access_method for webrequest data: from the URL (.m subdomain) and from x_analytics (ismobile).
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Krinkle | T214998 RFC: Serve mobile and desktop variants through the same URL (unified mobile routing) |
| Resolved | | mforns | T389696 Analyze impact for webrequest and unique devices pipelines to derive access_method without m-dot domain |
| Resolved | | JAllemandou | T401576 Analyze data differences between `access_method` derived from URL and from x-analytics |
Event Timeline
First analysis and results:
select
  webrequest_source,
  access_method,
  x_analytics_map['ismobile'],
  is_pageview,
  is_redirect_to_pageview,
  count(1)
from wmf.webrequest
where year = 2025 and month = 8 and day = 6
group by webrequest_source, access_method, x_analytics_map['ismobile'], is_pageview, is_redirect_to_pageview
order by webrequest_source, access_method, x_analytics_map['ismobile'], is_pageview, is_redirect_to_pageview;
Results, with a manually added percentage-of-grand-total column and comments:
| webrequest_source | access_method | x_analytics_map[ismobile] | is_pageview | is_redirect_to_pageview | count(1) | manually_added_percentage | comment |
|---|---|---|---|---|---|---|---|
| text | desktop | NULL | FALSE | FALSE | 4020553802 | 34.25% | |
| text | desktop | NULL | FALSE | TRUE | 394881856 | 3.36% | |
| text | desktop | NULL | TRUE | FALSE | 413184857 | 3.52% | |
| text | desktop | 1 | FALSE | FALSE | 1 | 0.00% | capital M in .m URL not normalized in webrequest - correctness improvement :). |
| text | desktop | 1 | TRUE | FALSE | 6 | 0.00% | capital M in .m URL not normalized in webrequest - correctness improvement :). |
| text | mobile app | NULL | FALSE | FALSE | 396594427 | 3.38% | |
| text | mobile app | NULL | TRUE | FALSE | 9279961 | 0.08% | |
| text | mobile app | 1 | FALSE | FALSE | 1796122 | 0.02% | Not important, webrequest categorisation won't change. Interesting to understand though. |
| text | mobile app | 1 | TRUE | FALSE | 43671 | 0.00% | Not important, webrequest categorisation won't change. Interesting to understand though. |
| text | mobile web | NULL | FALSE | FALSE | 26699202 | 0.23% | Rows missing ismobile, so not an exact match. Neither pageviews nor redirect_to_pageview, so no impact on unique_devices or pageview metrics. |
| text | mobile web | NULL | FALSE | TRUE | 1338976 | 0.01% | Rows missing ismobile, so not an exact match. redirect_to_pageview is used only in unique_devices_per_project_family, which doesn't split by access_method. No impact on unique_devices or pageview metrics. |
| text | mobile web | 1 | FALSE | FALSE | 2524182280 | 21.50% | |
| text | mobile web | 1 | FALSE | TRUE | 120918652 | 1.03% | |
| text | mobile web | 1 | TRUE | FALSE | 379504262 | 3.23% | Perfect match - awesome :) |
| upload | desktop | NULL | FALSE | FALSE | 3325355452 | 28.33% | |
| upload | desktop | 1 | FALSE | FALSE | 673114 | 0.01% | I guess those rows have more precise qualification :) |
| upload | mobile app | NULL | FALSE | FALSE | 123668514 | 1.05% | |
| upload | mobile web | NULL | FALSE | FALSE | 399 | 0.00% | Only non-wiki hosts with .m qualifier. |
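
The comparison in the table can be re-checked with a few lines of Python. This is only a sketch: the counts are copied from the table above, and the `is_mismatch` helper is a hypothetical name introduced here. It treats `mobile app` rows as out of scope, since their categorisation won't change, and counts a row as a mismatch when the URL-derived `mobile web` label and `ismobile = 1` disagree.

```python
# Counts copied from the result table above:
# (webrequest_source, access_method, ismobile, is_pageview,
#  is_redirect_to_pageview, count)
ROWS = [
    ("text", "desktop", None, False, False, 4020553802),
    ("text", "desktop", None, False, True, 394881856),
    ("text", "desktop", None, True, False, 413184857),
    ("text", "desktop", "1", False, False, 1),
    ("text", "desktop", "1", True, False, 6),
    ("text", "mobile app", None, False, False, 396594427),
    ("text", "mobile app", None, True, False, 9279961),
    ("text", "mobile app", "1", False, False, 1796122),
    ("text", "mobile app", "1", True, False, 43671),
    ("text", "mobile web", None, False, False, 26699202),
    ("text", "mobile web", None, False, True, 1338976),
    ("text", "mobile web", "1", False, False, 2524182280),
    ("text", "mobile web", "1", False, True, 120918652),
    ("text", "mobile web", "1", True, False, 379504262),
    ("upload", "desktop", None, False, False, 3325355452),
    ("upload", "desktop", "1", False, False, 673114),
    ("upload", "mobile app", None, False, False, 123668514),
    ("upload", "mobile web", None, False, False, 399),
]


def is_mismatch(access_method, ismobile):
    """True when the URL-derived label and the ismobile flag disagree.

    Hypothetical helper: mobile app rows are skipped because their
    categorisation is not affected by the change.
    """
    if access_method == "mobile app":
        return False
    return (access_method == "mobile web") != (ismobile == "1")


grand_total = sum(row[5] for row in ROWS)
mismatched = sum(row[5] for row in ROWS if is_mismatch(row[1], row[2]))
mismatched_pageviews = sum(
    row[5] for row in ROWS if row[3] and is_mismatch(row[1], row[2])
)

print(f"grand total:          {grand_total}")
print(f"mismatched requests:  {mismatched} ({mismatched / grand_total:.2%})")
print(f"mismatched pageviews: {mismatched_pageviews}")
```

On these counts, the only pageview rows where the two methods disagree are the 6 capital-M requests.
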
The data looks good enough IMO to change the underlying access_method algorithm and update the unique_devices_per_domain job.
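
To make the proposed change concrete, here is a rough sketch of the two derivations side by side. It is not the actual refinery code: the function names are hypothetical, and detecting the app through a `WikipediaApp` user-agent marker is an assumption made for illustration. The URL-based variant compares the `m` label case-sensitively, which is only a guess at why the capital-M rows above ended up as `desktop`.

```python
from typing import Mapping, Optional

# Assumption for illustration: app traffic is recognised by a user-agent marker.
APP_UA_MARKER = "WikipediaApp"


def access_method_from_url(uri_host: str, user_agent: str) -> str:
    """Hypothetical URL-based derivation: look for an 'm' subdomain label."""
    if APP_UA_MARKER in user_agent:
        return "mobile app"
    # Drop the last two labels (e.g. 'wikipedia', 'org') and inspect the rest.
    subdomain_labels = uri_host.split(".")[:-2]
    # Case-sensitive on purpose, mirroring the un-normalized capital M case.
    if "m" in subdomain_labels:
        return "mobile web"
    return "desktop"


def access_method_from_x_analytics(
    x_analytics_map: Mapping[str, Optional[str]], user_agent: str
) -> str:
    """Hypothetical x-analytics-based derivation: rely on ismobile = 1."""
    if APP_UA_MARKER in user_agent:
        return "mobile app"
    if x_analytics_map.get("ismobile") == "1":
        return "mobile web"
    return "desktop"


# A capital-M host is 'desktop' by URL but 'mobile web' by ismobile.
print(access_method_from_url("en.M.wikipedia.org", "Mozilla/5.0"))
print(access_method_from_x_analytics({"ismobile": "1"}, "Mozilla/5.0"))
```
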
I'll start doing that.