Both skin preference and global preference reflect the status as of the data collection date, which is April 15, 2024.
Thu, Apr 18
Wed, Apr 17
Tue, Apr 16
Mon, Apr 15
Fri, Apr 12
@JAllemandou @BTullis, thank you very much for the detailed explanation! I will move from Hive to Presto and Spark. I am going to mark this ticket as resolved.
Fri, Apr 5
@ovasileva, here are the analysis results. The answer to the third question is a very rough estimate. Let me know if you disagree with any of the assumptions.
Wed, Apr 3
Tue, Apr 2
What is the default font value on vector-2022? Regular
Fri, Mar 29
Here are the font size stats on desktop web by skin version. A few questions based on the data:
- What is the default font value on vector-2022?
- What do the values 0, 1, 2, and disabled stand for on vector-2022?
- What is the default font value on vector?
Tue, Mar 26
Mon, Mar 25
As a followup, I have documented the current sample rate at https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,event.mobilewebuiactionstracking,PROD)/Documentation?is_lineage_mode=false
Mar 21 2024
- Following the inclusion of client hints in the analysis, there was an average increase of 2 in the maximum number of unique user agents on a daily basis.
- Throughout January 2024, the daily maximum rose from 6 to 8 unique user agents per IP on English Wikipedia.
- For some days, the increase in maximum after including client info could be as large as 5.
Mar 19 2024
Mar 13 2024
What has been checked | Status | Note | Snapshot of the result from the old schema | Snapshot of the result from the new schema |
---|---|---|---|---|
Pick one session_id, compare the result | PASS | Captured same number of events. | ||
Pick one pageview_id, compare the result | PASS | Captured same number of events. | ||
By date | PASS | The new schema captured 0.37% more events than the old schema. The new schema captured 0.34% more sessions than the old schema. | ||
By action | PASS | Between March 1st and 10th, the new schema captured 0.95% more click events, 0.36% more init events, and 2.23% more show events than the old schema. All are within the 2.5% acceptable variance. | ||
By event name | ❌ | Based on the data collected from 2024-03-01 to 2024-03-10: 1) 176 event types were captured across the two schemas. 2) 31 event types were captured in the new schema but not in the old schema. 3) 2 event types were captured in the old schema but not in the new schema: menu.preferences and menu.ve-edit. | event name diff file | |
By wiki | ❓ Is a difference of 2.6% on commonswiki OK? | Between 2024-03-01 and 2024-03-10: The new schema captured 819 wikis, while the old schema captured 820. The missing wiki is nycwikimedia. On average, the new schema captured 0.56% more events and 0.47% more sessions than the old schema. The largest differences in session count come from small wikis. Among the large wikis, commonswiki has 2.6% more events in the new schema. | ||
By skin name | ❓ Is it expected that the new schema captured the 'vector' and 'vector-2022' skins with agent.client_platform_family='mobile_browser'? | Based on the data collected from 2024-03-01 to 2024-03-10: The new schema captured 0.37% more minerva events and 0.34% more minerva sessions than the old schema. The 'vector' and 'vector-2022' skins are not captured in the old schema, but are captured in the new schema with agent.client_platform_family='mobile_browser'. To check with the engineer whether it is expected. | ||
By user type | PASS | The difference is within a 2.5% variance. | ||
agent type | PASS | |||
edit count bucket | ❓ Is it expected that performer.edit_count_bucket is NULL in new schema for logged out users, while in old schema, event.editCountBucket is '0 edits'. | For loggedin users, editcountbucket difference is within 2.5% variance. For loggedout users, in new schema performer.edit_count_bucket is NULL, while in old schema event.editCountBucket is '0 edits'. Need to confirm whether it is expected. | ||
pageNamespace | ❌ | page.namespace_id is NULL in new schema | ||
is_dark_mode_on | ❓ Is the null in old schema expected? | The difference is within a 2.5% variance. The old schema captured some NULLs, while the new schema did not. For the events with NULL in event.is_dark_mode_on, their skin is also NULL. To check with engineer | ||
is_dark_mode_prepared_by_os | ❓ Is the null in old schema expected? | The difference is within a 2.5% variance. The old schema captured some NULLs; their skin field is NULL too. To check with engineer | ||
dark_mode_setting | ❓ Is the null in old and new schemas expected? | The differences in dark_mode_setting being 0, 1, 2, and NULL are within a 2.5% variance. | ||
is_full_width | ❓ | The difference is within a 2.5% variance. The old schema captured some NULLs; their skin fields are NULL too. To check with engineer | ||
is_media_viewer_enabled | ❓ | For is_media_viewer_enabled=true, the difference is within a 2.5% variance. For is_media_viewer_enabled=false, the new schema captured 2.55% more events than the old schema. To check with engineer. | ||
is_page_preview_on | PASS | The difference is within a 2.5% variance | ||
is_pinned | PASS | The difference is within a 2.5% variance | ||
font | ❓Is font size 0 expected? | The differences in font sizes, being small, regular, and large, are within a 2.5% variance. The difference in font size being large exceeds 2.5%. Given the low volume and small absolute difference, we mark it as PASS. Both schemas captured some events where the font size was 0. To check with the engineer. | ||
action_context | ❓ What's the meaning of the field value | need to document the meanings of the values: stable, stable,amc | ||
sample.rate | ❌ incorrect | 100% for all wikis and for all type of users | ||
is_bot | ❌ | performer.is_bot is NULL in new schema. |
Based on the number of events captured in the old and new schema, we believe the new schema is configured with the same sample rate as the old schema, as mentioned in T353029#9621127. However, it is recorded as 100% for all wikis in the new schema.
Mar 12 2024
Mar 11 2024
@KSarabia-WMF, thanks for the info.
What has been checked | Status | Note | Snapshot of the result from the old schema | Snapshot of the result from the new schema |
---|---|---|---|---|
Pick one session_id, compare the result | PASS | Captured same number of events. | ||
Pick one pageview_id, compare the result | PASS | Captured same number of events. | ||
By date | PASS | The new schema captured 0.39% more events than the old schema. The new schema captured 0.34% more sessions than the old schema. | ||
By action | PASS | Between March 1st and March 5th, the new schema captured 0.18% more click events and 0.18% more click sessions than the old schema. It captured 0.58% more init events and 0.49% more init sessions than the old schema. All are within the 2.5% acceptable variance. | ||
By event name | ❌ | There are 4000+ types of event names in the desktopwebuiactionstracking schema. Event names contain content info of the pages. Some event names are in the old schema but not in the new schema, for example ui.sidebar-toc. Some event names are in the new schema but not in the old schema, for example ns=0; most of them are from the minerva skin. | even_name.diff_comparison | |
By wiki | PASS | The new schema captured 828 wikis, same as the old schema, in the month of Feb 2024. The largest differences in session count come from small wikis. The events on nowiktionary are 42.3% fewer in the new schema; the difference has been reduced to 10% since 2024-02-26. On average, the new schema captured 0.85% more events and 1.44% more sessions than the old schema. | ||
By skin name | ❓ Is it expected that the new schema captured the 'minerva' skin with agent.client_platform_family='desktop_browser'? | Based on the data collected from 2024-03-01 to 2024-03-05: The new schema captured 0.52% more vector events and 0.50% more vector sessions than the old schema. It captured 0.3% more vector-2022 events and 0.25% more vector-2022 sessions than the old schema. The minerva skin is not captured in the old schema, but is captured in the new schema with agent.client_platform_family='desktop_browser'. To check with the engineer whether it is expected. | ||
By user type | PASS | Based on the data collected from 2024-03-01 to 2024-03-05: The new schema captured more sessions and events than the old schema, but within the 2.5% variance. | ||
agent type | PASS | {F42562439} | {F42562448} | |
edit count bucket | ❓ Is it expected that for logged-out users performer.edit_count_bucket is NULL in new schema, while in old schema, event.editCountBucket is '0 edits'. | For logged-in users, editcountbucket difference is within 2.5% variance. For logged-out users, performer.edit_count_bucket is NULL in new schema, while in old schema, event.editCountBucket is '0 edits'. Need to confirm whether it is expected. | ||
pageNamespace | ❌ | page.namespace_id is NULL in new schema | ||
viewportSizeBucket | ❓ | The diff is within 2.5% variance. The new schema captured 2620 NULL viewportsizebucket values with skin minerva. To check with engineer | ||
is_dark_mode_on | ❓ Is null in old schema expected? | The diff is within 2.5% variance. The old schema captured some NULLs, while the new schema did not. To check with engineer | ||
is_dark_mode_prepared_by_os | ❓ Is null in old schema expected? | The diff is within 2.5% variance. The old schema captured some NULLs, while the new schema did not. To check with engineer | ||
dark_mode_setting | ❓ Is null in old schema expected? | The differences in dark_mode_setting being 0, 2, and NULL are within a 2.5% variance. The difference in dark_mode_setting being 1 is larger than 2.5%. Due to the low volume and small absolute difference, we mark it as a pass. | ||
is_full_width | ❓ | The diff is within a 2.5% variance. The old schema captured some NULLs, while the new schema did not. The NULLs are from anonymous users. To check with engineer | ||
is_media_viewer_enabled | PASS | The difference is within a 2.5% variance | ||
is_page_preview_on | PASS | The difference is within a 2.5% variance | ||
is_pinned | PASS | The difference is within a 2.5% variance | ||
font | ❓ | The diff in font=0, 1, 2 is within a 2.5% variance. Some values, like large, null, regular and small, are captured in the old schema only. To check with engineer. | ||
action_context, | ❓ is it expected | value is desktop for minerva skin in new schema | ||
is_bot | ❌ | performer.is_bot is NULL in new schema | {F42603014} | {F42603033} |
sample rate | ❌ | incorrect in new schema |
Mar 8 2024
@KSarabia-WMF, thanks for checking. Can you also clarify what's the sample rate for logged-in users?
Mar 7 2024
Mar 6 2024
Hi @KSarabia-WMF, can you confirm whether the sample rate below, as captured in the new schema, is correct?
@kostajh, please see the findings below.
Methodology
We reviewed the distribution of the number of distinct user agents that appear for a given IP address per day on each pilot wiki candidate and the largest wiki enwiki.
We also reviewed the worst-case scenario: the maximum number of the distinct user agents that appear for a given IP address per day across all wikis.
The analysis is limited to anonymous edits committed between 2024-01-01 and 2024-01-31.
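The methodology above can be sketched in a few lines of Python. This is only an illustration, assuming each anonymous edit is available as a `(wiki, day, ip, user_agent)` tuple; the function names and the sample rows are hypothetical and are not taken from the actual analysis.

```python
from collections import defaultdict

def distinct_user_agents_per_ip_day(edits):
    """Map (wiki, day, ip) to the number of distinct user agents seen."""
    seen = defaultdict(set)
    for wiki, day, ip, user_agent in edits:
        seen[(wiki, day, ip)].add(user_agent)
    return {key: len(agents) for key, agents in seen.items()}

def daily_max(counts):
    """Worst case: the largest per-IP count for each day, across all wikis."""
    worst = defaultdict(int)
    for (_wiki, day, _ip), n in counts.items():
        worst[day] = max(worst[day], n)
    return dict(worst)

# Hypothetical rows for illustration only: (wiki, day, ip, user_agent)
sample_edits = [
    ("enwiki", "2024-01-01", "198.51.100.7", "UA-A"),
    ("enwiki", "2024-01-01", "198.51.100.7", "UA-B"),
    ("enwiki", "2024-01-01", "198.51.100.7", "UA-A"),
    ("enwiki", "2024-01-01", "203.0.113.9", "UA-C"),
    ("enwiki", "2024-01-02", "198.51.100.7", "UA-A"),
]
counts = distinct_user_agents_per_ip_day(sample_edits)
worst_case = daily_max(counts)
```

The per-wiki distribution can be read from `counts` by filtering on the wiki, while `worst_case` gives the daily maximum across all wikis.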
Mar 5 2024
@KSarabia-WMF, can you also provide the sample rate of the old schema DesktopWebUIActionsTracking? Thanks.
@KSarabia-WMF, can you also provide the sample rate of the old schema MobileWebUIActionsTracking? Thanks.
Mar 4 2024
Feb 29 2024
Hi @ovasileva, please see my investigation summary below.
Feb 28 2024
Thanks for checking on it. Regarding the 0.2% discrepancy, it can be marked as PASS given that 1) it is within the variance range (2.5% for daily events across all wikis) that we defined in Metrics Platform Instrument Migration Data QA Process Description; 2) the new instrumentation is capturing more unique sessions than the old instrumentation.
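As a rough illustration of the PASS rule described here, the check could look like the sketch below. It assumes the difference is measured relative to the old schema's count; the function names and the `"CHECK"` label are hypothetical, and only the 2.5% threshold comes from the comment itself.

```python
ACCEPTABLE_VARIANCE = 0.025  # 2.5% variance for daily events across all wikis

def relative_difference(old_count, new_count):
    """Difference of the new count relative to the old count (assumed baseline)."""
    if old_count == 0:
        raise ValueError("old_count must be non-zero to compute a relative difference")
    return (new_count - old_count) / old_count

def qa_status(old_count, new_count, threshold=ACCEPTABLE_VARIANCE):
    """Return 'PASS' when the absolute relative difference is within the threshold."""
    diff = abs(relative_difference(old_count, new_count))
    return "PASS" if diff <= threshold else "CHECK"

# Hypothetical counts: a 0.2% discrepancy passes, a 3% discrepancy needs a closer look.
status_small = qa_status(100_000, 100_200)
status_large = qa_status(1_000, 1_030)
```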
Feb 26 2024
I'll defer to Jennifer about 2 vs auto. Personally, I think it's better to use 2: if these definitions ever change in the future, it will be more resilient to the change.
Feb 13 2024
Migration of desktopwebuiactionstracking schema is ready for QA.
The mobilewebuiactionstracking schema is pending migration.
Feb 2 2024
Hi, thank you for bringing that up and clarifying it.
@phuedx, Here are some findings from my investigation.
Jan 31 2024
Here are the baselines for devices with a viewport larger than 1200px. @ovasileva , let me know if you have any questions.
Preview disable rate (viewport > 1200px)
Metric: Number of unique sessions with preview off (non-default)/ total number of unique initialized sessions (viewport > 1200px).
The following statistics are based on the data collected between Dec. 21, 2023 and Dec. 31, 2023
User type | Daily average | Std |
---|---|---|
Loggedin users | 44.37% | 0.27% |
Anonymous users | 3.65% | 0.12% |
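For readers who want to reproduce this kind of baseline, here is a minimal sketch of the metric defined above. The daily session counts are made up for illustration, and using the sample standard deviation is an assumption, since the comment does not say which variant was used.

```python
from statistics import mean, stdev

def preview_disable_rate(sessions_preview_off, sessions_initialized):
    """Unique sessions with preview off divided by unique initialized sessions."""
    return sessions_preview_off / sessions_initialized

# Hypothetical per-day counts: (sessions with preview off, initialized sessions)
daily_counts = [(4400, 10000), (4450, 10000), (4420, 10000)]

daily_rates = [preview_disable_rate(off, total) for off, total in daily_counts]
daily_average = mean(daily_rates)
daily_std = stdev(daily_rates)  # sample standard deviation (assumption)

print(f"Daily average: {daily_average:.2%}, Std: {daily_std:.2%}")
```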
Jan 25 2024
@phuedx, Thanks for resolving all the questions. I will further investigate the remaining question of why the numbers of events, sessions and pages are slightly higher in the new schema. Will bring it up to you when I have more data.
Jan 20 2024
Questions to confirm with engineers
- The numbers of events, sessions and pages are slightly higher in the new schema. Is this expected?
- Which field captures the Spider user agent?
- Is access_method captured in agent.client_platform_family in the new schema?
- Please review the field mapping table below and confirm whether all entries are as expected.
Field in old schema | Field in new schema | Value example |
---|---|---|
action | action | scroll-to-top |
action_context | NULL | |
action_source | NULL | |
action_subtype | NULL | |
web_session_id | performer.session_id | e.g., '2751f1d9e9a0417cbc1x' |
meta.dt | meta.dt | e.g. "2024-01-16T00:17:25.272Z" |
page_id | page.id | 59519 |
access_method | agent.client_platform_family❓ | access_method= 'desktop' ; agent.client_platform_family='desktop_browser' |
is_anon | performer.is_logged_in | true, false. The old schema captures whether the user is anonymous, while the new schema captures whether the user is logged in. |
skin | mediawiki.skin | vector-2022 |
user_agent_map['device_family'] | MISSING ❓ | Spider |
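When comparing the two schemas, the confirmed rows of the mapping table can be applied programmatically. The sketch below is only an illustration: it treats events as flat dictionaries keyed by the dotted field names (a simplification), the helper name is hypothetical, and it leaves out `access_method` and `user_agent_map['device_family']` because their mappings are still open questions.

```python
# Renames taken from the mapping table; unresolved rows are intentionally omitted.
OLD_TO_NEW_FIELD = {
    "action": "action",
    "web_session_id": "performer.session_id",
    "meta.dt": "meta.dt",
    "page_id": "page.id",
    "skin": "mediawiki.skin",
}

def translate_old_event(old_event):
    """Rename old-schema fields to their new-schema names (hypothetical helper)."""
    new_event = {
        new_name: old_event[old_name]
        for old_name, new_name in OLD_TO_NEW_FIELD.items()
        if old_name in old_event
    }
    # is_anon and performer.is_logged_in carry opposite meanings, so invert the flag.
    if "is_anon" in old_event:
        new_event["performer.is_logged_in"] = not old_event["is_anon"]
    return new_event

example = translate_old_event(
    {"action": "scroll-to-top", "page_id": 59519, "skin": "vector-2022", "is_anon": True}
)
```

Inverting `is_anon` matters when comparing counts by user type: a direct equality join on the two fields would match anonymous sessions in one schema with logged-in sessions in the other.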
Jan 17 2024
Perhaps the best thing to do here would be to only consider devices with > 1200px for the desktop milestone. What do you think?
Jan 12 2024
I have further investigated the preview disable rate in 1000px-1199px viewport bucket, analyzing it by device families and wikis.
In summary, the high preview disable rate in the 1000px-1199px bucket is influenced primarily by devices in the Mac family, specifically those running Mac OS X with the version details: os_major 10 and os_minor 15.
By device family
Jan 10 2024
Jan 9 2024
@ovasileva, @Jdlrobson, I have rerun the analysis using the recent data as we discussed.
Dashboard has been published at https://superset.wikimedia.org/superset/dashboard/p/xgaOAD5rz2A/
Jan 4 2024
Hello @Sj, we only collect data on the viewport size buckets, and these are segmented into six groups. Unfortunately, 1400px is not the threshold to divide groups. I hope the data below still provides insights into how full-width preference varies in each group.
Hi @ovasileva , when I analyzed the data based on the viewport size buckets, I noticed that the preview disable rate in the 1000px-1199px group was significantly higher than in the adjacent bucket groups for both anonymous users and logged-in users. Is there any specific reason for this?
Dec 19 2023
Hi @Sj, we have discussed your questions within the team. The request requires recording the reader's user_id along with the timestamps of their visits and clicks. We don't track such detailed info for readers. The session based stats are the closest approximation we have for readers.
Dec 18 2023
@Sj, please see the breakdown of preview, width and media viewer at T346979#9285473.
Dec 14 2023
Summary of data QA for the data collected on Dec 14, 2023
Dec 13 2023
All to-dos are done.
Dec 11 2023
Dec 5 2023
Dec 1 2023
@ovasileva, please see the analysis of pin rate and overall non-default rate below.
Methodology
Nov 29 2023
Nov 9 2023
@ovasileva, here is the baseline collection for font size on mobile web. Let me know if you have any questions.
Summary
% of pageview sessions which have set a non-default font size in the Minerva skin (On mobile web)
Metric: Number of unique sessions with regular font size disabled (non-default) / total number of unique initialized sessions
The following statistics are based on the data collected between Nov. 4, 2023 and Nov. 8, 2023 (incomplete date):
user type | min | max | avg | std |
---|---|---|---|---|
Logged-in users | 1.18% | 1.28% | 1.24% | 0.04% |
Anonymous users | 0.0139% | 0.0147% | 0.0144% | 0.0003% |
Summary of data QA for the data collected on Nov 8, 2023
Minerva (Mobile)
Schema: event.MobileWebUIActionsTracking
Instrumentation purpose: collect baseline for % of pageviews which have set a non-default font size in the Minerva skin (On mobile web)