
Test collection/analysis of device data for the CampaignEvents registration tool
Closed, Resolved (Public)

Description

Pull and review data on the devices used to access the CampaignEvents registration tool.
To start, pull all data (both editors and organizers).

See related task T336361 (specifically this chart output).

SELECT
    user_agent_map['os_family'] AS os_family,
    user_agent_map['device_family'] AS device_family,
    -- flag mobile operating systems instead of dropping other rows
    user_agent_map['os_family'] IN ('Android', 'iOS', 'KaiOS') AS is_mobile_os,
    COUNT(1) AS total_requests
FROM
    wmf.webrequest
WHERE
    webrequest_source = 'text'
    AND year = 2024
    AND month = 1
    AND day = 1
    AND uri_host = 'meta.wikimedia.org'
    AND agent_type = 'user' -- T336361#9148614
GROUP BY
    user_agent_map['os_family'],
    user_agent_map['device_family']
ORDER BY
    total_requests DESC
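As a sanity check on what the query above returns, here is a minimal local sketch of the same aggregation in plain Python. The rows are invented stand-ins for parsed user_agent_map values (not real webrequest data), and MOBILE_OS_FAMILIES and count_requests are hypothetical names chosen for illustration.

```python
from collections import Counter

# Hypothetical sample rows standing in for parsed user_agent_map values;
# these are invented for illustration and are NOT real webrequest data.
SAMPLE_ROWS = [
    {"os_family": "Android", "device_family": "Samsung SM-A515F"},
    {"os_family": "iOS", "device_family": "iPhone"},
    {"os_family": "Windows", "device_family": "Other"},
    {"os_family": "iOS", "device_family": "iPhone"},
    {"os_family": "Android", "device_family": "Pixel 7"},
    {"os_family": "Windows", "device_family": "Other"},
    {"os_family": "iOS", "device_family": "iPad"},
]

# Same OS families the SQL flags as mobile.
MOBILE_OS_FAMILIES = {"Android", "iOS", "KaiOS"}


def count_requests(rows):
    """Mimic GROUP BY os_family, device_family ... ORDER BY total_requests DESC."""
    totals = Counter((r["os_family"], r["device_family"]) for r in rows)
    # Sort by descending count, then by key so ties come out in a fixed order.
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        {
            "os_family": os_family,
            "device_family": device_family,
            "is_mobile_os": os_family in MOBILE_OS_FAMILIES,
            "total_requests": n,
        }
        for (os_family, device_family), n in ordered
    ]


if __name__ == "__main__":
    for row in count_requests(SAMPLE_ROWS):
        print(row)
```

Ties are broken by key here so the output is deterministic; in Spark SQL, rows with equal total_requests come back in no guaranteed order.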

Alternate method from similar Apps team analysis:

# Assumes a WMF analytics Jupyter environment with wmfdata-python installed.
from datetime import date

import pandas as pd
import wmfdata as wmf

# Daily requests to Event: pages on Meta, split by OS family and by whether
# the request carried a wmfuuid in x_analytics_map.
test = '''
select
  year, month, day,
  user_agent_map['os_family'] as operating_system,
  element_at(x_analytics_map, 'wmfuuid') is not null as is_sharing_data,
  count(1) as n_requests
from wmf.webrequest
where webrequest_source = 'text'
  AND year = {year} and month = {month} and day = {day}
  AND agent_type = 'user' -- T336361#9148614
  AND uri_host = 'meta.wikimedia.org'
  AND uri_path LIKE '/wiki/Event:%'
group by
  year, month, day,
  user_agent_map['os_family'],
  element_at(x_analytics_map, 'wmfuuid') is not null
'''

start_date = date(2024, 5, 1)
end_date = date(2024, 5, 2)

results = list()

# pd.date_range includes both endpoints, so this runs one query per day.
for this_day in pd.date_range(start_date, end_date):
    results.append(wmf.spark.run(test.format(year=this_day.year, month=this_day.month, day=this_day.day)))
    print(f'Retrieved data for {this_day.year} - {this_day.month} - {this_day.day}')

json_reqs_by_platform = pd.concat(results, ignore_index=True)
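Once json_reqs_by_platform is built, one way to condense it is into each OS family's share of requests and the share of requests carrying a wmfuuid. The sketch below uses plain Python over invented rows shaped like the query's output columns; share_by_os and sharing_rate are hypothetical helper names. With the real pandas DataFrame, json_reqs_by_platform.groupby('operating_system')['n_requests'].sum() gives the same per-OS totals.

```python
from collections import defaultdict

# Hypothetical rows shaped like the query output (year, month, day,
# operating_system, is_sharing_data, n_requests); the numbers are invented.
ROWS = [
    {"year": 2024, "month": 5, "day": 1, "operating_system": "Android", "is_sharing_data": False, "n_requests": 30},
    {"year": 2024, "month": 5, "day": 1, "operating_system": "iOS", "is_sharing_data": True, "n_requests": 10},
    {"year": 2024, "month": 5, "day": 2, "operating_system": "Android", "is_sharing_data": False, "n_requests": 20},
    {"year": 2024, "month": 5, "day": 2, "operating_system": "Windows", "is_sharing_data": False, "n_requests": 40},
]


def share_by_os(rows):
    """Collapse the per-day rows into each OS family's share of all requests."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["operating_system"]] += row["n_requests"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {os_family: n / grand_total for os_family, n in totals.items()}


def sharing_rate(rows):
    """Fraction of requests where is_sharing_data is true (wmfuuid present)."""
    total = sum(r["n_requests"] for r in rows)
    sharing = sum(r["n_requests"] for r in rows if r["is_sharing_data"])
    return sharing / total if total else 0.0


if __name__ == "__main__":
    print(share_by_os(ROWS))
    print(sharing_rate(ROWS))
```

Summing across days before dividing weights each day by its traffic; averaging per-day shares instead would give quiet days the same weight as busy ones.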

Event Timeline

Recommendation: use os_family data (not device_family) once the event special pages are added to the pageview whitelist.

Discussed this with @ifried today, including a look at sample data previously gathered by another team (specifically this chart output).

Conclusion from the discussion: OS family data is likely more useful than device family data. That said, device data in general is not very useful at present, and we do not have pipelines set up to collect it efficiently, so we will not proceed at this time. If the Campaigns-Product team needs this data in the future (possibly once or twice a year), they will file a new, specific request.

Iflorez triaged this task as Medium priority.