Background
We have the overall stats https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/mobile_apps_uniques – wmf.mobile_apps_uniques_daily (DAU) and wmf.mobile_apps_uniques_monthly (MAU) – and @Tbayer manually maintains a tbayer.mobile_apps_uniques_by_country – which uses a per-country DAU query (see Appendix) based on Analytics Engineering's global DAU query – but we have no productionized per-country MAU or DAU (which would be used to calculate the two apps' "stickiness %" in different markets).
While I can run a query that calculates per-country DAU for the past 2 months (using 60 days of request data that we have) and use Tilman's data as part of my work on T184089, it would be tremendously helpful to have that data automated and available in wmf database so that we could track longer term trends.
End goal
- wmf.mobile_apps_uniques_daily_by_country
- wmf.mobile_apps_uniques_monthly_by_country
Please and thank you!
Optional
Something to think about and maybe have additional tickets for:
- If this can be made available in Druid, it would be great if in addition to country names there were also regions, continents, and maybe primary-language-in-country data so that any dashboards we make in Superset could use a meaningful grouping of countries. I'm sure @Charlotte @JMinor and @JKatzWMF would appreciate having that :)
- Some mobile app makers calculate MAU on a daily basis by using a rolling 30-day window. It might be too computationally intensive for us, though.
- Maybe we should also have a DOD retention metric (global & by country)? From https://www.kumulos.com/2016/11/22/mau-vs-dau-measure-mobile-app-retention/:
What is day-over-day (DOD) retention for measuring mobile app performance?
DOD retention is much like the MAU/DAU ratio, but provides a detailed snapshot metric. With DOD retention analysis of mobile app user behavior, you simply establish the ratio of the app users who are active today and yesterday, relative to the number of app users who were only active yesterday. This retention measurement ratio symbolizes the proportion of users who are liable to use your client’s app regularly in a short timeframe. In order to calculate this ratio, you establish the number of users who were active yesterday and today, and divide by the total number of users who were only active yesterday.
(Although there is MobileWikiAppDailyStats EL schema that we can use to estimate this.)
Appendix
Tilman uses the following bash script to maintain the per-country DAU table (adapted from the existing global query)::
#! /bin/bash DESCRIPTION="update mobile apps uniques by country" QUERY=" SET mapred.job.queue.name=nice; -- Set compression codec to gzip to provide asked format SET hive.exec.compress.output=true; SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec; CREATE EXTERNAL TABLE IF NOT EXISTS tbayer.mobile_apps_uniques_by_country ( \`year\` int COMMENT 'Unpadded year of request', \`month\` int COMMENT 'Unpadded month of request', \`day\` int COMMENT 'Unpadded day of request', \`platform\` string COMMENT 'Mobile platform from user agent parsing', \`country\` string COMMENT 'Country name of the accessing agents (computed using the MaxMind GeoIP database)', \`country_code\` string COMMENT '2-letter country code (based on ISO 3166)', \`unique_count\` bigint COMMENT 'Distinct uuid count' ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' LOCATION '/user/tbayer/mobile_apps_uniques_by_country' ; WITH mobile_apps_uuids_new AS ( SELECT year, month, day, CASE WHEN (user_agent LIKE '%iOS%' OR user_agent LIKE '%iPhone%') THEN 'iOS' ELSE 'Android' END AS platform, geocoded_data['country'] AS country, geocoded_data['country_code'] AS country_code, COALESCE(x_analytics_map['wmfuuid'], parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'appInstallID')) AS uuid FROM wmf.webrequest WHERE user_agent LIKE('WikipediaApp%') AND ((parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'action') = 'mobileview' AND uri_path == '/w/api.php') OR (uri_path LIKE '/api/rest_v1%' AND uri_query == '')) AND COALESCE(x_analytics_map['wmfuuid'], parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'appInstallID')) IS NOT NULL AND webrequest_source IN ('text') AND year=2018 AND month=1 ) INSERT INTO TABLE tbayer.mobile_apps_uniques_by_country SELECT year, month, day, platform, country, country_code, unique_count FROM ( SELECT year, month, day, platform, country, country_code, COUNT(DISTINCT uuid) AS unique_count FROM mobile_apps_uuids_new GROUP BY year, month, day, platform, country, country_code ) old_union_new_uniques_daily ORDER BY year, month, day, platform, country, country_code -- Limit enforced by hive strict mapreduce setting. -- 1000000000 == NO LIMIT ! LIMIT 1000000000 ; " THISSCRIPTFILE=`basename "$0"` RESULTSFILE=~/${THISSCRIPTFILE%.*}_result.txt { date ; echo = ; TZ='America/Los_Angeles' date ; echo generated by $THISSCRIPTFILE on $HOSTNAME ; beeline --verbose=true -e "$QUERY" ; } &> $RESULTSFILE cat $RESULTSFILE | mailx -s "$DESCRIPTION (Hive result from $HOSTNAME)" tbayer@wikimedia.org