Page MenuHomePhabricator

Calculate Android app daily active users from Nigeria
Closed, ResolvedPublic

Description

Based on the existing query for global DAU, restricting it to webrequests from that country, for the timespan where data is still available

Purpose: The New Readers team wants to asses the impact of a campaign in Nigeria that launched September 20, with active promotions running for 6 weeks

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 16 2017, 12:26 AM
Tbayer moved this task from Triage to Next Up on the Reading-analysis board.Nov 18 2017, 6:30 AM

The rough takeaway from the chart below: Right after the launch of the campaign on September 20, the app's daily active user numbers in this country increased by about 800 (i.e. almost half of the existing baseline), and stayed roughly at that level until its end (around Nov 1), after which they dropped again somewhat, but so far retaining more than half of that gain.

Tbayer closed this task as Resolved.Dec 13 2017, 7:49 AM
Tbayer added subscribers: JKatzWMF, Charlotte.

And I took this opportunity to create a general DAUby-country dataset (for all countries and also for opted-in iOS users), calculated in the same way as the global one. For now it can be found at tbayer.mobile_apps_uniques_by_country in Hive, and I intend to run updates manually every month or so.

For reference, the query to generate/update this dataset (adapted from the existing global query):

SET mapred.job.queue.name=nice;
-- Set compression codec to gzip to provide asked format
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec;

CREATE EXTERNAL TABLE IF NOT EXISTS tbayer.mobile_apps_uniques_by_country
(
    `year`                 int     COMMENT 'Unpadded year of request',
    `month`                int     COMMENT 'Unpadded month of request',
    `day`                  int     COMMENT 'Unpadded day of request',
    `platform`             string  COMMENT 'Mobile platform from user agent parsing',
    `country`              string  COMMENT 'Country name of the accessing agents (computed using the MaxMind GeoIP database)',
    `country_code`         string  COMMENT '2-letter country code (based on ISO 3166)',
    `unique_count`         bigint  COMMENT 'Distinct uuid count'
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
LOCATION '/user/tbayer/mobile_apps_uniques_by_country'
;

WITH mobile_apps_uuids_new AS
(
    SELECT
        year,
        month,
        day,
        CASE WHEN (user_agent LIKE '%iOS%' OR user_agent LIKE '%iPhone%') THEN 'iOS' ELSE 'Android' END AS platform,
        geocoded_data['country'] AS country,
        geocoded_data['country_code'] AS country_code,
        COALESCE(x_analytics_map['wmfuuid'],
                 parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'appInstallID')) AS uuid
    FROM wmf.webrequest
    WHERE user_agent LIKE('WikipediaApp%')
        AND ((parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'action') = 'mobileview' AND uri_path == '/w/api.php')
            OR (uri_path LIKE '/api/rest_v1%' AND uri_query == ''))
        AND COALESCE(x_analytics_map['wmfuuid'],
                     parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'appInstallID')) IS NOT NULL
        AND webrequest_source IN ('text')
        AND year=2017
        AND month=11
)
INSERT INTO TABLE tbayer.mobile_apps_uniques_by_country
SELECT
    year,
    month,
    day,
    platform,
    country,
    country_code,
    unique_count
FROM
    (
        SELECT
            year,
            month,
            day,
            platform,
            country,
            country_code,
            COUNT(DISTINCT uuid) AS unique_count
        FROM
            mobile_apps_uuids_new
        GROUP BY
            year,
            month,
            day,
            platform,
            country, country_code
    ) old_union_new_uniques_daily
ORDER BY
    year,
    month,
    day,
    platform,
    country, country_code
-- Limit enforced by hive strict mapreduce setting.
-- 1000000000 == NO LIMIT !
LIMIT 1000000000
;