Page MenuHomePhabricator

collect banner impressions for WLM2017 and see if counts are lower than expected
Closed, DeclinedPublic

Description

In order to find out if there less banner impressions in 2017 then in previous years (perhaps in some countries).

Here are counts for first 9 days of September, using https://phabricator.wikimedia.org/P4040 as reference for how to get banner impressions from webrequest logs, with follow-up analysis

Related Objects

Event Timeline

Introduction:
Not knowing of P4040 I first ran code to count image impressions of the thumbnail in the banner.
Later when Leila pointed to P4040 I ran those queries (again for test case South Korea) and compared with my thumb counts: they matched quite well.

I also added unique ip counts, and access method. These latter only gave desktop and app counts for the thumb (apparently the mobile banner uses a different version of the image, although the images seem similar).

I also added unique ip addresses. Note that uniques is much lower than total requests

So here are results with comments:

1) Collect total page views for one hour from Rep. of Korea

SELECT
 sum(view_count),
 access_method
FROM
 wmf.pageview_hourly
 WHERE
year = 2017
AND month = 9
AND day = 2
AND hour = 14
AND country_code = "KR"
AND project = "ko.wikipedia"
AND agent_type = "user"
AND NOT (page_title LIKE "%Special%")
GROUP BY
access_method  
ORDER BY
access_method
LIMIT 10000 ;`  
-> 
2017/09/02 14h: 
34951   desktop
  871   mobile app
71531   mobile web

2) Collect banner impressions, again for one hour and country KO only

SELECT
    count(*),
    count (distinct ip) as ip_count,
    access_method
FROM
    wmf.webrequest
WHERE
    year = 2017
    AND month = 9
    AND day = 2
    AND hour = 14
    AND uri_host LIKE "%meta%"
    AND geocoded_data["country_code"]="KR"
    AND agent_type = "user"
    AND referer LIKE "%wikipedia%"
    AND uri_query LIKE "%BannerLoader%"
    AND uri_query LIKE "%wlm_2017%"
    AND uri_query LIKE "%uselang=ko%"
GROUP BY
    access_method
ORDER BY
    access_method
LIMIT 10000 ;`
  ->
 2017/09/02 14h: 
 total     unique  platform
  83110   41495	desktop    
  54529   28590	mobile web

3) Count WLM thumbnail requests, again for one hour and country KO only

FROM
    wmf.webrequest
WHERE
    webrequest_source='upload'
    AND uri_host='upload.wikimedia.org'
    AND uri_path LIKE '%77px%rebro_slott_resynth.jpg%'
    AND year = 2017
    AND month = 9
    AND day = 2
    AND hour = 14
    AND geocoded_data["country_code"]="KR"
    AND referer LIKE "%wikipedia%"
    AND agent_type = "user"
GROUP BY
    access_method
ORDER BY
    access_method
LIMIT 10000 ;`
-> 
   2017/09/02 14h: 
   total     unique  platform
   47769   42276   desktop

  3) yields pretty similar results as 2)
  Compare 42276 unique image downloads from 3) with 41495 banner impressions from 2) 
  The count is higher as restriction 'uri_query LIKE "%uselang=ko%"' is missing

4) Collect banner clicks, again for one hour and country KO only

SELECT
    count(*),
    count (distinct ip) as ip_count,
    access_method
FROM
    wmf.webrequest
WHERE
    year = 2017
    AND month = 9
    AND day = 2
    AND hour = 14
    AND uri_host LIKE "ko%wikipedia.org"
    AND geocoded_data["country_code"]="KR"
    AND agent_type = "user"
    AND referer LIKE "%wikipedia%"
    AND uri_path = "/beacon/impression"
    AND uri_query LIKE "%wlm_2017%"
    AND uri_query LIKE "%uselang=ko%"
GROUP BY
    access_method
ORDER BY
    access_method
LIMIT 10000 ;`
-> 
2017/09/02 14h: 
total    unique  platform
320     305       desktop
574     554       mobile web

5) Breakdown by country by day by platform for first 9 days

USE wmf ;
SELECT
    s.*,
    count(*) AS count
FROM
  (SELECT
       access_method,
       geocoded_data['country'] country,
       day
   FROM
       webrequest
   WHERE
           year=2017
       AND month=9
       AND day < 9
       AND uri_host LIKE "%meta%"
       AND agent_type = "user"
       AND referer LIKE "%wikipedia%"
       AND uri_query LIKE "%BannerLoader%"
       AND uri_query LIKE "%wlm_2017%"
     ) s
GROUP BY
    s.country,
    s.access_method,
    s.day
  ORDER BY
    s.country,
    s.access_method,
    s.day
LIMIT 1000000 ;

result:

Aklapper added a subscriber: ezachte.

Removing assignee @ezachte as that Phabricator account has been deactivated. (If there are questions, it seems that @erik_zachte could be contacted.)

Folks, I declined this task as we didn't look at it back in 2017 and the data for looking at it doesn't exist any longer. If we decide that we want it for 2020 and afterwards, we can create a new task for it.