As the webrequest data from November is going to be purged soon, we want to aggregate and save the Google Translate Pageview count temporarily in @chelsyx 's database. This should be a one-off solution, and this data is going to be transformed and moved to production database until we figure out a long term solution.
The query we're going to use is based on @dr0ptp4kt 's trend query:
```lang=sql, lines=5
select
count(1) as count,
year,
month,
day,
http_method,
http_status,
uri_host,
agent_type,
access_method,
parse_url(referer, 'HOST') as referer_host,
referer_class,
parse_url(referer, 'QUERY') like '%client=srp%' as client_srp,
regexp_extract(parse_url(referer, 'QUERY'), '(^|[&?])hl=([^&]*)', 2) as home_language,
regexp_extract(parse_url(referer, 'QUERY'), '(^|[&?])sl=([^&]*)', 2) as source_language,
regexp_extract(parse_url(referer, 'QUERY'), '(^|[&?])tl=([^&]*)', 2) as to_language,
regexp_extract(parse_url(referer, 'QUERY'), '(^|[&?])rurl=([^&]*)', 2) as rurl_param,
geocoded_data['continent'] AS continent,
geocoded_data['country_code'] AS country_code,
geocoded_data['country'] AS country
from wmf.webrequest
where
year = 2019
and month = 2
and is_pageview
and x_analytics_map['translationengine'] = 'GT'
group by
year, month, day,
http_method,
http_status,
uri_host,
agent_type,
access_method,
parse_url(referer, 'HOST'),
referer_class,
parse_url(referer, 'QUERY') like '%client=srp%',
regexp_extract(parse_url(referer, 'QUERY'), '(^|[&?])hl=([^&]*)', 2),
regexp_extract(parse_url(referer, 'QUERY'), '(^|[&?])sl=([^&]*)', 2),
regexp_extract(parse_url(referer, 'QUERY'), '(^|[&?])tl=([^&]*)', 2),
regexp_extract(parse_url(referer, 'QUERY'), '(^|[&?])rurl=([^&]*)', 2),
geocoded_data['continent'],
geocoded_data['country_code'],
geocoded_data['country']
ORDER BY
year, month, day
-- Limit enforced by hive strict mapreduce setting.
-- 1000000000 == NO LIMIT !
LIMIT 1000000000
;
```