Paste P8259

Breakdown of TextExtracts API parameter combinations for 2019/03/21

Authored by phuedx on Mar 22 2019, 7:43 PM.
Referenced Files
F28445807: raw.txt
Mar 22 2019, 7:43 PM
extracts_props n
limit|intro|plaintext 9035373
intro 7421959
intro|plaintext 6027851
chars|limit|intro|plaintext 4985313
(none) 2564824
plaintext 1326981
sentences|intro|plaintext 1192547
sentences|limit|intro|plaintext 643904
limit|plaintext 550572
sentences|plaintext 503069
sentences|plaintext|sectionformat 431697
plaintext|sectionformat 259910
chars|plaintext 136205
sentences|limit|plaintext|sectionformat 131272
limit|intro 117578
limit 94784
sectionformat 84100
chars 82683
sentences|intro 69315
chars|limit|plaintext|sectionformat 67807
intro|plaintext|sectionformat 59891
chars|plaintext|sectionformat 22072
sentences|limit|plaintext 19405
chars|limit|sectionformat 17751
chars|limit|intro 15157
chars|intro|plaintext 10579
chars|intro 9842
limit|plaintext|sectionformat 7396
sentences 6217
intro|sectionformat 5445
sentences|limit|intro|plaintext|sectionformat 5196
chars|limit|plaintext 4670
sentences|limit|intro 3155
chars|intro|sectionformat 3114
chars|sectionformat 1889
chars|limit 698
limit|intro|plaintext|sectionformat 620
limit|intro|sectionformat 519
sentences|sectionformat 202
chars|limit|intro|sectionformat 182
sentences|intro|plaintext|sectionformat 105
limit|sectionformat 80
sentences|limit 69
chars|limit|intro|plaintext|sectionformat 67
sentences|limit|intro|sectionformat 26
chars|intro|plaintext|sectionformat 22
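The rows above count parameter combinations, not individual parameters. As a rough sketch of how per-parameter totals could be derived from them, the Python below sums the counts of every combination in which a parameter appears. Only the four most frequent rows are included, copied from the breakdown, so the totals are partial; the function name `per_param_totals` is hypothetical and not part of the paste.

```python
from collections import Counter

# The four most frequent rows of the breakdown, in "combination count" form.
ROWS = """\
limit|intro|plaintext 9035373
intro 7421959
intro|plaintext 6027851
chars|limit|intro|plaintext 4985313
"""


def per_param_totals(rows: str) -> Counter:
    """Sum request counts per parameter across all combinations containing it."""
    totals = Counter()
    for line in rows.splitlines():
        if not line.strip():
            continue
        # The count is the last space-separated field; the rest is the combination.
        combo, _, count = line.rpartition(" ")
        for param in filter(None, combo.split("|")):
            totals[param] += int(count)
    return totals


totals = per_param_totals(ROWS)
for param, n in totals.most_common():
    print(param, n)
```

Feeding in the full table instead of the four sample rows would give the day's totals for each parameter.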
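-- Count TextExtracts (prop=extracts) API requests for 2019-03-21, grouped by
-- the combination of ex* parameters present in each request's query string.
with dataset as (
  select
    -- concat_ws() skips nulls, so a label appears only when its parameter is
    -- present; a request with none of them yields the empty string.
    concat_ws(
      '|',
      if(uri_query like '%exchars%', 'chars', null),
      if(uri_query like '%exsentences%', 'sentences', null),
      if(uri_query like '%exlimit%', 'limit', null),
      if(uri_query like '%exintro%', 'intro', null),
      if(uri_query like '%explaintext%', 'plaintext', null),
      if(uri_query like '%exsectionformat%', 'sectionformat', null)
    ) as extracts_props
  from
    wmf.webrequest
    -- Pull the value of the prop parameter out of the reconstructed URL.
    lateral view
      parse_url_tuple(concat('https://', uri_host, uri_path, uri_query), 'QUERY:prop') query as prop
  where
    year = 2019
    and month = 3
    and day = 21
    and uri_path = '/w/api.php'
    and query.prop like '%extracts%'
)
select
  extracts_props,
  count(*) as n
from
  dataset
group by
  extracts_props
order by
  n desc
limit 10000
;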
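To make the labelling step easier to follow, here is a minimal Python sketch of what the `concat_ws(...)` expression in the query does to one query string. It is an illustration rather than part of the paste: the function name `extracts_props` is borrowed from the column alias, and the sample query strings are invented.

```python
# (substring searched for, label emitted), in the same order as the
# concat_ws() arguments in the query.
PARAMS = [
    ("exchars", "chars"),
    ("exsentences", "sentences"),
    ("exlimit", "limit"),
    ("exintro", "intro"),
    ("explaintext", "plaintext"),
    ("exsectionformat", "sectionformat"),
]


def extracts_props(uri_query: str) -> str:
    """Join the labels of every ex* parameter name found in the query string."""
    # Like the SQL `like '%...%'` test, this is a plain case-sensitive
    # substring match over the whole query string, not a parsed-parameter check.
    return "|".join(label for needle, label in PARAMS if needle in uri_query)


# Hypothetical query strings, for illustration only.
print(extracts_props("?action=query&prop=extracts&exintro=1&explaintext=1&exlimit=20&titles=Foo"))
print(repr(extracts_props("?action=query&prop=extracts&titles=Foo")))
print(extracts_props("?action=query&prop=extracts&titles=exintro"))
```

The third call shows a limitation of substring matching: a parameter name that appears inside another parameter's value (here a page title) is counted as if the parameter were set. The same holds for the `like` patterns in the query, so the counts are best read as approximate.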

Event Timeline

@EvanProdromou, this is a pretty cool API usage query @phuedx wrote that may be applicable to your work.