
Determine number of "Printable version" clicks per day
Closed, Resolved (Public)

Description

Count the daily requests with 'printable=yes' URLs (normally reached by clicking "Printable version" in the left sidebar on desktop)

Event Timeline

Answer: 398k/day during the week from May 22-28, excluding spiders. Very roughly, that corresponds to about [edit: fixed typo] 0.08% of our total pageviews (although of course a printed-out Wikipedia article is likely to be associated with vastly more reader attention / content consumption than one pageview in a web browser).

SELECT COUNT(*) AS printviews
FROM wmf.webrequest
WHERE
agent_type = 'user'
AND uri_query LIKE '%printable=yes%'
AND year = 2017 AND month = 5 AND day >= 22 AND day <= 28;

printviews
2787554
1 row selected (794.559 seconds)
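
As a quick sanity check on the per-day figure and the rough share quoted above, here is a minimal Python sketch. The denominator is an assumption: it reuses the ca. 530 million daily non-spider pageviews figure cited further down in this thread, and the variable names are purely illustrative.

```python
# Weekly total returned by the query above (May 22-28, 2017, agent_type = 'user').
weekly_printviews = 2787554
days_in_window = 7

# Assumption: ca. 530 million daily non-spider pageviews,
# the figure quoted later in this thread (not measured here).
daily_pageviews = 530_000_000

daily_printviews = weekly_printviews / days_in_window
share_percent = 100 * daily_printviews / daily_pageviews

print(f"{daily_printviews:,.0f} printable=yes requests per day")  # 398,222
print(f"{share_percent:.2f}% of daily pageviews")                 # 0.08%
```

With a different pageview baseline the share shifts accordingly, so the 0.08% should be read as a rough order of magnitude rather than a precise ratio.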

PS: Out of curiosity, I also looked at the numbers per country for that timespan. The differences are intriguing (e.g. Kenya, the only African country besides South Africa in this top 50 list, had by far the highest print version click ratio, and China and Cambodia are also on the high end), but one would need a more thorough examination to determine whether there are really persistent geographical differences in how often readers use this feature (one could start by comparing Global South with Global North, and by looking at a longer timespan).

SELECT country, SUM(partprintrequests) AS printrequests,
ROUND(100*SUM(partprintrequests)/SUM(partpvs),2) AS printpercentage
FROM (
  SELECT geocoded_data['country'] AS country,
  SUM(IF(uri_query LIKE '%printable=yes%',1,0)) AS partprintrequests,
  SUM(IF(is_pageview,1,0)) AS partpvs
  FROM wmf.webrequest
  WHERE year = 2017 AND month = 5 AND day >= 22 AND day <= 28
  AND agent_type = 'user'
  GROUP BY geocoded_data) AS gpbygeod
GROUP BY country ORDER BY printrequests DESC LIMIT 50;
country                printrequests  printpercentage
United States          1158501        0.14
Kenya                  231533         5.97
Germany                227123         0.1
United Kingdom         183139         0.1
China                  100621         0.38
Spain                  83539          0.12
Bulgaria               78155          0.8
Netherlands            66008          0.14
Canada                 61365          0.06
France                 57528          0.04
India                  53859          0.04
Brazil                 33476          0.04
Japan                  31172          0.01
Iran                   31126          0.04
Australia              29137          0.05
Russia                 26472          0.02
Italy                  23017          0.02
Slovenia               20529          0.49
Vietnam                20395          0.12
Hong Kong              17934          0.07
Ukraine                15971          0.04
Mexico                 15887          0.02
Switzerland            13572          0.06
South Africa           10524          0.09
Indonesia              10222          0.03
Taiwan                 9281           0.02
Colombia               7835           0.03
Sweden                 7109           0.02
Thailand               7065           0.04
New Zealand            7060           0.07
Romania                6767           0.04
Argentina              6494           0.02
Estonia                6292           0.16
Israel                 6063           0.03
Denmark                5674           0.05
Republic of Lithuania  5568           0.12
Cambodia               5329           0.36
Poland                 5222           0.01
Slovak Republic        4431           0.07
Norway                 4255           0.03
Peru                   4245           0.02
Singapore              4095           0.03
Chile                  3812           0.02
Malaysia               3651           0.02
United Arab Emirates   3561           0.04
Republic of Korea      3437           0.01
Hungary                3427           0.03
Belgium                3369           0.02
Ireland                3240           0.01
Austria                2856           0.01
50 rows selected (12161.217 seconds)

That seems a particularly bad week to analyse, due to the rollout of T24256: Change printable link to JavaScript `print()`. We should probably check the week BEFORE and the week AFTER that particular week.

@TheDJ Good point, I guess @ovasileva and I weren't aware of that context. Running a query for daily numbers during the time from May 15 to June 4 now.

Here is a daily graph for the time from May 15 to June 4. When exactly did T24256 roll out - on May 28?

printableyes clicks May-June 2017.png (563×740 px, 49 KB)

(Via SWAP notebook, cp ~tbayer/printable=yes%20clicks\ May-June\ 2017.ipynb .)
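
The notebook itself is not reproduced in this thread. As a rough sketch only, daily numbers for a range that crosses a month boundary could be pulled with a query along these lines. The helper names (`daily_partitions`, `partition_predicate`, `build_daily_query`) are hypothetical, and the query simply reuses the filters from the weekly query above, with one `year`/`month`/`day` clause per day; it is not necessarily what the notebook does.

```python
from datetime import date, timedelta

def daily_partitions(start, end):
    """Return every calendar day from start to end, inclusive."""
    days = []
    current = start
    while current <= end:
        days.append(current)
        current += timedelta(days=1)
    return days

def partition_predicate(start, end):
    """Build one (year, month, day) clause per day, joined with OR."""
    clauses = [
        f"(year = {d.year} AND month = {d.month} AND day = {d.day})"
        for d in daily_partitions(start, end)
    ]
    return "\n    OR ".join(clauses)

# Same filters as the weekly query, grouped per day instead of summed.
QUERY_TEMPLATE = """SELECT year, month, day, COUNT(*) AS printviews
FROM wmf.webrequest
WHERE agent_type = 'user'
  AND uri_query LIKE '%printable=yes%'
  AND (
    {predicate}
  )
GROUP BY year, month, day
ORDER BY year, month, day;"""

def build_daily_query(start, end):
    """Fill the template with the partition clauses for the given range."""
    return QUERY_TEMPLATE.format(predicate=partition_predicate(start, end))

if __name__ == "__main__":
    # May 15 to June 4, 2017: the range discussed in this thread.
    print(build_daily_query(date(2017, 5, 15), date(2017, 6, 4)))
```

Spelling out each day avoids the mistake of writing `day >= 15 AND day <= 4` across two months, at the cost of a longer WHERE clause.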

@Tbayer Partly on May 23 and 24 for the smaller stuff, and on June 1 for the big Wikipedias, it seems: https://www.mediawiki.org/wiki/MediaWiki_1.30/Roadmap (1.30.0-wmf.2)

Not really a significant influence so far, it seems. But there is a lot of variance in that graph, a lot more than I would have anticipated (maybe it correlates with the overall traffic on those days). Anyway, it's anywhere between 0.01% and 0.15% of pageviews, if I'm basing it on a daily average of 650 million pageviews (incl. robots, spiders etc., all projects).
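
To make that percentage range easier to compare with the graph, here is a small Python sketch that converts it back into absolute daily request counts. It assumes the 650 million daily pageviews baseline from the comment above; the function name `requests_for_share` is illustrative only.

```python
# Assumption from the comment above: 650 million daily pageviews
# (incl. robots and spiders, all projects).
daily_pageviews = 650_000_000

def requests_for_share(share_percent, pageviews=daily_pageviews):
    """Convert a share of pageviews (in percent) into a daily request count."""
    return round(pageviews * share_percent / 100)

low = requests_for_share(0.01)
high = requests_for_share(0.15)

print(f"{low:,} to {high:,} printable=yes requests per day")
```

Because this baseline includes spiders while the print counts do not, the resulting shares are somewhat lower than they would be against a non-spider pageview total.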

OK, here is an extension of the previous graph, now covering April 4-July 2:

printable yes clicks April-July 2017.png (446×585 px, 40 KB)

Looking at this fuller picture, it seems like there is a baseline of about 400-500k/day (which, again, would correspond to about 0.08 to 0.09% of the ca. 530 million daily non-spider pageviews), with various spikes on top of it.

BTW I also checked that all those requests are actually human (more precisely, 100.0% had agent_type = 'user' on each day from June 25 to July 2).

Data sources: see previously mentioned SWAP notebook

Closing this now; feel free to reopen in case there are further questions.

@ovasileva - to make instrumentation card for print styles

It's weird, I really had expected to see a big drop somewhere due to more JS prints.... I'm amazed that it's not there. I can't explain it.