Define and implement instrumentation for printing on desktop web
Closed, Resolved | Public | 8 Story Points

Description

Background

Prior to changing to our new print styles, we would like to establish a baseline of the number of users printing on desktop. We would like to look at the following questions:

  • How many users are printing per session?
  • How many users are printing multiple times per session?

Based on prior data, there will probably be substantial daily variation in the data, and printing activity may also be affected by seasonal changes. So this new data should not be used for any day-to-day comparisons, but it will give us a better idea of how our print feature is being used.

AC

  • Create a schema with the following:

Events (which are referred to as actions in the Popups schema).

  • Clicks on the "printable version" link.
  • Browser print events (where the browser supports them), in the form of onbeforeprint

Properties (for all events):

  • sessionToken ( = mw.user.sessionId())
  • isAnon
  • pageTitle ( = mw.config.get( 'wgPageName' ))
  • namespaceId ( = mw.config.get( 'wgNamespaceNumber'))
  • skin ( = mw.config.get( 'skin'))

Sampling:

  • The instrumentation will sample by user session and not by pageview.
  • To start with, the sampling rate is 10% of all distinct browser sessions.
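
A minimal sketch of how the properties and the session-based sampling above might fit together. The helper names `isSessionSampled` and `buildPrintEvent`, the `action` field name, and the assumption that the session token is a hex string of at least 8 characters are illustrative only; this is not the WikimediaEvents implementation.

```javascript
// Sketch only: hypothetical helpers, not the WikimediaEvents code.

// Decide once per session by mapping the session token to a number in [0, 1).
// Assumes the token is a hex string with at least 8 characters.
function isSessionSampled( sessionToken, samplingRate ) {
    var bucket = parseInt( sessionToken.slice( 0, 8 ), 16 ) / 0x100000000;
    return bucket < samplingRate;
}

// Assemble the properties listed in the task description.
// `mw` is passed in so the function can be exercised with a stub.
function buildPrintEvent( mw, action ) {
    return {
        action: action, // assumed field name for the event type
        sessionToken: mw.user.sessionId(),
        isAnon: mw.user.isAnon(),
        pageTitle: mw.config.get( 'wgPageName' ),
        namespaceId: mw.config.get( 'wgNamespaceNumber' ),
        skin: mw.config.get( 'skin' )
    };
}
```

Because the bucket depends only on the session token, every pageview in a session gets the same sampling decision, which is what sampling by session rather than by pageview requires.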

Notes

  • We are not considering users without JavaScript.
  • We cannot tell if the user actually abandons the print, only that they clicked. For instance, various factors including how the print styles look may actually lead to a user abandoning a print, but we will have no idea about that.
  • @bmansurov found that Opera would trigger the beforeprint event twice (see T171162#3457776). Ensure that it's only handled once.
  • Define the instrumentation in the WikimediaEvents extension as Wikimedia-specific and skin-agnostic.

Closed Questions

  • Are we tracking prints for all skins or just Vector?

@phuedx: By listening to the onbeforeprint event, we should be able to instrument all skins. I've added the skin property to the event above.

  • What is the sampling rate?

@phuedx: 0.1% of all distinct browser sessions seems to be a sensible default. This yields a peak rate of 5 events per second for the NavigationTiming instrumentation, for example.

@Tbayer: Per T169730#3475086, 10% of all distinct browser sessions.

  • Can we provide a list of browsers which support detecting whether a user has entered print mode?

@phuedx: Per T171162#3457776, listening to the onbeforeprint event with a matchMedia API fallback should yield the following coverage:

  1. IE6+
  2. Edge 12+
  3. Firefox for Android 54
  4. Firefox 6+
  5. Chrome 9+
  6. Safari 5.1+
  7. Opera 12.1+ (though @bmansurov's testing revises this to Opera 15+)
  8. Opera Mini
  9. Android Browser 3+

Which I think equates to very nearly all Grade A and Grade C browsers.

See http://caniuse.com/#feat=beforeafterprint and http://caniuse.com/#feat=matchmedia for coverage of the primary and fallback approaches.
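
The primary and fallback approaches described above can be sketched roughly as follows. `watchForPrint` is a hypothetical helper, `win` stands for the browser's window object (or a stub), and the single `handled` flag is one assumed way of making sure a doubled beforeprint event is handled only once.

```javascript
// Sketch only: hypothetical helper, not the WikimediaEvents code.
function watchForPrint( win, onPrint ) {
    var handled = false;

    function handleOnce() {
        // Guard against browsers that fire the event more than once.
        if ( handled ) {
            return;
        }
        handled = true;
        onPrint();
    }

    if ( 'onbeforeprint' in win ) {
        // Primary approach: the beforeprint event.
        win.addEventListener( 'beforeprint', handleOnce );
    } else if ( typeof win.matchMedia === 'function' ) {
        // Fallback: watch the "print" media query for a match.
        win.matchMedia( 'print' ).addListener( function ( mql ) {
            if ( mql.matches ) {
                handleOnce();
            }
        } );
    }
}
```

In a browser this would be called as `watchForPrint( window, callback )`. Browsers that support neither mechanism simply never invoke the callback.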

Next

In the longer term we will want to consider the following questions but these should be considered out of scope for this task - please see T169731:

  • Is the overall number of users who print multiple times per session decreasing or increasing after the deployment of the new print styles? (Will likely require an A/B test but that is out of the scope of this task)
  • Can we bucket by session for an actual A/B test?

Event Timeline

phuedx updated the task description. Jul 25 2017, 9:22 AM
phuedx updated the task description. Jul 25 2017, 9:34 AM
phuedx updated the task description. Jul 25 2017, 9:39 AM

OK. I think I've folded in everything that @bmansurov found out in T171162: [Spike] Answer open questions on instrumentation for printing on desktop web. Let's see if the estimate shrinks a little!

phuedx updated the task description. Jul 25 2017, 9:40 AM

We re-estimated this and again landed on an 8. The reason for the 8 and not a 5 is that we think there is some risk relating to sign-off, and we want to avoid the issues we had with Popups. We want the score to capture testing and verifying fixes. However, thanks to @bmansurov, we think this is a much better defined 8 and the risk has been mitigated somewhat!

Further to @Jdlrobson's explanation of our estimate above, we all agreed that, amongst other things, we must invest more effort in QA'ing instrumentation prior to merging/signing off. Also, given the number of browsers that we're hoping to support, testing this will require a concerted effort.

@phuedx: 0.1% of all distinct browser sessions seems to be a sensible default. This yields a peak rate of 5 events per second for the NavigationTiming instrumentation, for example.

But that's not a good comparison, considering that this schema will only send events for a fraction of all sampled sessions (those where a print action occurs). Per T167237#3402915 , the "printable version" link is clicked on less than 1/1000 of pageviews, i.e. a sampling rate of 100% would get us less than 6 of these events per second on average ;) Regarding our other planned event (browser print actions), we don't have an idea about their frequency yet - that's part of the goal here - but let's assume for now that they occur on less than 1/100 of pageviews. Altogether, assuming that these ratios hold through session sampling, a sampling rate of 10% should still result in an average event rate that's safely below the 10/second (average, not peak) that the Analytics Engineering team currently states as the limit above which one should check with them.
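
The arithmetic in the comment above can be checked with a short sketch. The pageview rate of roughly 6000 per second is not stated directly; it is inferred from "less than 1/1000 of pageviews" corresponding to "less than 6 events per second". The 1/100 ratio for browser print events is the comment's own placeholder assumption. `estimateEventsPerSecond` is a hypothetical helper name.

```javascript
// Sketch only: upper-bound estimate using the figures quoted in the comment.
var PAGEVIEWS_PER_SECOND = 6000; // inferred: 6 events/s at a 1/1000 click ratio

function estimateEventsPerSecond( samplingRate ) {
    var linkClickRatio = 1 / 1000; // "printable version" clicks per pageview (upper bound)
    var browserPrintRatio = 1 / 100; // assumed upper bound for browser print events
    return PAGEVIEWS_PER_SECOND * samplingRate * ( linkClickRatio + browserPrintRatio );
}

// At the proposed 10% sampling rate this comes to about 6.6 events per second,
// below the 10 events per second threshold mentioned above.
var estimateAtTenPercent = estimateEventsPerSecond( 0.1 );
```

As in the comment, this assumes the per-pageview ratios carry over unchanged when sampling by session.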

phuedx updated the task description. Jul 26 2017, 3:57 PM

I stand corrected!

Jdlrobson renamed this task from Define instrumentation for printing on desktop web to Define and implement instrumentation for printing on desktop web. Jul 26 2017, 5:38 PM
Jdlrobson removed Jdlrobson as the assignee of this task.

Adding to sprint, as print styles work is in signoff, as agreed during kickoff.

So this is not lost: We cautiously moved this back to "upcoming" as nobody is currently free to work on this and the print styles work is not done. There could be follow-up work in sign-off (T169823), and Baha's patch (T169826) is still in flight.

ovasileva moved this task from Triage to Backlog on the Proton board. Aug 30 2017, 5:21 PM
bmansurov moved this task from To Do to Doing on the Readers-Web-Kanbanana-Board-Old board.

@Tbayer could you take a look at the schema: https://meta.wikimedia.org/wiki/Schema:Print

Please feel free to update as needed. Thanks!

@Tbayer, since some browsers trigger onbeforeprint for each time a preview is rendered, do you think it would be wise to send this event only once per page view? So no matter how many times a user prints a page, this event will be sent only once. Or do we want to capture all such events for a given page when the user prints the page multiple times without reloading?

Similarly, with regards to the user clicking on the printable version link, do we want to capture all such clicks for a given page? Or is it enough to capture only the first such click?

Tbayer updated the task description. Sep 8 2017, 1:05 AM

@Tbayer, since some browsers trigger onbeforeprint for each time a preview is rendered, do you think it would be wise to send this event only once per page view? So no matter how many times a user prints a page, this event will be sent only once. Or do we want to capture all such events for a given page when the user prints the page multiple times without reloading?

I think we can be pragmatic about this and choose whichever is easier to implement. (It doesn't seem to be a very important product question.) If we go with the second option and log multiple events during one pageview, we will be able to connect them using pageTitle and namespaceId.

Similarly, with regards to the user clicking on the printable version link, do we want to capture all such clicks for a given page? Or is it enough to capture only the first such click?

Ditto, it should be fine to do whichever is easier to implement

BTW, it would be good to document the eventual choice in the schema page.

@Tbayer could you take a look at the schema: https://meta.wikimedia.org/wiki/Schema:Print
Please feel free to update as needed. Thanks!

Looks good to me!

I started the schema documentation on the talk page, please check and make edits if necessary (wasn't sure which project to attach): https://meta.wikimedia.org/wiki/Schema_talk:Print
@ovasileva, I listed you and Baha as maintainers, let me know in case you prefer to instead include me or someone else from the team.

Once it is up and running, we also need to whitelist it, I'm adding a subtask for that.

Tbayer updated the task description. Sep 8 2017, 1:42 AM

sounds good. Thank you @Tbayer!

@Tbayer thanks! I've updated the schema talk page. I decided to log events only once because it's easier to implement and less headache for dealing with duplicate events.
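
A minimal sketch of the "log only once" choice described above, assuming it means each action is sent at most once per page view. `createOncePerActionLogger` is a hypothetical helper, and `send` stands in for whatever actually submits the event.

```javascript
// Sketch only: hypothetical helper, not the WikimediaEvents code.
function createOncePerActionLogger( send ) {
    // Created fresh on each page load, so the record resets per page view.
    var seen = Object.create( null );

    return function ( action ) {
        if ( seen[ action ] ) {
            return false; // already logged during this page view
        }
        seen[ action ] = true;
        send( action );
        return true;
    };
}
```

With this shape, repeated onBeforePrint or clickPrintableVersion actions on the same page would each produce a single event, while the two action types remain independent of each other.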

  • Implement the schema's purging strategy by submitting a patch to the whitelist or filing a task with Analytics Engineering.

@Tbayer what should be the strategy? The default is 90 days, and no change is needed if we want to keep it.

I already specified one in the talk page template - I think we should keep the non-sensitive data for later reference (like we do for the Popups schema).

bmansurov updated the task description. Sep 8 2017, 6:23 PM

I've created T175395: Implement Schema:Print purging strategy and removed the requirement from this task.

ovasileva moved this task from Backlog to Current Sprint on the Proton board. Sep 11 2017, 11:57 AM
MBinder_WMF reassigned this task from bmansurov to pmiazga. Sep 13 2017, 5:10 PM

@Tbayer I've set up event logging for print at http://reading-web-staging.wmflabs.org/wiki/Main_Page. The sampling rate is set to 90% for easy bucketing. Please let me know if you need anything else for testing.

Change 376427 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[mediawiki/extensions/WikimediaEvents@master] Add support for Schema:Print

https://gerrit.wikimedia.org/r/376427

pmiazga removed pmiazga as the assignee of this task. Sep 15 2017, 1:50 PM
pmiazga added a subscriber: pmiazga.

@Tbayer can you verify this works properly?

Change 376427 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Add support for Schema:Print

https://gerrit.wikimedia.org/r/376427

Ping @Tbayer. Following on from T169730#3611297 and T169730#3605628, should I set up a meeting with you and @bmansurov to go through the instrumentation on the staging server?

Tbayer moved this task from Triage to In progress on the Reading-analysis board.
Tbayer added a comment. Edited Sep 20 2017, 2:15 PM

Thanks @phuedx - since @bmansurov and I already had a call about this earlier (with testing on his local installation), I did some checks on reading-web-staging on my own already.

It successfully sent clickPrintableVersion events from Firefox and Chromium under Ubuntu from various pages. I also checked the value of event_isAnon after logging in, and tried generating an event on a non-mainspace (File:..) page.

Under Chromium, the onBeforePrint events worked too, but I wasn't able to generate them in Firefox (via File --> Print). Looking at the EL table, Baha did actually generate onBeforePrints under Firefox too on his local installation, albeit only for Special:Book. Could someone look into this?

With an eye on T175918, I checked that events were being sent on three consecutive pageviews in the same session. They were, although that's of course not yet a guarantee that that bug won't affect this schema as well.

BTW, at first it did not send events for me at all in Firefox when using private browsing (even when bucketed and with DNT disabled). It looks like this was due to the new(ish) additional tracking protection: https://support.mozilla.org/en-US/kb/tracking-protection-pbm After deactivating it in the Firefox settings, events were sent.

For the record, the events I generated can be found via:

 SELECT * FROM log.Print_17199246 WHERE userAgent LIKE '%Ubuntu%';

bmansurov moved this task from Needs QA to Doing on the Readers-Web-Kanbanana-Board-Old board.

I'll take a look at the Firefox issue.

bmansurov reassigned this task from bmansurov to Tbayer. Sep 20 2017, 2:44 PM
bmansurov moved this task from Doing to Needs QA on the Readers-Web-Kanbanana-Board-Old board.

@Tbayer I've updated the staging server -- sorry, it was out of date. Now I see the onBeforePrint in Firefox.

Tbayer closed this task as Resolved. Sep 20 2017, 11:30 PM

Verified that onBeforePrint is now sent under Firefox too. I assume that everything else still works after the update; it seems due diligence has been done for now.

BTW, it looks like this schema is active on Minerva too - is this intentional?

 SELECT event_skin, COUNT(*) FROM log.Print_17199246 GROUP BY event_skin;
+------------+----------+
| event_skin | COUNT(*) |
+------------+----------+
| apioutput  |       22 |
| minerva    |    18600 |
| modern     |        4 |
| monobook   |       38 |
| vector     |   290994 |
+------------+----------+
5 rows in set (1.56 sec)

Yes, we're instrumenting on all skins: related code.

OK - just wondering if Minerva is considered desktop web in this context.

As explained at T175395#3662739 , including Minerva actually causes some problems with the purging strategy (or, precludes an easy option to resolve these); so I have suggested there to simply restrict the instrumentation to desktop, consistent with the original task description. @ovasileva, let us know in case we actually need data for mobile web too.

ovasileva added a comment. Edited Oct 6 2017, 12:28 PM

@Tbayer - we do. We are changing the usage for the download to PDF option on mobile to have the button trigger the print modal for all Android devices. We would need this schema to be able to track usage there: T177215: Build download button for mobile PDF download

Change 386911 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/WikimediaEvents@master] Limit logged skins for print event only to vector and minerva

https://gerrit.wikimedia.org/r/386911

Change 386911 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Limit logged skins for print event only to vector and minerva

https://gerrit.wikimedia.org/r/386911

@bmansurov BTW, I just noted that Chrome 63 and later seem to now support the beforeprint and afterprint events as well. Apparently this made it into the HTML5 Living standard now.

Thanks for the ping, @TheDJ. Yeah, I forgot to comment that support for the {before,after}print events landed in Chromium on September 27th: https://bugs.chromium.org/p/chromium/issues/detail?id=218205

Good news. Now, the easy part is left: upgrade our users' Chromes. 😜