QuickSurveys EventLogging missing ~10% of interactions
Open, Needs Triage, Public

Description

Problem

We have run two external QuickSurveys recently (T217080 and T217576) and for both surveys, about 10% of the responses recorded (i.e. surveys filled out via Qualtrics and Google Forms) do not have associated EventLogging data from QuickSurveyInitiation or QuickSurveysResponses. Some of the responses missing QuickSurveysResponses EL do have associated QuickSurveyInitiation EL.

What we have ruled out:

  • Issue with the queries we are using to gather the EL data. The surveys missing EL data are scattered through the whole survey period so it is not related to incorrectly filtering the data.
  • Issues w/ the URL-encoded EL being too long. I only saw six total instances in the event.eventerror table related to the surveys.
  • Respondents tampering with the pageviewToken that we use to connect EL with the survey responses. In Qualtrics, this is not possible and we still see the missing EL.

Hypotheses

  • Missing all EL is possibly related to browser agent. We have observed that certain browsers are being undersampled by QuickSurveys (T218243#5086923). It could be that these browsers are in fact seeing QuickSurveys, they just are not appropriately logging that.
    • It seems other EventLogging schemas have run into related challenges (T204143)
  • Just missing QuickSurveysResponses EL (but not QuickSurveyInitiation) is likely related to this bug: T217171#4992112
    • Essentially, right-clicking and opening a QuickSurvey link in a new tab is not registered by the extension. Presumably the 91 responses for Reader Trust and 38 for Demographics Pilot that had QuickSurveyInitiation but not QuickSurveysResponses (despite completing the survey) could be the result of this behavior.

Survey Overviews

Reader Trust (T217576):

  • Out of the 1971 survey responses recorded by Qualtrics:
    • 1702 (86%) of those have corresponding QuickSurveysResponses data
    • Another 91 (for a total of 1793 or 91%) can be matched to QuickSurveyInitiation data.

Demographics Pilot (T217080):

  • Out of the 626 survey responses recorded by Google Forms:
    • 514 (82%) of those have corresponding QuickSurveysResponses data
    • Another 38 (for a total of 552 or 88%) can be matched to QuickSurveyInitiation data.
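
The percentages above follow directly from the reported counts. A minimal Python sketch of that arithmetic, using only the totals given in this task (the function and variable names are just for illustration):

```python
def match_rate(matched, total):
    """Share of survey responses with matching EL, as a whole-number percent."""
    return round(100 * matched / total)


# (responses recorded, matched to QuickSurveysResponses, matched to either schema)
surveys = {
    "Reader Trust (T217576)": (1971, 1702, 1793),
    "Demographics Pilot (T217080)": (626, 514, 552),
}

for name, (total, responses, either) in surveys.items():
    print(
        f"{name}: {match_rate(responses, total)}% with QuickSurveysResponses, "
        f"{match_rate(either, total)}% with any EL, "
        f"{100 - match_rate(either, total)}% with none"
    )
```

On these counts, roughly 9% (Reader Trust) and 12% (Demographics Pilot) of recorded responses have no EL at all, which is where the "~10%" in the task title comes from.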

For these analyses, I fully skipped EventLogging and instead used webrequest logs, using a query like that below to gather the EL (and then attempting to join it to the survey responses provided by Qualtrics/Google Forms):

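-- Decode the URL-encoded EventLogging payload (dropping the leading '?') into JSON.
-- <survey-name> is a placeholder for the survey's name.
SELECT *,
       REFLECT('java.net.URLDecoder', 'decode', SUBSTR(uri_query, 2)) AS json_event
  FROM wmf.webrequest
 WHERE uri_path LIKE '%beacon/event'
       AND uri_query LIKE '%QuickSurvey%'
       AND uri_query LIKE '%<survey-name>%'
       AND year = 2019 AND month = 3 AND day >= 18 AND day < 23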
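
For anyone wanting to reproduce the join outside Hive, here is a rough Python sketch of the same two steps: decoding uri_query and then matching pageview tokens against the survey export. It is an illustration under assumptions, not the code used for this analysis. In particular, the event layout (a top-level "schema" key and a pageviewToken inside "event"), the optional trailing ";" on the payload, and the names decode_event and classify are all assumed here.

```python
import json
from urllib.parse import unquote_plus


def decode_event(uri_query):
    """Mirror SUBSTR(uri_query, 2) + URLDecoder.decode, then parse the JSON.

    Returns None when the decoded payload is not valid JSON.
    """
    payload = unquote_plus(uri_query[1:])  # drop the leading '?'
    payload = payload.rstrip(";")  # assumption: payload may end with ';'
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        return None


def classify(survey_tokens, events):
    """Label each survey token by the strongest EL match found for it."""
    responded, initiated = set(), set()
    for ev in events:
        if not isinstance(ev, dict) or not isinstance(ev.get("event"), dict):
            continue  # skip undecodable or malformed events
        token = ev["event"].get("pageviewToken")  # assumed field location
        if ev.get("schema") == "QuickSurveysResponses":
            responded.add(token)
        elif ev.get("schema") == "QuickSurveyInitiation":
            initiated.add(token)

    labels = {}
    for token in survey_tokens:
        if token in responded:
            labels[token] = "responses"
        elif token in initiated:
            labels[token] = "initiation_only"
        else:
            labels[token] = "no_el"
    return labels
```

Counting the "responses", "initiation_only", and "no_el" labels gives the three tiers reported in the survey overviews.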

Event Timeline

Isaac created this task. Apr 10 2019, 4:20 PM

Is it possible that the link to the survey is being shared outside a QuickSurvey (e.g. social media)?

Restricted Application added a project: Analytics. Apr 10 2019, 6:20 PM
Isaac added a comment. Apr 10 2019, 6:33 PM

Is it possible that the link to the survey is being shared outside a QuickSurvey (e.g. social media)?

@Jdlrobson Good point but I'm pretty sure not. We haven't seen any evidence of links being shared on social media and if the links were shared, we'd expect a bunch of survey responses with blank or duplicate pageview tokens (because either the person would share the link w/o the URL parameters that pass the unique pageview token to Qualtrics/Google Forms or with the URL parameters particular to their session). The responses that don't have EL associated with them, however, have pageview tokens that are unique and look reasonable (i.e. correct length and random-ish string of characters as expected).
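
A check along those lines can be sketched in Python as below. This is only an illustration: token_report is a made-up helper, and expected_length is a parameter the caller has to supply, since the actual token length is not stated in this task.

```python
from collections import Counter


def token_report(tokens, expected_length):
    """Count blank, duplicated, and oddly sized pageview tokens."""
    cleaned = [(t or "").strip() for t in tokens]
    counts = Counter(t for t in cleaned if t)
    return {
        # responses whose link carried no token at all
        "blank": sum(1 for t in cleaned if not t),
        # distinct tokens that appear on more than one response
        "duplicated": sum(1 for c in counts.values() if c > 1),
        # distinct tokens whose length differs from the expected one
        "wrong_length": sum(1 for t in counts if len(t) != expected_length),
    }
```

If shared links were the cause, the "blank" or "duplicated" counts among the unmatched responses would be noticeably above zero.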

fdans triaged this task as High priority. Apr 11 2019, 4:26 PM
fdans raised the priority of this task from High to Unbreak Now!.
fdans moved this task from Incoming to Data Quality on the Analytics board.
Restricted Application added subscribers: Liuxinyu970226, TerraCodes. Apr 11 2019, 4:27 PM
fdans lowered the priority of this task from Unbreak Now! to High. Apr 11 2019, 4:27 PM
fdans raised the priority of this task from High to Unbreak Now!.
fdans moved this task from Data Quality to Ops Week on the Analytics board.
mforns added a subscriber: mforns. Apr 12 2019, 3:47 PM

Hey! :]

I've been looking into this for a bit.
Is there any documentation I can read on the flow of the surveys?
Does the user click on a link on-wiki, that opens a Google/Qualtrics form?
And when do events for QuickSurveyInitiation and QuickSurveysResponses trigger?

The browser stats are mysteriously interesting, I think it's worth digging further into that.
Another condition that could be a partial cause of the missing data is disabled JS in the browser, no?
Maybe also, the fact that beacon requests are sent only on the unloading of the page causes them to be delayed by a couple of days (if people leave tabs open).
I've seen a (short) long tail of QuickSurveysResponses events for reader-demographics-en-pilot that are outside of the suggested time intervals of the experiment.

I found https://www.mediawiki.org/wiki/Extension:QuickSurveys,
and it explains the code for the survey is loaded dynamically, so JS disabled is not the cause.
DNT is also not the cause, because when it's on, the surveys don't even show.

I don't understand yet how Google/Qualtrics forms can send responses and at the same time we get corresponding QuickSurveysResponses events.
Are the external forms configured to also send beacons?

@ovasileva: This might benefit from some investigation on our side too.

Isaac added a comment. Apr 12 2019, 4:29 PM

Is there any documentation I can read on the flow of the surveys? Does the user click on a link on-wiki, that opens a Google/Qualtrics form?

Unfortunately no great documentation that I know of but happy to try to sketch it out. You're right about the click on link on-wiki that opens a Google/Qualtrics form. A few general notes:

  • Regardless of internal vs. external survey, pretty much the same criteria are used to determine whether a given reader will see a survey
  • Each external survey will have a corresponding configuration and message pages that provide the information necessary for sampling as well as which URL to provide the reader if they click "yes" to take a survey. Depending on the survey, this URL generally takes them either to a Google Form or Qualtrics survey.
  • Importantly, external surveys can be configured to dynamically add a URL parameter to the survey URL that passes that reader's (unique) pageview token to the external survey. For Google Forms, we set it up so that this pageview token automatically is set as the answer to one of the questions. For Qualtrics, this pageview token is just stored alongside the survey.

For the demographics pilot, the flow would be like this:

And when do events for QuickSurveyInitiation and QuickSurveysResponses trigger?

The pageview tokens and our ability to link them up with EventLogging for the demographics survey is here: https://docs.google.com/spreadsheets/d/10s2U1vHGefd6g8Ev4clT4e-MEO5ThXoX--tjVR9GyzM/edit?usp=sharing

I've seen a (short) long tail of QuickSurveysResponses events for reader-demographics-en-pilot that are outside of the suggested time intervals of the experiment.

Yeah, I looked into that but decided it potentially explains only a very tiny minority, and all of the EL should actually be sent before the user even takes the survey.

And when do events for QuickSurveyInitiation and QuickSurveysResponses trigger?

Unnervingly, this isn't strictly true. Looking at L87 of that same file, there's a test to see if the mw.eventLog property exists. That property is set up in "EventLogging Core" (https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/1db3013946fc0d451d4f7f3fdd5fd7f17cebae02/modules/ext.eventLogging/core.js#L240). QuickSurveys does require EventLogging but it doesn't require the client-side code to be loaded and executed before its client-side code is, i.e. it's unlikely but not impossible that QuickSurveys could be loaded and executed before EventLogging.

I don't think this explains what you're seeing but I think this is an omission in QuickSurveys' design (and a very easy one to fix at that).

Isaac added a comment. Apr 12 2019, 9:23 PM

i.e. it's unlikely but not impossible that QuickSurveys could be loaded and executed before EventLogging.

Hmm...that would explain a lack of initiation possibly but my (perhaps naive) assumption is that by the time a user clicked on the survey, everything would be properly loaded.

My other current theory is the missing 10% is possibly browsers that don't support sendBeacon (https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon#Browser_compatibility). That could block event logging but not QuickSurveys if EventLogging's fallback ("create an image with the same URL as the beacon") does not work well.

The QuickSurveyInitiation event does send a value equivalent to !!navigator.sendBeacon here: https://github.com/wikimedia/mediawiki-extensions-QuickSurveys/blob/c40f824c79b4e98993dafe7416a60fd6ad9cce45/resources/ext.quicksurveys.views/QuickSurvey.js#L87

Looking at the EventLogging data where that sendBeacon value is present, a few browsers stand out as having a high share of false values (meaning the EL was sent via a fake image request):

Browser Family               sendBeacon is True   sendBeacon is False
Chrome                       0.998                0.002
Chrome Mobile                0.995                0.005
Mobile Safari                0.904                0.096
Firefox                      0.993                0.007
Samsung Internet             0.95                 0.05
Safari                       0.788                0.212
Edge                         0.99                 0.01
IE                           0.001                0.999
Mobile Safari UI/WKWebView   0.912                0.088
UC Browser                   0.951                0.049
Opera                        0.998                0.002
Chrome Mobile WebView        0.999                0.001
Chrome Mobile iOS            0.605                0.395
Firefox Mobile               0.994                0.006
Amazon Silk                  1                    0
Opera Mobile                 0.969                0.031

These browsers also largely match the ones that seemed underrepresented in the sampling (T218243#5086923), though that doesn't fully explain Safari and Firefox.

Nuria added a subscriber: Nuria. Apr 15 2019, 6:16 PM

My other current theory is the missing 10% is possibly browsers that don't support sendBeacon

Seems unlikely. It makes more sense that if you have a loading issue (per @phuedx's comment above) that causes events not to be sent (because the EL module is not loaded), that issue will be more prevalent in older browsers that parse and load JavaScript much more slowly than new ones.
Are you showing surveys in mobile as well as desktop?

FYI that your table above does not take into account browser percentages, for example: the only IE browser you should see is IE11 as the older versions do not receive javascript and thus they cannot execute eventlogging code. "older" versions of IE indicate bots, not users. See: https://www.mediawiki.org/wiki/Compatibility#Modern_(Grade_A)

Isaac added a comment. Apr 15 2019, 8:39 PM

My other current theory is the missing 10% is possibly browsers that don't support sendBeacon

Seems unlikely. It makes more sense that if you have a loading issue (per @phuedx's comment above) that causes events not to be sent (because the EL module is not loaded), that issue will be more prevalent in older browsers that parse and load JavaScript much more slowly than new ones.

Yeah, that makes sense to me. Possibly both issues are at play (i.e. both slower parsing/loading of JS + having to rely on the less robust method of creating a fake image w/ the appropriate URL as opposed to the sendBeacon functionality). Regardless, both hypotheses suggest that the 10% of our survey responses that do not have associated EL are very likely responses submitted via older versions of IE or other, older browsers. Looking at the survey responses that are missing EL: they skew younger (below 40) and are more likely to be male than female but no other trends stand out.

Are you showing surveys in mobile as well as desktop?

Yes, both mobile web and desktop but not the app. The final proportion ends up being ~50% mobile and ~50% desktop.

FYI that your table above does not take into account browser percentages

Yeah, I left out versions because it was a lot of data. The same table, broken out by browser version and limited to versions with at least 1000 data points, is below. A few takeaways:

Browser                             sendBeacon is True   sendBeacon is False   % of total
Chrome (v.72)                       1                    0                     27.59%
Mobile Safari (v.12)                0.999                0.001                 18.42%
Chrome Mobile (v.72)                1                    0                     18.19%
Firefox (v.65)                      0.999                0.001                 3.21%
Mobile Safari (v.11)                0.705                0.295                 2.55%
Samsung Internet (v.8)              1                    0                     2.37%
IE (v.11)                           0.001                0.999                 2.30%
Chrome Mobile (v.71)                1                    0                     2.20%
Safari (v.12)                       1                    0                     2.19%
Chrome (v.71)                       1                    0                     2.09%
Edge (v.17)                         1                    0                     1.65%
Mobile Safari (v.10)                0                    1                     0.89%
Chrome Mobile (v.70)                1                    0                     0.78%
Mobile Safari UI/WKWebView (v.12)   0.967                0.033                 0.74%
Edge (v.18)                         1                    0                     0.63%
Chrome Mobile (v.68)                1                    0                     0.61%
Chrome Mobile (v.69)                1                    0                     0.60%
Chrome (v.70)                       1                    0                     0.54%
UC Browser (v.12)                   0.994                0.006                 0.48%
Opera (v.58)                        1                    0                     0.46%
Samsung Internet (v.9)              1                    0                     0.45%
Safari (v.11)                       0.411                0.589                 0.44%
Mobile Safari (v.9)                 0.011                0.989                 0.40%
Chrome Mobile (v.64)                1                    0                     0.39%
Chrome (v.49)                       1                    0                     0.37%
Chrome Mobile (v.66)                1                    0                     0.34%
Amazon Silk (v.72)                  1                    0                     0.33%
Chrome Mobile (v.67)                1                    0                     0.29%
Chrome (v.67)                       1                    0                     0.29%
Chrome (v.69)                       1                    0                     0.28%
Chrome Mobile iOS (v.72)            0.873                0.127                 0.28%
Safari (v.10)                       0                    1                     0.28%
Samsung Internet (v.7)              1                    0                     0.25%
Firefox (v.60)                      0.998                0.002                 0.24%
Chrome (v.68)                       1                    0                     0.23%
Chrome Mobile (v.61)                1                    0                     0.22%
Chrome Mobile (v.65)                1                    0                     0.20%
Firefox Mobile (v.65)               1                    0                     0.19%
Edge (v.16)                         1                    0                     0.18%
Firefox Mobile (v.48)               1                    0                     0.18%
Chrome Mobile WebView (v.72)        0.999                0.001                 0.14%
Chrome Mobile (v.55)                1                    0                     0.14%
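
One way to read this table is to weight each version's "sendBeacon is False" rate by its share of events, which estimates what fraction of logged events arrived via the image fallback. A rough Python sketch follows, using only the rows above with a non-zero false rate (rows at 0 contribute nothing and are omitted). It is an illustration under caveats: it covers only versions with at least 1000 data points, and it counts events that did arrive, so it says nothing about how many fallback events were lost.

```python
# (browser, share of events with sendBeacon false, % of all events)
rows = [
    ("Mobile Safari (v.12)", 0.001, 18.42),
    ("Firefox (v.65)", 0.001, 3.21),
    ("Mobile Safari (v.11)", 0.295, 2.55),
    ("IE (v.11)", 0.999, 2.30),
    ("Mobile Safari (v.10)", 1.0, 0.89),
    ("Mobile Safari UI/WKWebView (v.12)", 0.033, 0.74),
    ("UC Browser (v.12)", 0.006, 0.48),
    ("Safari (v.11)", 0.589, 0.44),
    ("Mobile Safari (v.9)", 0.989, 0.40),
    ("Chrome Mobile iOS (v.72)", 0.127, 0.28),
    ("Safari (v.10)", 1.0, 0.28),
    ("Firefox (v.60)", 0.002, 0.24),
    ("Chrome Mobile WebView (v.72)", 0.001, 0.14),
]


def fallback_contributions(rows):
    """Percentage points of all events sent via the fallback, per browser."""
    return {browser: rate * pct for browser, rate, pct in rows}


contributions = fallback_contributions(rows)
total = sum(contributions.values())
print(f"Events sent without sendBeacon: {total:.2f}% of all events")
for browser, points in sorted(contributions.items(), key=lambda kv: -kv[1])[:3]:
    print(f"  {browser}: {points:.2f} points")
```

On these rows the total comes to about 5% of events, with IE 11 the largest single contributor, followed by Mobile Safari 10 and Mobile Safari 11.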
Nuria added a comment. Apr 15 2019, 9:24 PM

Yes, both mobile web and desktop but not the app. The final proportion ends up being ~50% mobile and ~50% desktop.

Ok, makes sense that loading issues will be more prevalent on mobile connections on low end devices.

For these analyses, I fully skipped EventLogging and instead used webrequest logs, using a query like that below to gather the EL (and then attempting to join it to the survey responses provided by Qualtrics/Google Forms):

This query would also give you invalid events; for some schemas the number could be quite significant. Maybe you know this but just an FYI. See errors for the time period: https://logstash.wikimedia.org/app/kibana#/dashboard/default?_g=h@e902392&_a=h@324967b

Nuria lowered the priority of this task from Unbreak Now! to Needs Triage. Apr 19 2019, 1:32 PM
Nuria moved this task from Ops Week to Radar on the Analytics board.

Moving to radar, as the further code changes to QuickSurveys to fix the JS loading issues should be done by (I think) @phuedx's team?

Moving to radar, as the further code changes to QuickSurveys to fix the JS loading issues should be done by (I think) @phuedx's team?

👍 Readers Web maintain QuickSurveys and the EventLogging "backend".