Target sprint: sprint 79.
For background, read the following:
{T139319}
{T140485}
This spike is a request to examine the Hovercards and general instrumentation code to identify whether anything in the code could help explain, at least in part, the seemingly unexpected data outcomes.
# In T139319#2475143, T139319#2481986, and T139319#2507540 it is noted that the concentration of clicks for Hovercards OFF (the control group) occurs at a lower interaction time than the concentration for Hovercards ON (the test group), and furthermore that the sheer percentage of early clicks is strikingly higher for the control group than for the test group. @tbayer plans to do some deduplication of events to see how that might influence the data outcomes, but the question still remains: is there something in the code that would cause this? Is the curve merely shifted due to a race condition? Are events potentially being dropped anywhere (see the first sketch after this list)? Is there something else at play?
# As noted in T139319#2507559, duplicate events are being observed in the database. The specific observation there concerns //click// events being more likely to be duplicated in the test group's outlier usage scenarios, but duplicates have been observed elsewhere as well. Is there anything that would explain these duplicate events? Is it possible to observe duplicated events at the client? Is there any way to guard against duplicate events (see the second sketch after this list)? In which case(s) are duplicate events more likely to occur?
# Above 250ms, there also appear to be fewer link interactions in general for the Hovercards ON (test group) case than for the Hovercards OFF (control group) case. Is there a potential code reason why this might be so (see the third sketch after this list)? (n.b., this may be a behavioral difference attributable to the way users mouse around links, but that's a separate conversation.)
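On the first question, one class of code issue worth looking for is an unload race: if click events are dispatched asynchronously (XHR or an image beacon) at the same moment a click-driven navigation unloads the page, the fastest clicks are the ones most likely to be silently dropped, which would distort exactly the low-interaction-time end of the curve. Below is a minimal sketch of that failure mode and the `navigator.sendBeacon` guard against it. The helper names, the event shape, and the `/beacon/event` endpoint are hypothetical illustrations, not the actual Popups/EventLogging code.

```lang=typescript
// Hypothetical instrumentation sketch -- NOT the actual Popups code.

interface LinkInteractionEvent {
  action: 'click';
  linkInteractionToken: string;
  totalInteractionTime: number; // ms from mouseenter to click
}

const BEACON_URL = '/beacon/event'; // assumed endpoint, for illustration only

function logEvent(event: LinkInteractionEvent): void {
  const payload = JSON.stringify(event);
  // sendBeacon queues the request so it survives the page unload that a
  // click-driven navigation triggers. An async request fired at the same
  // moment can be aborted by the browser, silently dropping exactly the
  // fast, low-interaction-time clicks the control group produces most of.
  if (navigator.sendBeacon) {
    navigator.sendBeacon(BEACON_URL, payload);
  } else {
    // Fallback image beacon; still racy on unload in some browsers.
    new Image().src = `${BEACON_URL}?${encodeURIComponent(payload)}`;
  }
}

function instrumentLink(link: HTMLAnchorElement): void {
  let enteredAt = 0;
  let token = '';

  link.addEventListener('mouseenter', () => {
    enteredAt = Date.now();
    token = Math.random().toString(36).slice(2); // fresh token per interaction
  });

  link.addEventListener('click', () => {
    logEvent({
      action: 'click',
      linkInteractionToken: token,
      totalInteractionTime: Date.now() - enteredAt,
    });
    // Navigation proceeds immediately; without sendBeacon the event may
    // never leave the browser.
  });
}

document.querySelectorAll<HTMLAnchorElement>('a[href]').forEach(instrumentLink);
```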
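On the duplicate-events question, two common client-side sources are handlers being bound more than once (e.g., an init routine running on every hover rather than once per link) and a retry path re-sending a beacon that actually arrived. A second sketch, again with hypothetical names, shows two cheap guards: idempotent binding and a once-per-token send.

```lang=typescript
// Hypothetical guards against duplicate eventing; names are illustrative,
// not taken from the Popups codebase.

const BOUND_FLAG = 'instrumented';
const sentTokens = new Set<string>();

// Guard 1: idempotent binding. If init runs twice (module re-executed,
// popup re-rendered, etc.) each link would otherwise get two click
// handlers and emit two identical click events.
function bindOnce(
  link: HTMLAnchorElement,
  handler: (e: MouseEvent) => void
): void {
  if (link.dataset[BOUND_FLAG] === '1') {
    return; // already instrumented; a second init is a no-op
  }
  link.dataset[BOUND_FLAG] = '1';
  link.addEventListener('click', handler);
}

// Guard 2: send each interaction token at most once from this client.
function sendOnce(token: string, payload: string): void {
  if (sentTokens.has(token)) {
    return;
  }
  sentTokens.add(token);
  navigator.sendBeacon('/beacon/event', payload); // assumed endpoint
}
```

A useful diagnostic property of guards like these: any duplicates that survive them would have to originate in the transport or server side (e.g., HTTP-layer retries) rather than in the handlers.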
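On the third question, one code-level mechanism to rule out is the popup delay timer itself: if a dwell event is only scheduled once the pointer enters a link, and the timer is cancelled on mouseleave, then anything that fires mouseleave mid-interaction (such as the pointer moving from the link onto the popup) would erase genuinely long interactions from the test group's data. A third sketch of that timer pattern follows; the 250ms constant and all names are assumptions for illustration.

```lang=typescript
// Sketch of a hover-delay timer of the kind Hovercards uses; the 250ms
// constant and the names here are assumptions, not the actual code.

const HOVER_DELAY_MS = 250;

function instrumentHover(
  link: HTMLAnchorElement,
  onDwell: (dwellMs: number) => void
): void {
  let timer: number | undefined;
  let enteredAt = 0;

  link.addEventListener('mouseenter', () => {
    enteredAt = Date.now();
    // The dwell event is only scheduled, not sent. Anything that clears
    // this timer before it fires erases the interaction from the data.
    timer = window.setTimeout(() => {
      onDwell(Date.now() - enteredAt);
    }, HOVER_DELAY_MS);
  });

  link.addEventListener('mouseleave', () => {
    // Cancelling here is correct for sub-250ms passes, but if mouseleave
    // also fires when the pointer moves from the link onto the popup,
    // long interactions get cancelled too and the >250ms bucket is
    // undercounted for the test group only.
    window.clearTimeout(timer);
  });
}
```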
Several behavioral explanations have been proffered, but this spike is about looking at the client-side instrumentation for general code correctness, potential race conditions, duplicate eventing, and so forth. It's okay to describe potential user behaviors and how events may manifest - indeed, that's necessary! - but theories about user learning and behavioral changes have been explored pretty thoroughly at this point.
This spike should result in a diagnosis of the outcomes pertaining to the questions above, and, where issues can be rectified, tasks opened for fixes. Because any required fixes would guarantee better data quality outcomes in the second A/B test, they should be scheduled for the very next sprint, or, if time allows, pulled into //this// task's sprint; maybe a fix will be so easy it can be done on the spot within the spike's time allotment - that would be ideal.
Browsers potentially worth exploring when trying to induce eventing issues. LTR/RTL may be worth exploring too, but probably isn't a major component. Traffic shaping the connection with a router, in-browser settings, on-device filtering, or a utility like Network Link Conditioner can be handy as well:
- [ ] Desktop Firefox
- [ ] Desktop Chrome
- [ ] Desktop Safari
- [ ] Internet Explorer 11
- [ ] Edge