
Fix CompletionSuggestion data collection and re-start the test.
Closed, ResolvedPublic

Description

The data collected by the completion suggester experiment doesn't make sense and is all over the place. Starting on Sept 10, the TestSearchSatisfaction2 and CompletionSuggestion experiments started seeing what should be unique 64-bit numbers coming from multiple IP addresses. Figure out why this is happening and fix it.
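For context, a per-session token of this kind is typically a freshly generated random 64-bit value, so the same value should essentially never be seen from two different clients. A minimal sketch in Python (the function name is illustrative, not the extension's actual code; if a token like this is cached or derived from shared state instead of generated fresh, the "same" token can show up from many IP addresses, which is the anomaly described above):

```python
import secrets

def generate_session_token() -> str:
    """Generate a random 64-bit token, hex-encoded.

    A fresh draw per session makes cross-client collisions
    astronomically unlikely (probability ~2**-64 per pair).
    """
    return format(secrets.randbits(64), "016x")

token = generate_session_token()
assert len(token) == 16
assert int(token, 16) < 2**64
```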

Event Timeline

EBernhardson claimed this task.
EBernhardson raised the priority of this task from to Needs Triage.
EBernhardson updated the task description.
EBernhardson subscribed.
Restricted Application added a subscriber: Aklapper.

Change 238306 had a related patch set uploaded (by EBernhardson):
Update CompletionSuggestion bucket selection

https://gerrit.wikimedia.org/r/238306

Change 238306 merged by jenkins-bot:
Update CompletionSuggestion bucket selection

https://gerrit.wikimedia.org/r/238306

Change 238355 had a related patch set uploaded (by EBernhardson):
Update CompletionSuggestion bucket selection

https://gerrit.wikimedia.org/r/238355

Change 238355 merged by jenkins-bot:
Update CompletionSuggestion bucket selection

https://gerrit.wikimedia.org/r/238355

Patch swatted out. Will evaluate the data collected tomorrow morning to decide whether this fixes the problem.

Deskana set Security to None.
Deskana added subscribers: mpopov, Ironholds.
Deskana subscribed.

Looks like Dan found the issue today; reported at https://lists.wikimedia.org/pipermail/analytics/2015-September/004285.html

So this basically means we need to throw away the clientIp information until this can be fixed.

Can we get useful test data without this value? We can still correlate events from the same user on the same page; we just can't correlate them across pages (but chances are they won't be opted into the test more than once).
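The within-page correlation described above can be sketched as a simple grouping step. This is illustrative only; the actual schema field names (`sessionId`, `pageId`, `action`) are assumed, not taken from the real CompletionSuggestion schema:

```python
from collections import defaultdict

# Hypothetical event records; the real schema fields may differ.
events = [
    {"sessionId": "a1", "pageId": 10, "action": "impression"},
    {"sessionId": "a1", "pageId": 10, "action": "click"},
    {"sessionId": "b2", "pageId": 11, "action": "impression"},
]

# Without clientIp we can still correlate events that share a
# (sessionId, pageId) pair, i.e. within a single page view,
# but not tie sessions together across separate pages.
by_page_session = defaultdict(list)
for ev in events:
    by_page_session[(ev["sessionId"], ev["pageId"])].append(ev["action"])

assert by_page_session[("a1", 10)] == ["impression", "click"]
```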

Are there other oddities in the data we can't explain?

Update: we discussed this on IRC and arrived at the conclusion that we can assume relative independence of sets of events. Which is to say, given our low sampling rates, we are not likely to see logs of sessions from the same users.
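The independence assumption can be sanity-checked with a back-of-the-envelope calculation: with per-session sampling at rate p, the chance that two or more of one user's sessions land in the sample is tiny. The numbers below are purely illustrative; the experiment's actual sampling rate is not stated in this task:

```python
def prob_sampled_multiple(p: float, n: int) -> float:
    """Probability that 2+ of a user's n sessions are sampled,
    assuming independent Bernoulli(p) sampling per session."""
    p_zero = (1 - p) ** n
    p_one = n * p * (1 - p) ** (n - 1)
    return 1 - p_zero - p_one

# Illustrative: 0.1% sampling, 20 sessions per user
print(prob_sampled_multiple(0.001, 20))  # about 0.0002, i.e. rare
```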

Does this mean the answer to the question "Is this test still scientifically valid, can be analysed as-is, and does not need to be re-run?" is "Yes"?

To be valid, I think we have to start the test over as of when the adjusted schema was deployed today. There were a few changes made to bucketing (which will also help other tests going forward), so the data moving forward isn't directly comparable with the data collected prior. Maybe? Not entirely confident, but putting it out there.

Understood. We should get the test restarted ASAP. How about restarting the test on Thursday 17th, running for one week? @mpopov @Ironholds @EBernhardson Thoughts?

I think we can consider the test restarted from the moment the new schema started collecting data.

Sounds fine to me. Okay with @mpopov and @Ironholds?

Confirmed over hangouts with @Ironholds that this is okay. Resolving accordingly.