Create connector porting the A/B test Cirrus logfiles to stat1002 from Fluorine
Closed, Resolved (Public)

Description

We decided that the A/B tests would get a new log file containing the test events. Great success! This log file will need transferring from fluorine (where log files end up) to stat1002 (where data processing is done) so that we can analyse the data.

These will be generated at

fluorine.eqiad.wmnet: /a/mw-log/CirrusSearchUserTesting.log

and will move into /a/mw-log/archive daily just like CirrusSearchRequests.

This file is formatted with one JSON blob per line. Sample data:

{
  "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.125 Safari/537.36",
  "xff": false,
  "ip": "10.0.2.2",
  "source": "web",
  "hits": 1,
  "queries": [
    {
      "suggest": "main page",
      "query": "mani page",
      "queryType": "full_text"
    },
    {
      "suggest": "",
      "query": "main page",
      "queryType": "full_text"
    }
  ],
  "tests": {
    "someTestName": "myBucket"
  },
  "wiki": "wiki"
}

The data covers all requests participating in the test and does not contain non-participants (a bucket will be defined in the testing config for the control group). The test is not limited to zero-result queries; all suggestions are being modified.
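Once the file is on stat1002, a first sanity check might be to count how many records landed in each test bucket. The following is a minimal sketch, assuming only the one-JSON-object-per-line layout and the "tests" field shown in the sample above; the function name `tally_buckets` and the "control" bucket name are hypothetical, not taken from the testing config.

```python
import json
from collections import Counter


def tally_buckets(lines):
    """Count log records per (test name, bucket) pair.

    `lines` is any iterable of strings, e.g. an open file handle on
    CirrusSearchUserTesting.log. Blank or malformed lines are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # expect one JSON object per line
        tests = record.get("tests")
        if not isinstance(tests, dict):
            continue  # no usable test/bucket mapping on this record
        for test_name, bucket in tests.items():
            counts[(test_name, bucket)] += 1
    return counts


if __name__ == "__main__":
    # Two hand-written records in the sample's shape; "control" is a
    # hypothetical bucket name used only for illustration.
    sample = [
        '{"source": "web", "hits": 1, "tests": {"someTestName": "myBucket"}, "wiki": "wiki"}',
        '{"source": "web", "hits": 0, "tests": {"someTestName": "control"}, "wiki": "wiki"}',
    ]
    for (test_name, bucket), n in sorted(tally_buckets(sample).items()):
        print(test_name, bucket, n)
```

Passing an open file handle instead of the `sample` list would process the log line by line without loading it all into memory.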

Event Timeline

Ironholds raised the priority of this task from to Medium.
Ironholds updated the task description.
Ironholds added a project: Discovery-ARCHIVED.
Ironholds subscribed.
EBernhardson set Security to None.
EBernhardson updated the task description.

@Ironholds @EBernhardson Was this done? Given that it apparently blocks us from running the A/B test that is currently running, I am assuming so? Or was a different solution decided on in the end?

I spoke to Erik this morning and he said he was waiting on Otto; to my knowledge this hasn't been done yet.

@Ironholds Understood. As it stands this is sitting in the backlog, and not in the sprint; does this need to be prioritised higher?

Absolutely, particularly as we move to a lower sampling rate and will thus have to rely on wider data ranges.

@Ironholds Done. Whoever ends up working on this can ping you when they start if they need further details. Thanks!