
Restructure ORES response format to make room for metadata
Closed, Resolved · Public

Description

See https://github.com/wiki-ai/ores/issues/24

We need to be able to include various bits of metadata about the scoring model when returning scores.

Related Objects

  • Resolved (assigned: None)
  • Resolved (assigned: Ladsgroup)
Event Timeline

Halfak raised the priority of this task from to Needs Triage.
Halfak updated the task description. (Show Details)
Halfak moved this task to Parked on the Machine-Learning-Team (Active Tasks) board.
Halfak subscribed.

I like the "meta" version that Helder proposes, but we'd need to also be able to include the info in multi-model mode. Here's what I imagine:

/scores/enwiki?models=damaging|goodfaith&revids=123456788|123456789&model_info=version|trained|stats
{
  "model_info": {
    "damaging": {
      "version": "0.0.4",
      "trained": 1234567890.00,
      "stats": {
        "auc": 0.8937382,
        "accuracy": 0.8892833
      }
    },
    "goodfaith": {
      "version": "0.0.4",
      "trained": 1234567890.00,
      "stats": {
        "auc": 0.9201838,
        "accuracy": 0.756293
      }
    }
  },
  "scores": {
    "123456789": {
      "damaging": {
        "prediction": false, ...
      },
      "goodfaith": {
        "prediction": true, ...
      }
    },
    "123456788": {
      "damaging": {
        "prediction": true, ...
      },
      "goodfaith": {
        "prediction": true, ...
      }
    }
  }
}
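To illustrate why this shape is convenient for consumers, here is a hypothetical client-side sketch (names are illustrative; the dict literal mirrors the proposed response above). Model metadata appears once at the top level, so a client reads it once rather than per revision:

```python
# Hypothetical handling of the proposed multi-model response shape.
response = {
    "model_info": {
        "damaging": {
            "version": "0.0.4",
            "trained": 1234567890.00,
            "stats": {"auc": 0.8937382, "accuracy": 0.8892833},
        },
    },
    "scores": {
        "123456789": {"damaging": {"prediction": False}},
        "123456788": {"damaging": {"prediction": True}},
    },
}

# Model metadata is looked up once at the top level.
damaging_version = response["model_info"]["damaging"]["version"]

# Scores are keyed by revision ID, then by model name.
predictions = {
    (rev_id, model): score["prediction"]
    for rev_id, models in response["scores"].items()
    for model, score in models.items()
}
```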

I think that it would then make sense to include the "scores" sub-field in a simple request. E.g.:

/scores/enwiki/damaging/123456788
{
  "scores": {
    "123456788": {
      "damaging": {
        "prediction": true, ...
      }
    }
  }
}

One thing I am struggling with is changing the output format for ORES. If we do this, I'd like to do it only once, since we'll be asking our users to update their code when we deploy. It would also be nice to have a transition period. I imagine implementing a function that formats scoring results based on a version number, e.g. ?format=v2, with the default staying "v1" for some transition period. The function would look like this:

def format_output(scorer_models, scores, model_info, version="v1"):
    """
    Formats a JSON blob of scores.

    :Parameters:
        scorer_models : `set`
            A set of scorer models used in generating the scores.
        scores : `dict`
            A JSONable dictionary of scores by revision ID and model.
        model_info : `set`
            A set of model information keys that should be included (e.g.
            "version", "trained" and/or "stats")
        version : `str`
            An output format version to apply.  "v1" excludes any `model_info`.
            "v2" returns a more complex JSON document format that can
            contain `model_info`.
    """
    ...

It would return a JSON blob ready for output. If version is set to "v1", the response format will not contain the meta information. If it is set to "v2", the response will look like the one above.

It'd be nice for developer usability and cacheability to have the version parameter be part of the REST URL for the common case of accessing things like /scores/enwiki/damaging/123456788, e.g. as /v2/scores/enwiki/damaging/123456788, or elsewhere in the URL.

What benefits does putting the output format in the path give? It seems that what you are imagining is a wholly versioned API where paths might change. We're not planning to change the path here, but this sounds like a good strategy for enabling path changes in the future.

@Ladsgroup and I worked this out: https://etherpad.wikimedia.org/p/ores_response_structure

It seems that we like the "Deeper merged trees" proposal. I'll copy-paste here:

  • "warnings" []
    • "type"
      • "Something bad happened!"
    • "message"
      • "This is my message
  • "notice" []
    • "type"
      • "model_deployment_scheduled"
    • "message"
      • "2016-02-17T12:34:56Z"
  • "scores"
    • <context>
      • <model>
        • "version"
        • "scores"
          • <rev_id>
            • {score|error}
        • "info"
          • {model_info}

A scoring request:

/scores/<context>/<model>/<rev_id>/
{"scores": {"<context>": {"<model>": {"version": "<version>", "scores": {"<rev_id>":  {... score or error ...}}}}}}

A model request:

/scores/<context>/<model>/
{"scores":{"<context>":{"<model>": {... model_info ...}}}}

He7d3r closed this task as Resolved. Edited Mar 15 2016, 10:26 PM
He7d3r subscribed.

Is it resolved then?

Yup. I usually do 'em in batch, but please feel free to resolve once tasks make it to the (Done) column on the #revision-scoring-as-a-service board if you beat me to it. :)