
Restructure ORES response format to make room for metadata
Closed, Resolved · Public

Description

See https://github.com/wiki-ai/ores/issues/24

We need to be able to include various bits of metadata about the scoring model when returning scores.

Event Timeline

Halfak created this task. · Dec 10 2015, 1:23 AM
Halfak raised the priority of this task from to Needs Triage.
Halfak updated the task description. (Show Details)
Halfak moved this task to Active on the Scoring-platform-team (Current) board.
Halfak added a subscriber: Halfak.
Restricted Application added subscribers: StudiesWorld, Aklapper. · Dec 10 2015, 1:23 AM

I like the "meta" version that Helder proposes, but we'd also need to be able to include the info in multi-model mode. Here's what I imagine:

/scores/enwiki?models=damaging|goodfaith&revids=123456788|123456789&model_info=version|trained|stats
{
  "model_info": {
    "damaging": {
      "version": "0.0.4",
      "trained": 1234567890.00,
      "stats": {
        "auc": 0.8937382,
        "accuracy": 0.8892833
      }
    },
    "goodfaith": {
      "version": "0.0.4",
      "trained": 1234567890.00,
      "stats": {
        "auc": 0.9201838,
        "accuracy": 0.756293
      }
    }
  },
  "scores": {
    "123456789": {
      "damaging": {
        "prediction": false, ...
      },
      "goodfaith": {
        "prediction": true, ...
      }
    },
    "123456788": {
      "damaging": {
        "prediction": true, ...
      },
      "goodfaith": {
        "prediction": true, ...
      }
    }
  }
}
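For concreteness, here's a minimal client sketch of that request using the Python requests library. The host name is a placeholder and the field access is an assumption; only the path and parameters follow the proposal above:

import requests

# Hypothetical ORES host; only the endpoint shape follows the proposal.
BASE = "https://ores.example.org"

response = requests.get(
    BASE + "/scores/enwiki",
    params={
        "models": "damaging|goodfaith",
        "revids": "123456788|123456789",
        "model_info": "version|trained|stats",
    },
)
doc = response.json()

# Model metadata and per-revision scores arrive as separate top-level fields.
print(doc["model_info"]["damaging"]["version"])
print(doc["scores"]["123456789"]["damaging"]["prediction"])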

I think that it would then make sense to include the "scores" sub-field in a simple request. E.g.:

/scores/enwiki/damaging/123456788
{
  "scores": {
    "123456788": {
      "damaging": {
        "prediction": true, ...
      }
    }
  }
}

One thing I am struggling with is changing the output format for ORES. If we do this, I'd like to do it only once, since we'll be asking our users to update their code when we deploy. It would also be nice if we could have a transition period. I imagine implementing a function that formats scoring results based on a version number, e.g. ?format=v2, with the default staying "v1" for some transition period. This function would look like this:

def format_output(scorer_models, scores, model_info, version="v1"):
    """
    Formats a JSON blob of scores.

    :Parameters:
        scorer_models : `set`
            A set of scorer models used in generating the scores.
        scores : `dict`
            A JSONable dictionary of scores by revision ID and model.
        model_info : `set`
            A set of model information keys that should be included
            (e.g. "version", "trained" and/or "stats")
        version : `str`
            An output format version to apply. "v1" excludes any
            `model_info`. "v2" returns a more complex JSON document
            format that can contain `model_info`.
    """
    ...

It would return a JSON blob ready for output. If version is set to "v1", the response will not contain the meta information. If it is set to "v2", the response will look like the one above.
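For illustration, a minimal sketch of what that function's body might look like. It assumes each scorer model exposes a `name` attribute and an `info` dict, which is a guess at the internals rather than the actual ORES code:

def format_output(scorer_models, scores, model_info, version="v1"):
    # "v1" keeps the current flat format and drops metadata entirely.
    if version == "v1":
        return scores
    if version != "v2":
        raise ValueError("Unknown output format version: {0}".format(version))
    # "v2" wraps the scores and adds a sibling "model_info" field.
    doc = {"scores": scores}
    if model_info:
        doc["model_info"] = {
            # Assumed attributes: `name` and an `info` dict on each model.
            model.name: {key: model.info.get(key) for key in model_info}
            for model in scorer_models
        }
    return doc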

Halfak set Security to None.

It'd be nice for developer usability and cacheability to have the version parameter be part of the REST url for the common case of accessing things like /scores/enwiki/damaging/123456788. E.g. as /v2/scores/enwiki/damaging/123456788, or elsewhere in the url.

What benefits does putting the output format in the path give? It seems that what you are imagining is a wholly versioned API where paths might change. We're not planning to change the path here, but this sounds like a good strategy for enabling path changes in the future.

@Ladsgroup and I worked this out: https://etherpad.wikimedia.org/p/ores_response_structure

It seems that we like the "Deeper merged trees" proposal. I'll copy-paste it here:

  • "warnings" []
    • "type"
      • "Something bad happened!"
    • "message"
      • "This is my message
  • "notice" []
    • "type"
      • "model_deployment_scheduled"
    • "message"
      • "2016-02-17T12:34:56Z"
  • "scores"
    • <context>
      • <model>
        • "version"
        • "scores"
          • <rev_id>
            • {score|error}
        • "info"
          • {model_info}

A scoring request:

/scores/<context>/<model>/<rev_id>/
{"scores": {"<context>": {"<model>": {"version": "<version>", "scores": {"<rev_id>":  {... score or error ...}}}}}}

A model request:

/scores/<context>/<model>/
{"scores":{"<context>":{"<model>": {... model_info ...}}}}
Halfak assigned this task to Ladsgroup. · Mar 15 2016, 8:38 PM
Halfak moved this task from Active to Done on the Scoring-platform-team (Current) board.
He7d3r closed this task as Resolved. · Mar 15 2016, 10:26 PM
He7d3r added a subscriber: He7d3r.

Is it resolved then?

Yup. I usually do 'em in batches, but please feel free to resolve once tasks make it to the (Done) column on the Scoring-platform-team board if you beat me to it. :)