See https://github.com/wiki-ai/ores/issues/24
We need to be able to include various bits of metadata about the scoring model when returning scores.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | None | T122271 [Epic] ORES 1.0.0
Resolved | | Ladsgroup | T121054 Restructure ORES response format to make room for metadata
I like the "meta" version that Helder proposes, but we'd need to also be able to include the info in multi-model mode. Here's what I imagine:
/scores/enwiki?models=damaging|goodfaith&revids=123456788|123456789&model_info=version|trained|stats

{
    "model_info": {
        "damaging": {
            "version": "0.0.4",
            "trained": 1234567890.00,
            "stats": {"auc": 0.8937382, "accuracy": 0.8892833}
        },
        "goodfaith": {
            "version": "0.0.4",
            "trained": 1234567890.00,
            "stats": {"auc": 0.9201838, "accuracy": 0.756293}
        }
    },
    "scores": {
        "123456789": {
            "damaging": {"prediction": false, ...},
            "goodfaith": {"prediction": true, ...}
        },
        "123456788": {
            "damaging": {"prediction": true, ...},
            "goodfaith": {"prediction": true, ...}
        }
    }
}
I think that it would then make sense to include the "scores" sub-field in a simple request. E.g.:
/scores/enwiki/damaging/123456788
{"scores": {"123456788": {"damaging": {"prediction": true, ...}}}}
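To make the proposed structure concrete, here's a minimal sketch of how a client might pull predictions out of a v2-style response. The response layout and field names ("scores", "prediction") follow the hypothetical format above, not a deployed API; predictions_for is an illustrative helper, not an ORES function.

```python
# Hypothetical v2-style response document, following the proposal above.
response = {
    "scores": {
        "123456789": {"damaging": {"prediction": False}},
        "123456788": {"damaging": {"prediction": True}},
    },
}


def predictions_for(response, model):
    """Map rev_id -> prediction for one model in a v2-style response."""
    return {rev_id: models[model]["prediction"]
            for rev_id, models in response["scores"].items()
            if model in models}
```

A client wanting all "damaging" predictions would call predictions_for(response, "damaging") and get a plain rev_id-to-boolean mapping.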
One thing I am struggling with is changing the output format for ORES. If we do this, I'd like to do it once, since we'll be asking our users to update their code when we deploy. It would also be nice if we could have a transition period. I imagine implementing a function that formats scoring results based on a version number, e.g. ?format=v2, where the default stays "v1" for some transition period. This function would look like this:
def format_output(scorer_models, scores, model_info, version="v1"):
    """
    Formats a JSON blob of scores.

    :Parameters:
        scorer_models : `set`
            A set of scorer models used in generating the scores.
        scores : `dict`
            A JSONable dictionary of scores by revision ID and model.
        model_info : `set`
            A set of model information keys that should be included
            (e.g. "version", "trained" and/or "stats")
        version : `str`
            An output format version to apply.  "v1" excludes any
            `model_info`.  "v2" returns a more complex JSON document
            format that can contain `model_info`.
    """
    ...
It would return a JSON blob ready for output. If version is set to "v1", the response format will not contain the meta information. If version is set to "v2", the response format will look like the one above.
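A rough sketch of what that version switch could look like. This is an assumption-laden illustration: here scorer_models is simplified to a dict mapping model name to an info dict (the docstring above describes it as a set), and the v1/v2 layouts follow the examples in this thread rather than any final API.

```python
def format_output(scorer_models, scores, model_info, version="v1"):
    """Format scores per the requested output version (illustrative sketch).

    scorer_models: dict of model name -> info dict (simplifying assumption)
    scores: dict of rev_id -> {model: score_doc}
    model_info: set of info keys to include, e.g. {"version", "stats"}
    """
    if version == "v1":
        # v1: scores only; no room for model metadata.
        return scores
    elif version == "v2":
        doc = {"scores": scores}
        if model_info:
            # Only copy the requested metadata keys for each model.
            doc["model_info"] = {
                name: {key: info[key] for key in model_info if key in info}
                for name, info in scorer_models.items()}
        return doc
    else:
        raise ValueError("Unknown format version: {0}".format(version))


models = {"damaging": {"version": "0.0.4", "trained": 1234567890.0,
                       "stats": {"auc": 0.8937382}}}
scores = {"123456788": {"damaging": {"prediction": True}}}
v2_doc = format_output(models, scores, {"version", "stats"}, version="v2")
```

The point of routing everything through one function is that both formats stay available during the transition period and the default can be flipped later without touching callers.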
It'd be nice for developer usability and cacheability to have the version parameter be part of the REST URL for the common case of accessing things like /scores/enwiki/damaging/123456788, e.g. as /v2/scores/enwiki/damaging/123456788, or elsewhere in the URL.
What benefits does putting the output format in the path give? It seems that what you are imagining is a wholly versioned API where paths might change. We're not planning to change the path here, but this sounds like a good strategy for enabling path changes in the future.
@Ladsgroup and I worked this out: https://etherpad.wikimedia.org/p/ores_response_structure
It seems that we like the "Deeper merged trees" proposal. I'll copy-paste here:
A scoring request:
/scores/<context>/<model>/<rev_id>/
{"scores": {"<context>": {"<model>": {"version": "<version>", "scores": {"<rev_id>": {... score or error ...}}}}}}
A model request:
/scores/<context>/<model>/
{"scores":{"<context>":{"<model>": {... model_info ...}}}}
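The "deeper merged trees" shape above can be assembled with a small helper. This is a sketch, not ORES code: merged_response is a hypothetical name, and the nesting (scores → context → model → version/scores) simply mirrors the template in the previous lines.

```python
def merged_response(context, model, version, rev_scores):
    """Build a 'deeper merged trees' response document.

    rev_scores: dict of rev_id -> score-or-error document.
    """
    return {"scores": {context: {model: {"version": version,
                                         "scores": rev_scores}}}}


doc = merged_response("enwiki", "damaging", "0.0.4",
                      {"123456788": {"prediction": True}})
```

One nice property of this shape is that documents for different contexts or models can be merged by recursively combining the nested dicts, which is what makes batch responses straightforward.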
Yup. I usually do 'em in batch, but please feel free to resolve once tasks make it to the (Done) column on the #revision-scoring-as-a-service board if you beat me to it. :)