Modify revision-score schema so that model probabilities won't conflict
Closed, Resolved · Public · 13 Story Points

Description

When we designed the revision-score schema, we made the scores field an array of objects. However, the objects here are too variable, since individual model scores can produce many different kinds of fields. In Hive, all nested fields with the same name have their subfields merged together into one huge struct field.

For example, the following two events are not compatible:

{"scores": [{"model": "good_or_bad", "prediction": {"bad": true}}]}

{"scores": [{"model": "badness_level", prediction": {"bad": 0.50 }}]}

In Hive, the bad field for both of these events is the same field (scores[0].prediction.bad), but it has different types.

To avoid this, can we change the schema so that scores is an object keyed by model name instead of an array? I'm not sure if we would need to key by model_version as well...does the schema of the score object change in any incompatible ways between versions?
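As a rough sketch, keying by model name would let the two events above coexist, something like:

{"scores": {
  "good_or_bad": {"prediction": {"bad": true}},
  "badness_level": {"prediction": {"bad": 0.50}}
}}

In Hive, scores.good_or_bad.prediction.bad and scores.badness_level.prediction.bad would then be distinct fields, each with a single consistent type.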

I'll make a patch to consider...

Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board.
Ottomata claimed this task.
fdans moved this task from Incoming to Data Quality on the Analytics board. Jun 14 2018, 4:24 PM
fdans triaged this task as High priority.

@Ladsgroup are there any compatibility constraints between model versions?

Hey, we don't have such a constraint in place, but we've never changed anything that radical in a model's structure. It's safe to assume we change prediction values, not the structure of a model, between versions.

Also, it's worth noting that it's a good idea to use the model name as the key. I totally support it.

@Ladsgroup, cool, thanks. If the model structure does ever change in a significant way between model versions (e.g. type changes for fields), we will encounter failures.

@Pchelolo What do you think? We could either

  1. Proceed as is. If we ever do need to distinguish between model version schemas, we could then choose to include the model version in the scores object key names, e.g. vandalism_123
  2. Change the schema so events look like this:
"scores": {
  "vandalism": {
    "123": {
      "probability": { ... }
    }
  }
}

With the version as another nested level. This would be safer, but also uglier. I'm just worried that we will encounter something in the future that forces us into a corner where we have to make another backwards-incompatible change.

Maybe Option 1 is fine?

Ooof I really don't like this variable key thing. Dunno what else we can do though. I'm def leaning towards Option 1.

Option 1 seems fine to me.

awight added a subscriber: awight. Jun 18 2018, 3:55 PM

If it's helpful, we solve this issue in the ORES API by responding with a separate block that gives information about the models:

https://ores.wikimedia.org/v3/scores/enwiki/12345678?models=damaging
{
  "enwiki": {
    "models": {
      "damaging": {
        "version": "0.4.0"
      }
    },
    "scores": {
      "12345678": {
        "damaging": {
          "score": {
            "prediction": false,
            "probability": {
              "false": 0.9006672855880219,
              "true": 0.09933271441197815
            }
          }
        }
      }
    }
  }
}

Interesting, awight. Does that mean that any given revision is only ever scored by a single model version?

@Ottomata No, our caching varies on model version, so when we deploy an updated model scores will be regenerated as if the cache had been purged for the previous model version.

Oh, so @Pchelolo, does that mean that every time a new model is deployed, new revision-score events will be emitted for previously scored revisions?

I don't think so; the events are based on revision-create, and we don't emit any additional revision-create events.

Interesting, ok!

Hey, hm, I just brain-bounced with @JAllemandou for a bit, and now I have some more thoughts and reservations about this patternProperties idea.

Even with patternProperties, these events are basically schemaless. We have an approximate schema, but there is nothing preventing the upstream models from doing crazy or backwards incompatible stuff. However, we do know that for a given model, the schema is relatively stable.
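For reference, the patternProperties idea is roughly this (a sketch only; the key pattern and subfields here are illustrative, not the actual proposed schema):

{
  "type": "object",
  "properties": {
    "scores": {
      "type": "object",
      "patternProperties": {
        "^[a-z0-9_]+$": {
          "type": "object",
          "properties": {
            "model_version": { "type": "string" },
            "prediction": {},
            "probability": { "type": "object" }
          }
        }
      }
    }
  }
}

Any model name matching the key pattern validates against the same subschema, but prediction and probability are left effectively untyped, which is the schemaless-ness described above.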

We could

A. Serialize fields like prediction and probability into JSON strings.
This way, all revision-score events would have the same type schema. Individual model results would be strings instead of possibly complex objects. This would suck for usability, as you'd have to code in extra JSON parsing just to read e.g. the scores[0].prediction field, but it would make the revision-score event schema compatible and ease integration into Hive and other systems.
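For illustration, a serialized event might look like this (field names here are just a sketch):

{"scores": [{
  "model_name": "damaging",
  "model_version": "0.4.0",
  "prediction": "false",
  "probability": "{\"false\": 0.9, \"true\": 0.1}"
}]}

Consumers would then have to JSON-parse the prediction and probability strings themselves.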

or

B. Emit an event for every revision & model score into model-specific topics.
E.g. if revision X is scored by the models 'vandalism', 'quality', and 'interestingness', an event for each of those model scores would be emitted into the revision-score.vandalism, revision-score.quality, and revision-score.interestingness Kafka topics. We could then provide specific schemas for each of the models. Each revision-score model would then get its own Hive table.

How many possible models are there right now? Would this be too annoying to deal with? (Technically, we wouldn't have to provide specific schemas for every model type, as we currently do Hive schema creation based on the JSON data fields, but schemas are WAY better and more future-proof for integration with other systems.)
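For example, with option B an event in a hypothetical revision-score.damaging topic could be as simple as this (fields are illustrative):

{
  "database": "enwiki",
  "rev_id": 12345678,
  "model_version": "0.4.0",
  "prediction": false,
  "probability": {
    "false": 0.9,
    "true": 0.1
  }
}

Since the topic is model-specific, its schema could give prediction and probability fixed, fully typed definitions.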

I'm not sure if it's in a format that might be helpful for this task, but we have a JSON-schema defined for JADE judgments, which are pretty close to the ORES model schemas:
https://phabricator.wikimedia.org/diffusion/EJAD/browse/master/jsonschema/judgment/v1.json


@awight that is interesting. The judgment definitions seem to at least describe the possible models. I think I remember @Halfak was working on a JSONSchema for the model scores themselves too? Is that available? If all of the models are well enough described, then I think it would make sense to either explicitly list all of the possible models in the revision-score schema's scores object, OR to make new revision-score event schemas (e.g. revision-score.damaging) for each model type. @Pchelolo?

Looks like we have some issues in the schema, but this is the access pattern we had in mind: https://ores.wikimedia.org/v3/scores/enwiki/?model_info=score_schema

Does something like that work?

This is not long at all! I will submit a patch and attempt to just include possible model scores into the event schema...

Oh, this is not easy. The schemas have a few incompatible field names, as noted in T197000.

@Pchelolo just had an idea. We could transform the probability object into an array of key/value pairs, e.g.

model_name: damaging
probabilities: [{ "name": "true", "value": 0.9 }, { "name": "false", "value": 0.1 }]

or

model_name: drafttopic
probabilities: ["Culture.Philosophy and religion", 0.5", ...]

As long as we are sure that every probabilities key/value pair uses the same types, i.e. (string, float), this should work fine.

Change 441875 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Disable public revision-score events until we figure out a good schema

https://gerrit.wikimedia.org/r/441875

Change 441875 merged by Ottomata:
[operations/puppet@production] Disable public revision-score events until we figure out a good schema

https://gerrit.wikimedia.org/r/441875

Vvjjkkii renamed this task from Modify revision-score schema so that model probabilities won't conflict to 36aaaaaaaa. Jul 1 2018, 1:04 AM
Vvjjkkii removed Ottomata as the assignee of this task.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: Aklapper, gerritbot.
Community_Tech_bot assigned this task to Ottomata.
Community_Tech_bot renamed this task from 36aaaaaaaa to Modify revision-score schema so that model probabilities won't conflict.

After a quick h-o with @Ottomata and @JAllemandou we've understood that the /precache endpoint used to produce these is kinda private, used only by change-prop, so we're free to change the format it emits in whatever way we want.

Halfak added a comment. Aug 8 2018, 8:25 PM

That's roughly right. Precache will always produce the ORES native format, which has been designed for JSON tool developers.

If we'd like to develop something to modify this output, it seems that it should live outside of ORES and rely on our versioned interfaces. E.g. the precache endpoint lives within the /v3/ API spec at the moment. If we ever move to a v4 format, we plan to continue to support the v3 format as we have for v1 and v2.

Sooooo, could we make a /v4/precache endpoint that does this?

Ok, I revoke the idea of having this in ORES. @Halfak said that within an API version, regardless of the model version, they maintain backward compatibility for output formats; so if the model version changes, a client written for the previous model version won't break. This makes it easier to create a robust reformatter and put it into change-prop.

And also, looking at the native ORES response, like https://ores.wikimedia.org/v3/scores/enwiki/854077897, the only issue I see is that the scores are wrapped in a rev_id key. Remove the rev_id wrapping and I don't see any obvious conflicts.
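Using awight's enwiki example from above, unwrapping the rev_id would give something like:

{
  "enwiki": {
    "models": {
      "damaging": { "version": "0.4.0" }
    },
    "scores": {
      "damaging": {
        "score": {
          "prediction": false,
          "probability": {
            "false": 0.9006672855880219,
            "true": 0.09933271441197815
          }
        }
      }
    }
  }
}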

Halfak added a comment. Aug 8 2018, 9:28 PM

FWIW, I am interested in supporting a super-basic format, but I don't want to call it v4 because it will be a downgrade from v3 if we remove rev_ids. That'd be confusing to new developers who are checking out ORES. Maybe v0. Or vSimple.

Ottomata set the point value for this task to 13. Aug 9 2018, 3:02 PM

Could/should we just add another endpoint? /v3/scores-normalized? or even a parameter e.g. /v3/scores?normalized=true?

> After a quick h-o with @Ottomata and @JAllemandou we've understood that the /precache endpoint used to produce these is kinda private, used only by change-prop, so we're free to change the format it emits in whatever way we want.

If ^ is true, then why not just modify the endpoint directly? Or am I missing something?

Halfak added a comment. Aug 9 2018, 3:18 PM

Yes. As you might imagine, we strive for consistency both to keep our engineering simple and to not surprise our users.

We have one output formatter for each version of the API. Y'all are just one consumer of the API. It makes way more sense for you to consume our standard, versioned output and convert it to whatever format you like than for us to start implementing re-formattings for every consumer of ORES. After all, y'all are working with a stream processing system. Doesn't it make sense to add a reformatting node to your own system to normalize things in the way that you like? E.g. something like a hive-writer or an ores-precache-reader?

From our point of view, you're asking for us to implement and maintain confusing inconsistencies inside of ORES to support your specific use-case. Having one endpoint have non-standard schema is inconsistent. Adding a new endpoint with non-standard schema is inconsistent. Adding a new version where the standard schema is simplified (normalized is not the right term) is consistent. Calling it v4 would be confusing. In this case, we would be "simplifying" out a key piece of information that is necessary for batch requests (rev_id), so we'd need to remove functionality for that version of the API. "v4" would give the impression that this simplified version is the new, best version of the API. So I'd rather call it something else.

Halfak added a comment. Aug 9 2018, 3:25 PM

Also, I feel like it is important to note that ORES is not a MediaWiki-specific technology. We've been working on OSM and other types of score-able content for a while. We haven't prioritized the modeling work yet, but we will eventually. We've engineered the whole system to be flexible in this regard and we've already started explorations into further separation of concerns. Engineering Wikimedia/MediaWiki-specific needs into ORES is a bit backwards and out of scope for this particular system.

> From our point of view, you're asking for us to implement and maintain confusing inconsistencies inside of ORES to support your specific use-case. Having one endpoint have non-standard schema is inconsistent. Adding a new endpoint with non-standard schema is inconsistent. Adding a new version where the standard schema is simplified (normalized is not the right term) is consistent. Calling it v4 would be confusing. In this case, we would be "simplifying" out a key piece of information that is necessary for batch requests (rev_id), so we'd need to remove functionality for that version of the API. "v4" would give the impression that this simplified version is the new, best version of the API. So I'd rather call it something else.

But the API is supposed to serve clients, right? So the client's point of view should be taken into account. Regardless, I can see this being not the only need/request coming for different things/formatting/etc. Wouldn't having internal and external APIs (that serve their respective clients) be a solution? Or even, perhaps, a different output formatter? One that could perhaps be specified by the client?

> Engineering Wikimedia/MediaWiki-specific needs into ORES is a bit backwards and out of scope for this particular system.

I don't think that's a fair assessment. It's awesome that ORES is being used by other entities and for different purposes, but brushing off a specific WM need for a tool developed for and by WM looks a bit dismissive to me. To be clear, I'm not criticising; I'm just saying that IMHO this comment is out of place and off-topic for this ticket.

Let's get back on track.

  1. We've discussed the idea of a reformatter in ORES with @Halfak and decided it's not a good idea.
  2. I've asked a question to Analytics whether the current schema of ORES will be good for their use-case if we remove the rev-id from the keys.
  3. @Halfak confirmed that model schemas will remain backward compatible within the same API version.

> I've asked a question to Analytics whether the current schema of ORES will be good for their use-case if we remove the rev-id from the keys.

@Pchelolo, @joal and I discussed a desired schema a bunch today. While we'd love for the response schema to be 100% predictable and consistent, that seems very hard to do with JSON and JSONSchema, mainly because JSON/JSONSchema doesn't have the concept of a Map type. So, our options are to keep scores an array and use key/value pairs for the probabilities, like

"scores": [
    {
        "model_name": "drafttopic",
        "model_version": "0.4.0",
        "prediction": "History_And_Society.Transportation"
        "probabilities": [
          { name: "Assistance.Article improvement and grading", value: 0.9006672855880219 },
          { name: "History_And_Society.Transportation", value: 0.5 }
        ]
    }
...
]

Or, to stick close to the ORES response and just clean it up a little bit, keeping the variable keys like model and probability names in the object, like

"scores": {
    "drafttopic": {
        "model_name": "drafttopic",
        "model_version": "0.4.0",
        "prediction": "History_And_Society.Transportation",
        "probability": {
          "Assistance.Article improvement and grading": 0.9006672855880219,
          "History_And_Society.Transportation": 0.5 
        }
    },
...
}

We can clean the probability field names of (most) nasty characters when importing into Hive. This will result in a huge nested table schema, but at least there won't be field type conflicts in the probability struct, since we are keying by model name. Also, I believe this struct structure will be easier to query with SQL than arrays would be.

A couple of notes:

  1. For the drafttopic model, "prediction" will be a list of strings (see the sketch below)
  2. The set of class names will change from time to time as Wikipedians modify their directory structure. Same names will always mean the same (or same enough) thing. New names might be synonyms of old names.
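For example, a hypothetical drafttopic score with a list-of-strings prediction might look like this (class names and values are illustrative):

"drafttopic": {
  "model_name": "drafttopic",
  "model_version": "0.4.0",
  "prediction": ["History_And_Society.Transportation", "Culture.Philosophy and religion"],
  "probability": {
    "History_And_Society.Transportation": 0.5,
    "Culture.Philosophy and religion": 0.5
  }
}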

New names won't hurt. Yargh, a list of strings will. I had thought that prediction was always the same type. @JAllemandou, do you know if Refine will cast the list of strings to a string somehow?

Actually, it won't totally hurt, since we are now keying by model name. As long as any given model always uses the same types.

+1 for same types

Just had a meeting with Aaron and Petr. We talked about how there is a MediaWiki JobQueue-based job that also (might?) have the ORES score available during revision-create. If so, we would prefer to do whatever transformation is necessary there, rather than in change-prop or in ORES itself. TBD... :)

Halfak changed the status of subtask T197828: Fix "score_schema" -- invalid JSON Schema from Duplicate to Resolved. Aug 16 2018, 10:21 AM
Ottomata added a comment. Edited Oct 4 2018, 6:30 PM

Ok got another schema draft up for review. @JAllemandou convinced me to simplify the schema. https://gerrit.wikimedia.org/r/#/c/mediawiki/event-schemas/+/439917/9

@Pchelolo, this means that change-prop will have to convert the incoming score event to this format. Whatcha think?

@Pchelolo merging https://gerrit.wikimedia.org/r/#/c/mediawiki/event-schemas/+/439917/9 will require a coordinated effort with a change-prop deploy. Can you prep the change-prop patch?

@Ottomata I think there should be three steps here:

  • first we just stop emitting events completely
  • second we deploy this
  • third we re-enable event emitting in the new format

Since it's a deploy freeze this week, I will prep both patches by Monday.

Change 466677 had a related patch set uploaded (by Ppchelko; owner: Ppchelko):
[mediawiki/services/change-propagation/deploy@master] TEMP: Stop emitting revision-score event for schema change.

https://gerrit.wikimedia.org/r/466677

+1 plan sounds good to me.

We plan to deploy this Monday Nov 26th.

Change 466677 merged by Ppchelko:
[mediawiki/services/change-propagation/deploy@master] TEMP: Stop emitting revision-score event for schema change.

https://gerrit.wikimedia.org/r/466677

Mentioned in SAL (#wikimedia-operations) [2018-11-26T15:26:31Z] <ppchelko@deploy1001> Started deploy [changeprop/deploy@b97e8eb]: Temporary stop emitting revision-score events for schema change T197000

Mentioned in SAL (#wikimedia-operations) [2018-11-26T15:27:51Z] <ppchelko@deploy1001> Finished deploy [changeprop/deploy@b97e8eb]: Temporary stop emitting revision-score events for schema change T197000 (duration: 01m 21s)

Mentioned in SAL (#wikimedia-operations) [2018-11-26T15:48:38Z] <ppchelko@deploy1001> Started deploy [changeprop/deploy@77be2c6]: Change schema for revision-score events and start emitting again T197000

Mentioned in SAL (#wikimedia-operations) [2018-11-26T15:50:02Z] <ppchelko@deploy1001> Finished deploy [changeprop/deploy@77be2c6]: Change schema for revision-score events and start emitting again T197000 (duration: 01m 24s)

Change 475778 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Refine revision-score after schema improvements

https://gerrit.wikimedia.org/r/475778

Change 475781 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[mediawiki/event-schemas@master] Make prediction and probability not required

https://gerrit.wikimedia.org/r/475781

Change 475781 merged by Ottomata:
[mediawiki/event-schemas@master] Make prediction and probability not required

https://gerrit.wikimedia.org/r/475781

Change 475778 merged by Ottomata:
[operations/puppet@production] Refine revision-score after schema improvements

https://gerrit.wikimedia.org/r/475778

Change 475783 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[mediawiki/event-schemas@master] Use error.type instead of error.code in revision/score

https://gerrit.wikimedia.org/r/475783

Change 475783 merged by jenkins-bot:
[mediawiki/event-schemas@master] Use error.type instead of error.code in revision/score

https://gerrit.wikimedia.org/r/475783

Woot, we have a Hive event.mediawiki_revision_score table! :)

Mentioned in SAL (#wikimedia-operations) [2018-11-26T18:44:12Z] <ppchelko@deploy1001> Started deploy [changeprop/deploy@c89bff5]: Prepared for ORES error response T197000

Mentioned in SAL (#wikimedia-operations) [2018-11-26T18:45:25Z] <ppchelko@deploy1001> Finished deploy [changeprop/deploy@c89bff5]: Prepared for ORES error response T197000 (duration: 01m 13s)

Pchelolo closed this task as Resolved. Mon, Nov 26, 6:47 PM

Ok. The events with the new schema are being emitted, there are no rejections on the EventBus service side, no errors in ChangeProp since the last patch was deployed. I believe we can finally resolve this task 🎉