Support (or not) the ORES augmented feature output in liftwing
Closed, Resolved (Public)

Description

Do we want to support something like https://ores.wikimedia.org/v3/scores/frwiki/186357639/damaging?features in liftwing? :)

Event Timeline

After a chat with Andy, it seems that an easy way to go could be to add another input parameter that states whether or not the full features output is needed.

It should be trivial to add in all models except articlequality, where some more thinking is needed.

The feature names are stored in the model instance and then we get the feature values back from the extractor, so we will need to combine them together before including them in the response, something like:

feat = dict(zip(self.model.features, feature_values))
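
A minimal sketch of the idea above, assuming a hypothetical request flag named extended_output and a hypothetical helper build_response (neither name is taken from this task). The response layout mirrors the predictions/features JSON shown later in this thread:

```python
from typing import Any, Dict, List


def build_response(
    request: Dict[str, Any],
    feature_names: List[str],
    feature_values: List[Any],
    scores: Dict[str, Any],
) -> Dict[str, Any]:
    """Return the scores, plus a name -> value feature map when asked for."""
    response: Dict[str, Any] = {"predictions": scores}
    # "extended_output" is an assumed parameter name, used for illustration only.
    if request.get("extended_output", False):
        # Names and values must come from the same feature list, in the same order.
        response["features"] = dict(zip(feature_names, feature_values))
    return response
```

Keeping the flag off by default would leave the response unchanged for existing clients.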

I'd be in favor of adding this functionality. AFAICS from https://github.com/wikimedia/ores/commit/efe0b3111d5dc127601221934c7dd27ae371a266 it should be easy, as Andy mentioned.

Let's prioritize the new feature next Monday :)

@achou this could be a good task to work on if you like it! Should be a Python change to the code that supports the models on Lift Wing, plus a deployment (so you can see the whole pipeline).

Hi,

I did some research on how ORES does this, in ores/scoring_context.py, IIUC:

  1. process_model_scores() has an include_features parameter
  2. if the parameter is True, it calls _solve_base_feature_map(), which returns a dict of feature names and feature values
  3. feature names are stored in the model instance, but they are trimmed using trim() in revscoring/features/functions.py to get the names of the base features
  4. feature values can easily be obtained from extractor.solve(), and we're already using them for scoring.

I have a question about step 3: do we need the same function to trim feature names? The function's docstring says:

Trims a feature set down to a bare set of :class:`~revscoring.Feature` by removing :class:`~revscoring.features.Modifier` and :class:`~revscoring.features.Constant`.

@kevinbazira
@ACraze do you have any thoughts on this? Thanks!!
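
To illustrate what "trimming" means in the question above, here is a toy sketch. The Feature, Modifier and Constant classes and the trim_toy function below are simplified stand-ins written for this example; they are not revscoring's classes or its trim() implementation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


@dataclass(frozen=True)
class Feature:
    """A base feature, identified by name."""
    name: str


@dataclass(frozen=True)
class Constant:
    """A literal value used inside an expression."""
    value: float


@dataclass(frozen=True)
class Modifier:
    """An operation (log, +, /, ...) applied to other nodes."""
    op: str
    operands: Tuple["Node", ...]


Node = Union[Feature, Constant, Modifier]


def trim_toy(nodes: Iterable[Node]) -> List[Feature]:
    """Collect the unique base features, dropping modifiers and constants."""
    seen = set()
    result: List[Feature] = []

    def walk(node: Node) -> None:
        if isinstance(node, Modifier):
            for operand in node.operands:
                walk(operand)
        elif isinstance(node, Feature) and node not in seen:
            seen.add(node)
            result.append(node)
        # Constants fall through and are discarded.

    for node in nodes:
        walk(node)
    return result


# log(len(parent.text) + 1) trims down to the base feature len(parent.text).
parent_len = Feature("len(parent.text)")
log_len = Modifier("log", (Modifier("+", (parent_len, Constant(1.0))),))
base_features = trim_toy([log_len, Feature("chars")])
```

Note that the trimmed list can differ from the model's feature list in both length and order, which matters when pairing names with values.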

@achou I think that we can start with doing the same and using that trim function, IIUC we use revscoring extensively in our model.py code so trim() should be available (not sure if this is what you were asking, in case not let's follow up :)

@elukey yes, that is what I was asking. Thanks for pointing out trim() is available in model.py :)

Change 767494 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] Add the ORES augmented feature output

https://gerrit.wikimedia.org/r/767494

Change 767494 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] editquality: add the ORES augmented feature output

https://gerrit.wikimedia.org/r/767494

Change 770886 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update arwiki editquality predictor image

https://gerrit.wikimedia.org/r/770886

Change 770886 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update arwiki editquality predictor image

https://gerrit.wikimedia.org/r/770886

Current status:

  • we deployed the new editquality image for arwiki, and tested the feature.

Next steps:

  • apply the same change to the other editquality models (trivial change, low priority)
  • add augmented output to draftquality and articlequality

Change 778225 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] draftquality: add the ORES augmented feature output

https://gerrit.wikimedia.org/r/778225

Change 778248 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] articlequality: add the ORES augmented feature output

https://gerrit.wikimedia.org/r/778248

Change 778250 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] topic: add the ORES augmented feature output

https://gerrit.wikimedia.org/r/778250

The augmented feature output in liftwing seems to be inconsistent with the output in ORES for articlequality and editquality models. However, topic models don't have this problem. That's odd because we implemented it in the same way for all types of models. The reason is still under investigation. Following are some examples:

articlequality

{
   "predictions":{
      "prediction":"FA",
      "probability":{
         "B":0.029863312663358413,
         "C":0.012763877975571642,
         "FA":0.7928488682051067,
         "GA":0.15765237592267076,
         "Start":0.0048941204523967885,
         "Stub":0.0019774447808955787
      }
   },
   "features":{
      "feature.wikitext.revision.chars":65829.0,
      "feature.wikitext.revision.content_chars":33914.0,
      "feature.wikitext.revision.ref_tags":236.0,
      "feature.wikitext.revision.wikilinks":0.0069587780857462995,
      "feature.wikitext.revision.external_links":300.0,
      "feature.wikitext.revision.headings_by_level(2)":0.008845904346287669,
      "feature.wikitext.revision.headings_by_level(3)":37.0,
      "feature.wikitext.revision.list_items":0.0010909948693754792,
      "feature.enwiki.revision.image_links":12.0,
      "feature.enwiki.revision.image_template":0.00035383617385150676,
      "feature.enwiki.revision.images_in_templates":5.0,
      "feature.enwiki.revision.images_in_tags":0.0001474317391047945,
      "feature.enwiki.infobox_images":7.0,
      "feature.enwiki.revision.category_links":0.00020640443474671227,
      "feature.enwiki.revision.shortened_footnote_templates":12.0,
      "feature.enwiki.revision.cite_templates":0.00035383617385150676,
      "feature.wikitext.revision.templates":9.0,
      "feature.enwiki.revision.infobox_templates":0.0002653771303886301,
      "feature.enwiki.revision.cn_templates":236.0,
      "feature.enwiki.revision.who_templates":0.0069587780857462995,
      "feature.enwiki.main_article_templates":63.0,
      "feature.english.stemmed.revision.stems_length":0.0018576399127204104,
      "feature.enwiki.revision.paragraphs_without_refs_total_length":0.2669491525423729,
      "feature.len(<datasource.english.words_to_watch.revision.matches>)":173.0,
      "feature.len(<datasource.wikitext.revision.words>)":0.005101138173025889,
      "feature.len(<datasource.english.idioms.revision.matches>)":61.0
   }
}

  1. In the output above, feature.wikitext.revision.wikilinks has the value 295.0 in ORES, but in liftwing the same feature has the value 0.0069587780857462995, which is clearly incorrect.
  2. In ORES the value of feature.enwiki.revision.category_links is 9.0; in liftwing that value appears under feature.wikitext.revision.templates instead. Likewise, in ORES the value of feature.enwiki.revision.cite_templates is 63.0; in liftwing that value appears under feature.enwiki.main_article_templates.
  3. Only feature.wikitext.revision.chars, feature.wikitext.revision.content_chars and feature.wikitext.revision.ref_tags have correct values in liftwing.

editquality

I wrote a small program to separate matched from unmatched features for editquality; the output is below:

  • matched features
match:  feature.english.badwords.revision.diff.match_delta_decrease 0 0
match:  feature.english.badwords.revision.diff.match_prop_delta_decrease 0.0 0.0
match:  feature.english.dictionary.revision.diff.dict_word_delta_decrease -8 -8
match:  feature.english.dictionary.revision.diff.non_dict_word_delta_decrease -1 -1
match:  feature.english.dictionary.revision.diff.non_dict_word_prop_delta_decrease -0.2 -0.2
match:  feature.english.informals.revision.diff.match_delta_decrease 0 0
match:  feature.english.informals.revision.diff.match_prop_delta_decrease 0.0 0.0
match:  feature.revision.page.is_articleish True True
match:  feature.revision.page.is_draftspace False False
match:  feature.revision.page.is_mainspace True True
match:  feature.revision.user.is_admin False False
match:  feature.revision.user.is_curator False False
match:  feature.revision.user.is_trusted False False
match:  feature.wikitext.revision.diff.markup_prop_delta_increase 0.0 0.0
match:  feature.wikitext.revision.diff.number_delta_increase 0.0 0.0
match:  feature.wikitext.revision.diff.number_prop_delta_decrease 0.0 0.0
match:  feature.wikitext.revision.diff.number_prop_delta_increase 0.0 0.0
match:  feature.wikitext.revision.diff.number_prop_delta_sum 0.0 0.0
match:  feature.wikitext.revision.diff.uppercase_word_delta_increase 0.0 0.0
match:  feature.wikitext.revision.diff.uppercase_word_prop_delta_increase 0.0 0.0
  • unmatched features
not match:  feature.english.badwords.revision.diff.match_delta_sum 	 0 	 True
not match:  feature.english.dictionary.revision.diff.dict_word_delta_sum 	 -8 	 0.0
not match:  feature.english.dictionary.revision.diff.dict_word_prop_delta_decrease 	 -1.0233243490289545 	 -1.0233243490289543
not match:  feature.english.dictionary.revision.diff.dict_word_prop_delta_increase 	 0.0 	 -8
not match:  feature.english.dictionary.revision.diff.dict_word_prop_delta_sum 	 -1.0233243490289545 	 0
not match:  feature.english.dictionary.revision.diff.non_dict_word_delta_increase 	 0 	 -1.0233243490289543
not match:  feature.english.dictionary.revision.diff.non_dict_word_delta_sum 	 -1 	 0.0
not match:  feature.english.dictionary.revision.diff.non_dict_word_prop_delta_increase 	 0.0 	 -1
not match:  feature.english.dictionary.revision.diff.non_dict_word_prop_delta_sum 	 -0.2 	 0
not match:  feature.len(<datasource.tokenized(datasource.revision.parent.text)>) 	 66143.0 	 11.094603054859508
not match:  feature.len(<datasource.tokenized(datasource.revision.text)>) 	 66118.0 	 0.0
not match:  feature.len(<datasource.wikitext.revision.markups>) 	 7394.0 	 -132.0
not match:  feature.len(<datasource.wikitext.revision.parent.markups>) 	 7398.0 	 10.68065458293961
not match:  feature.len(<datasource.wikitext.revision.parent.uppercase_words>) 	 204.0 	 5.3230099791384085
not match:  feature.len(<datasource.wikitext.revision.parent.words>) 	 23282.0 	 10.055478759522249
not match:  feature.len(<datasource.wikitext.revision.words>) 	 23273.0 	 -0.2
not match:  feature.revision.comment.has_link 	 False 	 20.10316791573008
not match:  feature.revision.comment.suggests_section_edit 	 True 	 False
not match:  feature.revision.diff.longest_new_repeated_char 	 1 	 0.0
not match:  feature.revision.diff.longest_new_token 	 1 	 -1.0
not match:  feature.revision.user.has_advanced_rights 	 False 	 1
not match:  feature.revision.user.is_anon 	 False 	 True
not match:  feature.revision.user.is_bot 	 False 	 1
not match:  feature.revision.user.is_patroller 	 True 	 False
not match:  feature.temporal.revision.user.seconds_since_registration 	 537891763 	 False
not match:  feature.wikitext.revision.chars 	 248535.0 	 -0.2
not match:  feature.wikitext.revision.diff.markup_delta_decrease 	 -4.0 	 0.11240769441152339
not match:  feature.wikitext.revision.diff.markup_delta_increase 	 0.0 	 0.0087621338372992
not match:  feature.wikitext.revision.diff.markup_delta_sum 	 -4.0 	 0.3537545203148266
not match:  feature.wikitext.revision.diff.markup_prop_delta_decrease 	 -0.5030674846625767 	 -4.0
not match:  feature.wikitext.revision.diff.markup_prop_delta_sum 	 -0.5030674846625767 	 -4.0
not match:  feature.wikitext.revision.diff.number_delta_decrease 	 0.0 	 -0.5030674846625767
not match:  feature.wikitext.revision.diff.number_delta_sum 	 0.0 	 -0.5030674846625767
not match:  feature.wikitext.revision.diff.uppercase_word_delta_decrease 	 -1.0 	 0.0
not match:  feature.wikitext.revision.diff.uppercase_word_delta_sum 	 -1.0 	 0.0
not match:  feature.wikitext.revision.diff.uppercase_word_prop_delta_decrease 	 -0.2 	 -1.0
not match:  feature.wikitext.revision.diff.uppercase_word_prop_delta_sum 	 -0.2 	 -1.0
not match:  feature.wikitext.revision.external_links 	 644.0 	 -9.0
not match:  feature.wikitext.revision.headings 	 35.0 	 -25.0
not match:  feature.wikitext.revision.parent.chars 	 248667.0 	 12.423873952433707
not match:  feature.wikitext.revision.parent.external_links 	 645.0 	 6.470799503782602
not match:  feature.wikitext.revision.parent.headings 	 35.0 	 3.58351893845611
not match:  feature.wikitext.revision.parent.ref_tags 	 451.0 	 6.113682179832232
not match:  feature.wikitext.revision.parent.tags 	 702.0 	 -1.0
not match:  feature.wikitext.revision.parent.templates 	 814.0 	 6.703188113240863
not match:  feature.wikitext.revision.parent.wikilinks 	 566.0 	 6.3561076606958915
not match:  feature.wikitext.revision.ref_tags 	 451.0 	 0.0
not match:  feature.wikitext.revision.tags 	 701.0 	 -1.0
not match:  feature.wikitext.revision.templates 	 814.0 	 0.0
not match:  feature.wikitext.revision.wikilinks 	 565.0 	 -4.0
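
A comparison like the one above could be produced by a small script along these lines. This is a sketch, not the program that was actually used: the function name compare_features, its arguments and the optional tolerance are assumptions made for illustration.

```python
import math
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[str, Any, Any]


def _same(a: Any, b: Any, tolerance: Optional[float]) -> bool:
    # Treat bool vs. number as a mismatch even though False == 0 in Python.
    if isinstance(a, bool) != isinstance(b, bool):
        return False
    both_numeric = all(
        isinstance(x, (int, float)) and not isinstance(x, bool) for x in (a, b)
    )
    if tolerance is not None and both_numeric:
        return math.isclose(a, b, rel_tol=tolerance)
    return a == b


def compare_features(
    ores: Dict[str, Any],
    liftwing: Dict[str, Any],
    tolerance: Optional[float] = None,
) -> Tuple[List[Row], List[Row]]:
    """Split the features present in both outputs into matched and unmatched."""
    matched: List[Row] = []
    unmatched: List[Row] = []
    for name in sorted(ores.keys() & liftwing.keys()):
        row = (name, ores[name], liftwing[name])
        (matched if _same(row[1], row[2], tolerance) else unmatched).append(row)
    return matched, unmatched
```

With exact comparison, a pair such as -1.0233243490289545 and -1.0233243490289543 is reported as unmatched, as in the output above; passing a small tolerance (for example 1e-9) would count such floating-point differences as a match.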

OK, I finally got it. The feature names and values are mismatched because the feature names in my code come from trim(self.model.features):

feature_name = list(trim(self.model.features))
features = {str(f): v for f, v in zip(feature_name, feature_values)}

but the feature values in fetch_editquality_features, which are used for model scoring, are extracted from self.model.features without trim():

feature_values = list(self.extractor.extract(rev_id, self.model.features))

Trimming removes modifiers, for example:

feature.log((len(<datasource.tokenized(datasource.revision.parent.text)>) + 1))
-> trim -> feature.len(<datasource.tokenized(datasource.revision.parent.text)>)

The former has a value of 11.099589462481495, and the latter has a value of 66143.0.

For the augmented feature output, we will return the base features. For model scoring, we give the model features without trim().

So the solution is simply to have feature values extracted from trim(self.model.features) for the augmented feature output.
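
The mismatch and the fix can be reproduced with a toy example. Everything below is a stand-in written for illustration (it does not use revscoring or the actual model code): the base values are borrowed from the articlequality output earlier in this thread, and the shape of the model's feature list, including the ratio feature, is an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Toy feature: (name, base features it depends on, function of those values).
ToyFeature = Tuple[str, Tuple[str, ...], Callable[..., float]]

BASE_VALUES: Dict[str, float] = {
    "chars": 65829.0,
    "content_chars": 33914.0,
    "ref_tags": 236.0,
}

# Assumed model feature list: it mixes base features with a derived ratio.
MODEL_FEATURES: List[ToyFeature] = [
    ("chars", ("chars",), lambda c: c),
    ("ref_tags / content_chars", ("ref_tags", "content_chars"), lambda r, c: r / c),
    ("content_chars", ("content_chars",), lambda c: c),
]


def trim_toy(features: List[ToyFeature]) -> List[str]:
    """Unique base feature names, in order of first appearance."""
    names: List[str] = []
    for _name, deps, _fn in features:
        for dep in deps:
            if dep not in names:
                names.append(dep)
    return names


def extract_model_values(features: List[ToyFeature]) -> List[float]:
    """Values as the model consumes them (modifiers applied)."""
    return [fn(*(BASE_VALUES[d] for d in deps)) for _name, deps, fn in features]


def extract_base_values(names: List[str]) -> List[float]:
    """Values of the base features themselves."""
    return [BASE_VALUES[n] for n in names]


base_names = trim_toy(MODEL_FEATURES)

# Bug: trimmed names zipped with values extracted for the untrimmed features.
buggy = dict(zip(base_names, extract_model_values(MODEL_FEATURES)))

# Fix: names and values both come from the trimmed feature list.
fixed = dict(zip(base_names, extract_base_values(base_names)))
```

In the buggy mapping, ref_tags receives the ratio instead of 236.0, and zip() silently pairs whatever lines up positionally, so no error is raised.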

Change 783843 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] editquality: fix incorrect values in augmented feature output

https://gerrit.wikimedia.org/r/783843

Change 783843 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] editquality: fix incorrect values in augmented feature output

https://gerrit.wikimedia.org/r/783843

Change 778225 merged by Kevin Bazira:

[machinelearning/liftwing/inference-services@main] draftquality: add the ORES augmented feature output

https://gerrit.wikimedia.org/r/778225

Change 778250 merged by Kevin Bazira:

[machinelearning/liftwing/inference-services@main] topic: add the ORES augmented feature output

https://gerrit.wikimedia.org/r/778250

Change 785848 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] articlequality: revert to non-transformer architecture

https://gerrit.wikimedia.org/r/785848

Change 786934 had a related patch set uploaded (by AikoChou; author: AikoChou):

[integration/config@master] inference-services: articlequality cleanup

https://gerrit.wikimedia.org/r/786934

Change 786934 merged by jenkins-bot:

[integration/config@master] inference-services: articlequality cleanup

https://gerrit.wikimedia.org/r/786934

Change 785848 merged by Kevin Bazira:

[machinelearning/liftwing/inference-services@main] articlequality: revert to non-transformer architecture

https://gerrit.wikimedia.org/r/785848

Change 788747 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update values.yaml for articlequality models

https://gerrit.wikimedia.org/r/788747

Current status:

  • found and fixed the incorrect values in the augmented feature output
  • added augmented output to draftquality and topic models
  • reverted articlequality to non-transformer architecture because the transformer needs to load the model to get model features

Next steps:

  • deploy the new editquality and draftquality images for a single wiki to test the feature
  • add augmented output to articlequality

Change 778248 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] articlequality: add the ORES augmented feature output

https://gerrit.wikimedia.org/r/778248

Change 788747 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update values.yaml for articlequality models

https://gerrit.wikimedia.org/r/788747

Change 790293 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update editquality and draftquality image

https://gerrit.wikimedia.org/r/790293

Change 790293 merged by Elukey:

[operations/deployment-charts@master] ml-services: update editquality and draftquality image

https://gerrit.wikimedia.org/r/790293

Change 790700 had a related patch set uploaded (by AikoChou; author: AikoChou):

[machinelearning/liftwing/inference-services@main] articlequality: add wmf-certificates to blubber.yaml

https://gerrit.wikimedia.org/r/790700

Change 790700 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] articlequality: add wmf-certificates to blubber.yaml

https://gerrit.wikimedia.org/r/790700

Change 790983 had a related patch set uploaded (by AikoChou; author: AikoChou):

[operations/deployment-charts@master] ml-services: update articlequality image

https://gerrit.wikimedia.org/r/790983

Change 790983 merged by Elukey:

[operations/deployment-charts@master] ml-services: update articlequality image

https://gerrit.wikimedia.org/r/790983

Current status:

  • we deployed the new editquality image for arwiki and tested the feature
  • we deployed the new draftquality image and the new articlequality image (they only have an enwiki isvc currently) and tested the feature

Next steps:

  • apply the same change to the other editquality models

This task is done. :) In T309102, we also applied the changes that we tested in arwiki-goodfaith to all revscoring-editquality-*.

https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/799349