Inconsistent data type for articlequality score predictions on ptwiki
Closed, Resolved · Public · 1 Estimated Story Points · BUG REPORT
Authored By

	He7d3r
	Mar 2 2024, 2:46 PM

Description

Steps to replicate the issue (include links if applicable):

Replace an old ORES V1 request with its V3 "equivalent", such as https://ores.wikimedia.org/v3/scores/ptwiki/60845189/articlequality

What happens?: The predicted article class is now a boolean instead of one of the strings that appear as keys for the probability dictionary:

{
  "ptwiki": {
    "models": {
      "articlequality": {
        "version": "0.8.0"
      }
    },
    "scores": {
      "60845189": {
        "articlequality": {
          "score": {
            "prediction": true,
            "probability": {
              "1": 0.6505766596726758,
              "2": 0.09876741829105372,
              "3": 0.0634780511261501,
              "4": 0.06104126161283134,
              "5": 0.0573997480745124,
              "6": 0.06873686122277664
            }
          }
        }
      }
    }
  }
}

What should have happened instead?: The predicted article class should be identical to one of the keys in the probability dictionary (the string "1" in the example, which is the key that maximizes the probability value):

...
            "prediction": "1",
            "probability": {
              "1": 0.6505766596726758,
...
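The expectation above can be expressed as a small consistency check. This is only a sketch for illustration: the helper names `expected_prediction` and `prediction_is_consistent` are hypothetical and not part of ORES or Lift Wing, and the check assumes a single-label model such as articlequality, where `prediction` is meant to be one of the probability keys.

```python
from typing import Any, Dict


def expected_prediction(probability: Dict[str, float]) -> str:
    """Return the class label (a dictionary key) with the highest probability."""
    return max(probability, key=probability.get)


def prediction_is_consistent(score: Dict[str, Any]) -> bool:
    """Check that `prediction` is a string equal to the most probable class."""
    prediction = score["prediction"]
    return isinstance(prediction, str) and prediction == expected_prediction(
        score["probability"]
    )


# Probabilities copied from the response for revision 60845189 in this report.
probability = {
    "1": 0.6505766596726758,
    "2": 0.09876741829105372,
    "3": 0.0634780511261501,
    "4": 0.06104126161283134,
    "5": 0.0573997480745124,
    "6": 0.06873686122277664,
}

buggy_score = {"prediction": True, "probability": probability}
fixed_score = {"prediction": "1", "probability": probability}

print(prediction_is_consistent(buggy_score))  # the boolean fails the check
print(prediction_is_consistent(fixed_score))  # the string "1" passes
```

The `isinstance` check matters because in Python `True == 1` holds, so a looser comparison could hide the type mismatch.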

Other information:

The problem does not happen when one of the other classes has the highest probability:

"prediction": "2": https://ores.wikimedia.org/v3/scores/ptwiki/66497121/articlequality
"prediction": "3": https://ores.wikimedia.org/v3/scores/ptwiki/66513703/articlequality
"prediction": "4": https://ores.wikimedia.org/v3/scores/ptwiki/66078571/articlequality
"prediction": "5": https://ores.wikimedia.org/v3/scores/ptwiki/65832901/articlequality
"prediction": "6": https://ores.wikimedia.org/v3/scores/ptwiki/67282651/articlequality

Details

	Subject	Repo	Branch	Lines +/-
	ml-services: update ores-legacy image (fix boolean/str fields)	operations/deployment-charts	master	+1 -1
	ores-legacy: fix mixed boolean and string field	machinelearning/liftwing/inference-services	main	+11 -2

Related Objects

Mentioned In: rMLISb4ca64f97ace: ores-legacy: fix mixed boolean and string field
Mentioned Here: rOMWC1009470b3df7

Event Timeline

He7d3r created this task. Mar 2 2024, 2:46 PM

Restricted Application added a project: Machine-Learning-Team. Mar 2 2024, 2:46 PM

Restricted Application added a subscriber: Aklapper.

calbon updated Other Assignee, added: isarantopoulos. Mar 5 2024, 3:42 PM

calbon set the point value for this task to 1.

calbon moved this task from Unsorted to Ready To Go on the Machine-Learning-Team board.

isarantopoulos moved this task from Ready To Go to In Progress on the Machine-Learning-Team board. Mar 6 2024, 11:24 AM

I found that this is caused by the mixed schema of the responses returned by ORES. The prediction field is either a boolean, a string, or a list of strings, and our schema declares it as follows:

from typing import Dict, List, Union

from pydantic import BaseModel

class Score(BaseModel):
    prediction: Union[bool, str, List[str]]
    probability: Dict[str, float]

The order of the types in the Union above also acts as a priority. Pydantic first tries to parse the value as a boolean, and that succeeds here because the string "1" is coerced to true.
I'm working on a general solution that handles both cases (booleans and strings) properly.
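The mechanism can be illustrated without pydantic by emulating left-to-right coercion in plain Python. This is a sketch under assumptions, not pydantic's actual implementation: the helpers `coerce_bool`, `coerce_str` and `coerce_left_to_right` are hypothetical, and the sets of strings accepted as booleans are an assumption modelled on pydantic v1's lenient bool parsing. It would also explain why "2" through "6" are unaffected: they are not valid boolean strings, so validation falls through to `str`.

```python
from typing import Any, Callable, Sequence

# Assumption: strings that a lenient bool parser accepts (case-insensitive).
TRUE_STRINGS = {"1", "on", "t", "true", "y", "yes"}
FALSE_STRINGS = {"0", "off", "f", "false", "n", "no"}


def coerce_bool(value: Any) -> bool:
    """Leniently coerce a value to bool, raising ValueError if it does not fit."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.lower()
        if lowered in TRUE_STRINGS:
            return True
        if lowered in FALSE_STRINGS:
            return False
    raise ValueError(f"not a boolean: {value!r}")


def coerce_str(value: Any) -> str:
    """Accept only real strings, raising ValueError otherwise."""
    if isinstance(value, str):
        return value
    raise ValueError(f"not a string: {value!r}")


def coerce_left_to_right(value: Any, coercers: Sequence[Callable[[Any], Any]]) -> Any:
    """Return the result of the first coercer that accepts the value."""
    for coercer in coercers:
        try:
            return coercer(value)
        except ValueError:
            continue
    raise ValueError(f"no type in the union accepts {value!r}")


bool_first = [coerce_bool, coerce_str]  # mirrors Union[bool, str, ...]
str_first = [coerce_str, coerce_bool]   # one possible reordering

print(repr(coerce_left_to_right("1", bool_first)))  # the class label becomes a bool
print(repr(coerce_left_to_right("2", bool_first)))  # falls through to str
print(repr(coerce_left_to_right("1", str_first)))   # stays a string
```

Putting `str` first keeps class labels intact, but it is not necessarily the approach taken in the actual patch.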

isarantopoulos claimed this task. Mar 7 2024, 10:12 AM

Change rOMWC1009470b3df7 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[machinelearning/liftwing/inference-services@main] ores-legacy: fix mixed boolean and string field

https://gerrit.wikimedia.org/r/1009470

gerritbot added a project: Patch-For-Review. Mar 7 2024, 10:20 AM

isarantopoulos updated Other Assignee, removed: isarantopoulos. Mar 7 2024, 2:41 PM

The attached patch solves the issue. I will deploy it to staging and add some httpbb tests that capture this behavior before I deploy to production.

Change rOMWC1009470b3df7 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] ores-legacy: fix mixed boolean and string field

https://gerrit.wikimedia.org/r/1009470

isarantopoulos mentioned this in rMLISb4ca64f97ace: ores-legacy: fix mixed boolean and string field. Mar 8 2024, 9:39 AM

Maintenance_bot removed a project: Patch-For-Review. Mar 8 2024, 10:30 AM

Change 1009720 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: update ores-legacy image (fix boolean/str fields)

https://gerrit.wikimedia.org/r/1009720

gerritbot added a project: Patch-For-Review. Mar 8 2024, 10:30 AM

Change 1009720 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: update ores-legacy image (fix boolean/str fields)

https://gerrit.wikimedia.org/r/1009720

Maintenance_bot removed a project: Patch-For-Review. Mar 8 2024, 12:30 PM

@He7d3r I have deployed the fix in production and it is working as expected.

That is great! Thank you! 😃

isarantopoulos closed this task as Resolved. Mar 13 2024, 4:39 PM

isarantopoulos moved this task from In Progress to 2023-2024 Q3 Done on the Machine-Learning-Team board.
