Revscoring: Statistic for multilabel classification
Closed, ResolvedPublic

Description

Revscoring currently only supports scoring items that have a single target label. The true positives are derived from the predictions with something like:

y_preds = [s[self.prediction_key] == label for s, l in score_labels]

See https://github.com/wiki-ai/revscoring/blob/master/revscoring/scoring/statistics/classification/classification.py#L80

To count true positives correctly in multi-label cases (where the target label might be a list of categories an article belongs to), we need to check membership in the label set rather than strict equality (==), like:
y_preds = [label in s[self.prediction_key] for ...]
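
A minimal, standalone sketch of the membership check, for illustration only. It is not the revscoring implementation: the helper names `binary_vectors` and `true_positives`, the `"prediction"` key, and the sample data are all hypothetical, and it assumes both the prediction and the true label are lists of labels.

```python
def binary_vectors(score_labels, label, prediction_key="prediction"):
    """Build per-label boolean vectors using set membership.

    Assumes each item in score_labels is a (score, true_labels) pair where
    score[prediction_key] and true_labels are both lists of labels.
    Caution: if either were a plain string, `in` would do substring matching.
    """
    y_preds = [label in s[prediction_key] for s, l in score_labels]
    y_trues = [label in l for s, l in score_labels]
    return y_preds, y_trues


def true_positives(y_preds, y_trues):
    """Count observations where the label was both predicted and true."""
    return sum(p and t for p, t in zip(y_preds, y_trues))


# Hypothetical sample data: (score document, true label list)
score_labels = [
    ({"prediction": ["A", "B"]}, ["A"]),
    ({"prediction": ["B"]}, ["A", "B"]),
    ({"prediction": ["A", "C"]}, ["A", "C"]),
]

y_preds, y_trues = binary_vectors(score_labels, "A")
tp_a = true_positives(y_preds, y_trues)
```

With this approach each target label is still evaluated as its own binary problem, so the existing per-label statistics could in principle be reused unchanged once the two vectors are built this way.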

We should also use this opportunity to define an overall strategy for handling multiclass classification scoring in revscoring with respect to the different fitness statistics.