Description

Revscoring currently only supports scoring items that have a single target label. True positives are generated with something like:

y_preds = [s[self.prediction_key] == label for s, l in score_labels]

See https://github.com/wiki-ai/revscoring/blob/master/revscoring/scoring/statistics/classification/classification.py#L80
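To make the current single-label behaviour concrete, here is a minimal sketch on toy data. It is not revscoring's code: the `"prediction"` key (standing in for `self.prediction_key`), the helper `single_label_true_positives`, the matching `y_trues` comparison and the example labels are all hypothetical.

```python
# Hypothetical stand-in for self.prediction_key.
PREDICTION_KEY = "prediction"

def single_label_true_positives(score_labels, label):
    """Count true positives for `label` when each item has exactly one label."""
    # Mirrors the existing comparison: the prediction must equal the target label.
    y_preds = [s[PREDICTION_KEY] == label for s, l in score_labels]
    # Assumed counterpart for the observed labels (not shown in the source).
    y_trues = [l == label for s, l in score_labels]
    return sum(p and t for p, t in zip(y_preds, y_trues))

# Toy data: each pair is (score document, observed label).
score_labels = [
    ({"prediction": "Science"}, "Science"),
    ({"prediction": "History"}, "Science"),
    ({"prediction": "History"}, "History"),
]
print(single_label_true_positives(score_labels, "Science"))  # 1
```

This works because both the prediction and the observed label are single strings, so `==` is the right test.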

To count true positives correctly in multilabel cases (where the target label may be a list of the categories an article belongs to), we need to check membership in the label set rather than strict equality (==), like:

y_preds = [label in s[self.prediction_key] for ...]
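The sketch below contrasts the two checks on multilabel toy data, assuming both the prediction and the observed label are lists. The helper `multilabel_true_positives`, the `"prediction"` key and the data are hypothetical; applying membership to the observed labels as well (`label in l`) is an assumption that follows from labels being lists.

```python
# Hypothetical stand-in for self.prediction_key.
PREDICTION_KEY = "prediction"

def multilabel_true_positives(score_labels, label):
    """Count true positives for `label` when predictions and labels are lists."""
    # Membership instead of equality: an item is predicted positive for
    # `label` if `label` appears anywhere in its predicted label list.
    y_preds = [label in s[PREDICTION_KEY] for s, l in score_labels]
    # Assumption: observed labels are also lists, so they need membership too.
    y_trues = [label in l for s, l in score_labels]
    return sum(p and t for p, t in zip(y_preds, y_trues))

score_labels = [
    ({"prediction": ["Science", "History"]}, ["Science"]),
    ({"prediction": ["History"]}, ["Science", "History"]),
    ({"prediction": ["Science"]}, ["Science", "Geography"]),
]

# Strict equality never matches a single label against a list prediction.
print([s[PREDICTION_KEY] == "Science" for s, l in score_labels])  # [False, False, False]
print(multilabel_true_positives(score_labels, "Science"))  # 2
print(multilabel_true_positives(score_labels, "History"))  # 1
```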

Also use this opportunity to define an overall strategy for handling multiclass classification scoring in revscoring with respect to different fitness statistics.
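One possible strategy, sketched here under assumptions rather than as revscoring's design, is to binarize each label one-vs-rest with the membership check and then combine the per-label statistics by macro- or micro-averaging. Only precision is shown; `label_counts`, `precision`, `micro_macro_precision`, the `"prediction"` key, the zero-prediction convention and the toy data are all hypothetical.

```python
# Hypothetical stand-in for self.prediction_key.
PREDICTION_KEY = "prediction"

def label_counts(score_labels, label):
    """Return (true positives, false positives) for one label, via membership."""
    tp = sum(1 for s, l in score_labels if label in s[PREDICTION_KEY] and label in l)
    fp = sum(1 for s, l in score_labels if label in s[PREDICTION_KEY] and label not in l)
    return tp, fp

def precision(tp, fp):
    # Convention assumed here: precision is 0.0 when nothing was predicted.
    return tp / (tp + fp) if tp + fp else 0.0

def micro_macro_precision(score_labels, labels):
    counts = [label_counts(score_labels, label) for label in labels]
    # Macro: average the per-label precisions, so every label counts equally.
    macro = sum(precision(tp, fp) for tp, fp in counts) / len(counts)
    # Micro: pool the counts across labels first, so frequent labels dominate.
    micro = precision(sum(tp for tp, _ in counts), sum(fp for _, fp in counts))
    return micro, macro

score_labels = [
    ({"prediction": ["Science", "History"]}, ["Science"]),
    ({"prediction": ["History"]}, ["Science", "History"]),
    ({"prediction": ["Science"]}, ["Science", "Geography"]),
]
print(micro_macro_precision(score_labels, ["Science", "History", "Geography"]))  # (0.75, 0.5)
```

Macro-averaging treats rare and common labels alike, while micro-averaging weights each individual label decision equally; which one to report for each fitness statistic is part of the strategy this task asks to define.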

Related Objects

Status	Assigned	Task
Resolved	None	T176324 Scoring platform team FY18 Q2
Resolved	Halfak	T183198 Scoring Platform FY18 Q3
Resolved	awight	T176336 Deploy drafttopic model to production ORES
Resolved	Halfak	T123327 Train/test draft topic model (new article routing AI)
Resolved	Sumit	T181166 Revscoring: Statistic for multilabel classification
Resolved	Halfak	T181163 Revscoring tune does not recognize a set of labels as target