Revscoring currently only supports scoring items that have a single target label. The relevant true-positive generation happens with something like:
y_preds = [s[self.prediction_key] == label for s, l in score_labels]
To count true positives correctly in multi-label cases (where the target label might be a list of categories an article belongs to), we need to check membership in the label set rather than strict equality (==), like:
y_preds = [label in s[self.prediction_key] for ...]
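As a rough illustration of the difference between the two checks, here is a minimal standalone sketch. It is not revscoring's actual code: the function names, the "prediction" key (standing in for self.prediction_key), and the sample data are all hypothetical, and it assumes each score holds a list of predicted labels and each true label is also a list.

```python
# Hypothetical stand-in for self.prediction_key.
PREDICTION_KEY = "prediction"


def equality_preds(score_labels, label, prediction_key=PREDICTION_KEY):
    """Single-label check: the prediction must equal `label` exactly."""
    return [s[prediction_key] == label for s, l in score_labels]


def membership_preds(score_labels, label, prediction_key=PREDICTION_KEY):
    """Multi-label check: `label` must be one of the predicted labels."""
    return [label in s[prediction_key] for s, l in score_labels]


def true_positives(score_labels, label, prediction_key=PREDICTION_KEY):
    """Count items where `label` is both predicted and among the true labels."""
    return sum(
        (label in s[prediction_key]) and (label in l)
        for s, l in score_labels
    )


# Made-up (score, true_labels) pairs for illustration only.
score_labels = [
    ({"prediction": ["Science", "History"]}, ["Science"]),
    ({"prediction": ["Art"]}, ["Art", "History"]),
    ({"prediction": []}, ["Science"]),
]

print(equality_preds(score_labels, "Science"))
print(membership_preds(score_labels, "Science"))
print(true_positives(score_labels, "Science"))
```

With list-valued predictions, strict equality never matches a single label, while the membership check picks up the first item. One caveat with this approach: if a prediction is a plain string rather than a list, `in` does a substring test, so single-label and multi-label scores would likely need to be told apart explicitly.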
We should also use this opportunity to define an overall strategy for handling multiclass classification scoring in revscoring with respect to the different fitness statistics.