Here's the basic interface of a scorer model: https://github.com/wikimedia/revscoring/blob/master/revscoring/scoring/models/model.py
The gist is that a ScorerModel has the following members:
- features: A list of revscoring.Feature or FeatureVector
- score: A method that takes a set of extracted features as an argument and produces a JSON blob as output
- info: A ModelInfo object that contains a model name, version, fitness statistics, etc.
See https://github.com/wikimedia/revscoring/blob/master/revscoring/scoring/models/sklearn.py for how we implement this for scikit-learn-based models.
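To make the three members concrete, here is a minimal sketch of a class with the same shape. It is a toy stand-in and not the revscoring API: `ToyThresholdModel` is a hypothetical name, the features are plain strings rather than revscoring.Feature objects, and `info` is a plain dict rather than a ModelInfo object.

```python
import json


class ToyThresholdModel:
    """Hypothetical stand-in that mirrors the shape of a scorer model."""

    def __init__(self, features, name, version, threshold=1.0):
        # features: ordered list of feature names (assumption: strings here,
        # where the real interface uses revscoring.Feature objects)
        self.features = list(features)
        # info: plain dict standing in for a ModelInfo object
        self.info = {"name": name, "version": version}
        self.threshold = threshold

    def score(self, feature_values):
        """Take extracted feature values and return a JSON-serializable dict."""
        if len(feature_values) != len(self.features):
            raise ValueError(
                "expected {0} feature values, got {1}".format(
                    len(self.features), len(feature_values)))
        total = float(sum(feature_values))
        return {"prediction": total >= self.threshold, "total": total}


model = ToyThresholdModel(
    ["chars_added", "refs_added"], name="toy", version="0.0.1")
score_doc = model.score([3, 0])
print(json.dumps(score_doc))
```

The point is only that the values passed to `score` line up one-to-one with `features`, and that whatever `score` returns has to survive `json.dumps`.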
@Isaac already did a bunch of work with this for topic modeling. You can find his code on stat1007.eqiad.wmnet.
This is the script that I used to preprocess your article text + labels into fastText format (and split the result into training/validation/test sets): /home/isaacj/fastText/drafttopic/drafttopic_article_fasttext_preprocess.py
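For reference, fastText's supervised format is one example per line, with each label prefixed by `__label__` and followed by the text. The sketch below shows that conversion plus a simple shuffled split. It is an illustration only and not the contents of the script above: the function names, the split fractions, the seed, and the sample labels are all assumptions.

```python
import random


def to_fasttext_line(text, labels):
    """Format one example as '__label__A __label__B some text' on one line."""
    # Collapse all whitespace (including newlines) so the example stays on one line.
    flat_text = " ".join(text.split())
    # Labels cannot contain spaces in this format, so replace them.
    prefix = " ".join("__label__" + label.replace(" ", "_") for label in labels)
    return prefix + " " + flat_text


def split_dataset(lines, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle deterministically and return (train, val, test) lists."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


example = to_fasttext_line(
    "Hello   world\nagain", ["Culture.Arts", "STEM.Physics"])
print(example)

lines = [to_fasttext_line("article %d" % i, ["STEM.Physics"]) for i in range(10)]
train, val, test = split_dataset(lines)
```

Each of the three lists can then be written to its own file, one line per example.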
This is the script that I used to train / evaluate a model on the preprocessed text: /home/isaacj/fastText/drafttopic/drafttopic_article_fasttext_model.py