Status update (September 14th, 2016)
September 14th, 2016

(This post was copied from


This is the 21st weekly update from the Revision Scoring team to this
mailing list.

New development

  • We received a request to get moving on Spanish Wikibooks support, so we dug in:
      • We deployed a new Wiki labels campaign[1]
      • We fixed an issue in Wiki labels that prevented requests from *.[2]
      • We trained a basic "revert" detection model that seems to be pretty effective[3]
  • We also generated a dataset of article quality scores for English Wikipedia[4]. You can download it here: [5]

This week, we invested in some long-term tasks. If you review our
Phabricator board, you'll see substantial progress in improving our damage
detection models with hashing vectorization strategies[6, 7], implementing
a more robust model testing strategy[8], and implementing some advanced
natural language processing strategies[9, 10]. Stay tuned for the
completion of these activities in the coming weeks.
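
For readers unfamiliar with the term, hashing vectorization maps each token
to a slot in a fixed-length vector using a hash function, so no vocabulary
needs to be stored. The snippet below is a minimal standard-library sketch of
that general idea, not the code used in our models. The function name
`hash_features` and the `n_buckets` parameter are made up for illustration,
and the signed hashing that some libraries apply to reduce collision bias is
left out.

```python
import hashlib
import re


def hash_features(text, n_buckets=16):
    """Return a fixed-length token-count vector built with the hashing trick.

    Illustrative sketch only: the name and default bucket count are
    assumptions, not part of any revision scoring library.
    """
    vector = [0] * n_buckets
    for token in re.findall(r"\w+", text.lower()):
        # Use a stable hash; Python's built-in hash() is salted per process.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % n_buckets
        # Distinct tokens may collide in one bucket; that is the trade-off
        # for not having to keep a vocabulary in memory.
        vector[index] += 1
    return vector


if __name__ == "__main__":
    print(hash_features("the quick brown fox jumps over the lazy dog"))
```

The vector length depends only on `n_buckets`, so the same function can be
applied to revisions from any wiki without fitting a vocabulary first.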

  1. Add uniqueness constraints to ores_classification
  2. Fix CORS for wikibooks
  3. Train/test reverted model for Spanish Wikibooks
  4. Generate recent article quality scores for English Wikipedia
  6. [Spike] Investigate HashingVectorizer
  8. Train on all data, Report test statistics on cross-validation
  9. Implement PCFG features

Aaron from the Revision Scoring team

Written by Halfak on Jun 3 2017, 5:01 PM.
Principal Research Scientist