Status update (September 28th, 2016)
September 28th, 2016

(This post was copied from


This is the 23rd weekly update from the revision scoring team that we have sent
to this mailing list.

New development

  • We implemented and demonstrated a linguistic/stylometric processing strategy that should give us more signal for finding vandalism and spam[1]. See the discussion on the AI list[2].
  • As part of our support for the Collaboration Team, we've been producing tables of model statistics that correspond to a set of thresholds[3]. This helps their designers work on strategies for reporting prediction confidence in an intuitive way.
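
To illustrate what a table of model statistics per threshold can look like, here is a minimal sketch in plain Python. It is not the team's actual tooling: the function name `threshold_stats`, the choice of statistics (precision, recall, filter rate) and the sample scores and labels are all assumptions made up for this example.

```python
def threshold_stats(scores, labels, thresholds):
    """Return one row of statistics per threshold.

    scores: predicted probability of the positive class (e.g. "damaging")
    labels: True where the observation really is positive
    """
    rows = []
    total = len(scores)
    for t in thresholds:
        # An observation is flagged when its score reaches the threshold.
        flagged = [s >= t for s in scores]
        tp = sum(1 for f, y in zip(flagged, labels) if f and y)
        fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
        fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
        rows.append({
            "threshold": t,
            # Share of flagged observations that are truly positive.
            "precision": tp / (tp + fp) if tp + fp else None,
            # Share of truly positive observations that get flagged.
            "recall": tp / (tp + fn) if tp + fn else None,
            # Share of observations that are not flagged at all.
            "filter_rate": 1 - (tp + fp) / total if total else None,
        })
    return rows


# Made-up scores and labels, for illustration only.
scores = [0.95, 0.80, 0.60, 0.40, 0.10]
labels = [True, True, False, True, False]

for row in threshold_stats(scores, labels, [0.5, 0.9]):
    print(row)
```

Raising the threshold in this toy data trades recall for precision, which is the kind of trade-off such a table lets a designer read off directly.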

Maintenance and robustness

  • We had a major downtime event that was caused by our logs being too verbose. We've recovered and turned down the log level[4].
  • We made sure that halfak gets pinged when the service goes down[5].
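
As a rough illustration of how turning down a log level reduces log volume, the sketch below uses Python's standard `logging` module. It is not the ORES or Celery configuration: the logger name `example.worker`, the messages and the choice of WARNING as the new level are assumptions for this example.

```python
import io
import logging

# Capture log output in memory so the effect is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)

logger = logging.getLogger("example.worker")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep the output out of the root logger

# At INFO, routine per-task messages are written to the log.
logger.setLevel(logging.INFO)
logger.info("task received")

# At WARNING, the same routine message is dropped and only problems remain.
logger.setLevel(logging.WARNING)
logger.info("task received")
logger.warning("task failed")

lines = stream.getvalue().splitlines()
print(lines)
```

On a busy service, the messages dropped at the higher level are the ones emitted for every request, so the saving grows with traffic.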


  • We created a database on Wikimedia Labs that provides access to a dataset containing a complete set of article quality predictions for English Wikipedia[6]. See our announcements[7,8,9].

  1. Implement a basic scoring strategy for PCFGs
  3. Produce tables of stats for damaging and goodfaith models
  4. celery log level is INFO causing disruption on ORES service
  5. Ensure that halfak gets emails when goes down
  6. Setup a db on labsdb for article quality that is publicly accessible
  7. Announce article quality database in labsdb

Aaron from the Revision Scoring team

Written by Halfak on Jun 3 2017, 5:12 PM.
Principal Research Scientist