Status update (November 10th, 2016)

(This post was copied from https://lists.wikimedia.org/pipermail/ai/2016-November/000114.html)


This is the 29th weekly update from the Revision Scoring team that we have
sent to this mailing list.


  • We deployed logging changes to ORES that will reduce log verbosity[1]
  • We also deployed revscoring 1.3.0 and new models built with it to WMF labs[2]. This won't change anything important from a user's perspective, but it paves the way for developing new modeling strategies.

Maintenance and robustness:

  • We fixed puppet so that log file directories are also created on the celery worker nodes (affects wmflabs)[3]
  • We fixed an issue with our recall_at_fpr metric, which was incorrectly defined, and implemented a recall_at_precision metric to take its place[4]
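
For readers unfamiliar with the statistic, here is a minimal sketch of what a recall-at-precision metric measures: the highest recall reachable at any score threshold whose precision meets a target. This is not the revscoring implementation. The function name, signature, and sample data below are assumptions made for illustration only.

```python
def recall_at_precision(scores, labels, min_precision):
    """Return the highest recall over all thresholds whose precision
    is at least min_precision, or None if no threshold qualifies.

    scores: predicted probabilities for the positive class
    labels: booleans, True for a positive observation
    (Hypothetical helper; not part of revscoring's API.)
    """
    total_positives = sum(1 for y in labels if y)
    if total_positives == 0:
        return None  # recall is undefined without positives

    best_recall = None
    # Each distinct score is a candidate threshold: predict positive
    # when score >= threshold.
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        precision = tp / (tp + fp)  # tp + fp >= 1 at every candidate
        recall = tp / total_positives
        if precision >= min_precision:
            if best_recall is None or recall > best_recall:
                best_recall = recall
    return best_recall


# Made-up example data, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [True, True, False, True, False, False]
strict = recall_at_precision(scores, labels, 0.9)
loose = recall_at_precision(scores, labels, 0.75)
```

With this sample, demanding 90% precision forces a threshold that misses one of the three positives, while relaxing the target to 75% allows a threshold that catches all of them.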

New development:

  • We've made a lot of progress on modeling sentences and have just started experimenting with a sentence model built from featured articles[5]
  • We're reviewing a dataset of spam/vandalism/attack new page creations for public release[6]. This dataset will help our collaborators work with us on modeling the quality of drafts and supporting new page triage.

  1. https://phabricator.wikimedia.org/T149730 -- Deploy logging changes to ORES
  2. https://phabricator.wikimedia.org/T150447 -- Deploy revscoring 1.3.0 and updated editquality and wikiclass to wmflabs
  3. https://phabricator.wikimedia.org/T149925 -- /srv/log/ores/ not created on worker nodes
  4. https://phabricator.wikimedia.org/T149825 -- Implement recall at precision (and fix FPR metrics)
  5. https://phabricator.wikimedia.org/T148867 -- Implement sentences datasources & experiment with normalization.
  6. https://phabricator.wikimedia.org/T150307 -- Create manually vetted dataset of spam/vandalism/attack pages

Aaron from the Revision Scoring team

Written by Halfak on Jun 3 2017, 5:48 PM.
Principal Research Scientist
