Status update (October 11th, 2016)
October 11th, 2016

(This post was copied from


This is the 24th and 25th weekly update from the revision scoring team that we
have sent to this mailing list. We skipped a week due to travel and other

Maintenance and robustness:

  • We improved the performance of RecentChanges filtering in the ORES extension[1]
  • We built and ran a maintenance script to clean up duplicate cached data for the ORES extension[2,3]
  • We updated the editquality models for the new version of revscoring (1.3.0)[4] and made some upstream changes to json2tsv to make that easier[5]
  • We quieted down some of our error reporting so that our logs take up less space[6]


  • We generated a dataset that uses the "wp10" prediction model to assess article quality in monthly intervals for English, French, and Russian Wikipedia[7]. This should enable new research into the quality dynamics of these wikis.
  • We generated a dataset of vandalism, spam, and attack page creations for building a new "draft quality" model[8]
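
As a rough illustration of what assessing quality "in monthly intervals" can mean, the sketch below picks, for each month start, the latest revision of an article saved at or before that moment. This is an assumed sampling scheme, not necessarily how the dataset in [7] was built; the function names and the (timestamp, rev_id) input format are hypothetical. Each selected revision would then be scored with the "wp10" model, which is not shown here.

```python
from bisect import bisect_right
from datetime import datetime


def month_starts(first, last):
    """Yield the first instant of each month from first's month through last's month."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield datetime(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def monthly_snapshots(revisions, first, last):
    """Pick the revision that was current at the start of each month.

    revisions: list of (timestamp, rev_id) tuples sorted by timestamp
               (hypothetical input format, assumed for this sketch).
    Returns a list of (month_start, rev_id), with rev_id None for months
    that begin before the article's first revision.
    """
    timestamps = [ts for ts, _ in revisions]
    snapshots = []
    for start in month_starts(first, last):
        # Count of revisions saved at or before this month start.
        idx = bisect_right(timestamps, start)
        rev_id = revisions[idx - 1][1] if idx > 0 else None
        snapshots.append((start, rev_id))
    return snapshots
```

An article with no edits during a month simply carries its previous revision forward, so its assessed quality stays flat for that interval.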


  • We presented on transparent/open AI development practices around ORES at the Association of Internet Researchers[9]

New development:

  • We've made substantial progress towards adding ORES data to MediaWiki's api.php endpoints with rcshow=oresreview[10] and rvprop=ores[11]
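
For readers who want to picture what such requests might look like, here is a minimal sketch that only builds the query URLs. The parameter values rcshow=oresreview and rvprop=ores come from the item above; the English Wikipedia endpoint, the helper names, and the other parameters are assumptions for illustration. This work was still in progress at the time of writing, so the parameter names on a live wiki may differ.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia's api.php, used here only as an example endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def recent_changes_needing_review_url(limit=10):
    """Build a list=recentchanges query filtered with rcshow=oresreview."""
    params = {
        "action": "query",
        "format": "json",
        "list": "recentchanges",
        "rcshow": "oresreview",  # value named in the update; may have changed since
        "rclimit": limit,
    }
    return API_URL + "?" + urlencode(params)


def revision_scores_url(title):
    """Build a prop=revisions query that asks for ORES data via rvprop=ores."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|ores",  # "ores" as named in the update; may have changed since
    }
    return API_URL + "?" + urlencode(params)
```

The functions return plain strings, so they can be inspected or passed to any HTTP client without this sketch making a network request itself.
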
  1. hidenondamaging=1 query is extremely slow on enwiki
  2. Ensure ORES data violating constraints do not affect production
  3. Build a maintenance script to clean up duplicate data
  4. Update editquality for revscoring 1.3.0
  5. Add type decoding support to tsv2json
  6. Quiet result.get Warning in tasks
  7. Generate monthly article quality dataset
  8. Generate spam and vandalism new page creation dataset
  9. Present about ORES transparency at AoIR
  10. Introduce rcshow=oresreview and similar ones
  11. Introduce ORES rvprop

Aaron from the Revision Scoring team

Written by Halfak on Jun 3 2017, 5:14 PM.
Principal Research Scientist