Status update (April 14th, 2017)

In this update, I'm going to change some things up to try to make it easier for you to consume. The biggest change you'll notice is that I've broken up the [#] references by section. I hope that saves you some scrolling and confusion. You'll also notice that I've changed the subject line from "Revision scoring" to "Scoring Platform" because it's now clear that, come July, I'll be leading a new team with that name at the Wikimedia Foundation. An announcement about that is coming once our budget is finalized. I'll try to keep this subject consistent for the foreseeable future so that your email clients will continue to group the updates into one big thread.

Deployments & maintenance:

In this cycle, we've gotten better at tracking our deployments and noting what changes go out with each one. You can click on the Phab task for a deployment and browse its sub-tasks to find out what was deployed. We've had three deployments of ORES since mid-March[1,2,3] and two deployments of Wikilabels[4,5], and we've added a maintenance notice for a short period of downtime coming up on April 21st[6,7].

  1. -- Deploy ores in prod (Mid-March)
  2. -- Deploy ORES late march
  3. -- Deploy ORES early April
  4. -- Late march wikilabels deployment
  5. -- Deploy Wikilabels mid-April
  6. -- Add header to Wikilabels that warns of upcoming maintenance.
  7. -- Manage wikilabels for labsdb1004 maintenance

Making ORES better:

We've been working to make ORES easier to extend and more useful. ORES now reports its relevant versions[8]. We've also reduced the complexity of our "precaching" system, which scores edits before you ask for them[9,10]. We're taking advantage of logstash to store and query our logs[11]. We've also implemented some nice abstractions for requests and responses in ORES[12], which allowed us to improve our metrics tracking substantially[13].

  1. -- Expose version of the service and its dependencies
  2. -- Create generalized "precache" endpoint for ORES
  3. -- Switch /precache to be a POST end point
  4. -- Send ORES logs to logstash
  5. -- Exclude precaching requests from cache_miss/cache_hit metrics
  6. -- Implement ScoreRequest/ScoreResponse pattern in ORES
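To illustrate the interaction between precaching and metrics tracking described above, here's a minimal sketch. This is not the actual ORES implementation; the `ScoreCache` class, `score_fn`, and the example score values are hypothetical, but the key behavior matches item 5 above: precaching requests warm the cache without being counted in the cache_hit/cache_miss metrics.

```python
from collections import Counter

class ScoreCache:
    """Illustrative sketch (not ORES's real code) of a score cache whose
    hit/miss metrics exclude precaching requests."""

    def __init__(self, score_fn):
        self.score_fn = score_fn  # computes a score for a revision ID
        self.cache = {}
        self.metrics = Counter()

    def score(self, rev_id, precache=False):
        if rev_id in self.cache:
            # Only user-facing requests count toward cache_hit.
            if not precache:
                self.metrics["cache_hit"] += 1
            return self.cache[rev_id]
        # Likewise, a precaching request is not a "miss" a user experienced.
        if not precache:
            self.metrics["cache_miss"] += 1
        self.cache[rev_id] = self.score_fn(rev_id)
        return self.cache[rev_id]

# Hypothetical usage: precache an edit as it happens, then serve a user.
cache = ScoreCache(lambda rev_id: {"damaging": 0.03})
cache.score(12345, precache=True)   # warms the cache; no metrics recorded
cache.score(12345)                  # user request; recorded as a cache_hit
```

The point of the exclusion is that precaching would otherwise inflate the miss count (every precache of a new edit is, by definition, a miss), making the user-facing hit rate look worse than it is.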

New functionality:

In the last month and a half, we've added basic support for Korean Wikipedia[14,15]. Props to Revi for helping us work through a bunch of issues with our Korean language support[16,17,18].

We've also gotten the ORES Review tool deployed to Hebrew Wikipedia[19,20,21,22] and Estonian Wikipedia[23,24,25]. We're working with the Collaboration team to implement the threshold test statistics that they need to tune their new Edit Review interface[26], and we're working towards making this kind of work self-serve so that product teams and other tool developers won't have to wait on us to implement threshold stats in the future[27].

  1. -- Deploy reverted model for kowiki
  2. -- Train/test reverted model for kowiki
  3. -- Korean generated word lists are in Chinese
  4. -- Add language support for Korean
  5. -- Fix tokenization for Korean
  6. -- Deploy ORES Review Tool for hewiki
  7. -- Deploy edit quality models for hewiki
  8. -- Train damaging and goodfaith models for hewiki
  9. -- Complete hewiki edit quality campaign
  10. -- Deploy ORES review tool to etwiki
  11. -- Deploy edit quality models for etwiki
  12. -- Complete etwiki edit quality campaign
  13. -- Implement additional test_stats in editquality
  14. -- Implement "thresholds", deprecate "pile of tests_stats"
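The "thresholds" work above is about answering questions like "what score cutoff catches 90% of damaging edits, and what precision do I get there?" Here's a sketch of that kind of computation on toy data. This is not the editquality implementation, and the scores and labels are made up; it only illustrates the idea of deriving threshold statistics from a model's probability outputs.

```python
def threshold_at_recall(scores, labels, target_recall):
    """Find the strictest score threshold that achieves `target_recall`,
    and report recall/precision at that threshold. Illustrative only."""
    positives = sum(labels)
    # Try each observed score as a candidate threshold, highest first,
    # so we return the strictest cutoff that still meets the target.
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        true_pos = sum(p and l for p, l in zip(predicted, labels))
        recall = true_pos / positives
        if recall >= target_recall:
            return {"threshold": threshold,
                    "recall": recall,
                    "precision": true_pos / sum(predicted)}
    return None

# Toy data: model's P(damaging) for six edits, and the true labels.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]
stats = threshold_at_recall(scores, labels, target_recall=0.9)
# With this toy data: threshold 0.40 gives recall 1.0 at precision 0.75.
```

Making stats like these queryable is what lets a tool developer pick their own precision/recall trade-off without asking us to hand-compute it.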

ORES training / labeling campaigns:

Thanks to a lot of networking at the Wikimedia Conference and some help from Ijon (Asaf Bartov), we've found a bunch of new collaborators to help us deploy ORES to new wikis. A critical part of this process is deploying labeling campaigns so that Wikipedians can help us train ORES.

We've got new editquality labeling campaigns deployed to Albanian[28], Finnish[29], Latvian[30], Korean[31], and Turkish[32] Wikipedias.

We've also been working on a new type of model: "item quality" in Wikidata. We've deployed, labeled, and analyzed a pilot[33], fixed some critical bugs that came up[34,35], and finally launched a 5k-item campaign that is already 17% done[36]! See if you'd like to help us out.

  1. -- Edit quality campaign for Albanian Wikipedia
  2. -- Edit quality campaign for Finnish Wikipedia
  3. -- Edit quality campaign for Latvian Wikipedia
  4. -- Deploy editquality campaign in Korean Wikipedia
  5. -- Start v2 editquality campaign for trwiki
  6. -- Deploy the pilot of Wikidata item quality campaign
  7. -- Wikidata items render badly in Wikilabels
  8. -- Implement "unwanted pages" filtering strategy for Wikidata
  9. -- Deploy Wikidata item quality campaign

Bug fixing:

As usual, a few weird bugs got in our way. We needed to move to a bigger virtual machine in "Beta Labs" because our models take up a lot of hard drive space[37]. We found that Wikilabels wasn't removing expired tasks correctly, which was making it difficult to finish labeling campaigns[38]. We also hit a lot of right-to-left issues when we upgraded OOjs UI[39]. And we fixed an old bug in one of our message keys[40].

  1. -- deployment-ores-redis /srv/ redis is too small (500MBytes)
  2. -- Wikilabels is not cleaning up expired tasks for Wikidata item quality campaign
  3. -- Fix RTL issues in Wikilabels after OOjs UI upgrade
  4. -- qqq for a wiki-ai message cannot be loaded

Principal Research Scientist
Head of the Scoring Platform Team

Written by Halfak on Jun 3 2017, 6:30 PM.