Two outages with documentation. Revscoring 2.0 coming with better model information and "thresholds". New support for Romanian, Albanian, Tamil, Greek, and Bengali. We're officially welcoming @awight to the team!
As of July 1st, we are officially the Scoring Platform team, and we're officially welcoming Adam Wight (@awight) to the team.
The last ~month was very productive, but we had two major production issues. See 20170613-ORES and 20170623-ORES. As you will see below, there's a series of tasks that address problems that were related to these issues.
Despite dealing with production issues, we've been able to get a very substantial change to the revscoring library merged. This change will make accessing information about models (build environment, test statistics, scoring thresholds, etc.) much easier. This will cause a breaking change in ORES UI so we'll be making an announcement when we roll it out. Stay tuned.
We've also increased our language and model coverage substantially. We even built and deployed a totally new type of model to help out French Wikisource!
New team stuff
So with the new fiscal year, we're a new team. We're working on an announcement to be posted on the WMF blog. That should be coming out soon. See T169755. Most of the new team stuff focused on getting Adam all of the rights he needed to do ORES deploys and other work.
- T168917: Get Adam all the rights
- T168443: Grant AWight CR+2 on scoring platform repos
- T169915: Create scoring-internal mailing list for Scoring Platform team
- T168442: Grant AWight accounts on ores production clusters
We had two major downtime events with ORES. One of these (20170613-ORES) was not our fault, but we still set up better monitoring (T167830) so that, when it happens again, we can fix it more quickly. The second event (20170623-ORES) was due to a deeply problematic regular expression pattern that had roughly a 1-in-a-billion chance of causing catastrophic failure. We fixed both the regular expression (T168888) and the timeout that failed to catch the out-of-control regex match (T168965).
- T167819: ORES in eqiad is unhappy
- T167830: Extend icinga check to catch 500 errors like those of the 20170613 incident
- T169367: [Investigate] some revisions frequently return TaskRevokedError
- T168888: Fix degenerate regular expressions for matching "hahaha" and "jajaja"
- T168965: Why don't timeouts work during long regular expression matching?
- T170205: Add test to ensure timeout of functions taking too long
- T168889: Rebuild all of the models for ORES (new regexes)
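To illustrate the kind of problem behind T168888, here is a minimal sketch of a degenerate pattern next to a linear alternative. Both patterns are hypothetical stand-ins written for this example, not the expressions that were actually deployed or fixed. The assumption is that the trouble came from nested quantifiers, which let a backtracking engine split a run of "ha" in many different ways before it gives up on a near-miss input.

```python
import re
import time

# Hypothetical patterns for illustration only; these are NOT the
# expressions from T168888.
# Nested quantifiers: a run of "ha" can be divided between the inner and
# outer "+" in many ways, and a backtracking engine may try each of them
# before concluding that the input does not match.
DEGENERATE = re.compile(r"(?:(?:ha)+)+")
# A single quantifier accepts the same strings with only one way to split.
LINEAR = re.compile(r"(?:ha)+")


def timed_fullmatch(pattern, text):
    """Return (matched, seconds) for a full match of `text`."""
    start = time.perf_counter()
    matched = pattern.fullmatch(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # A trailing "!" forces a failed match, which is where backtracking
    # cost shows up. Repeat counts are kept small on purpose.
    for repeats in (5, 10, 15):
        text = "ha" * repeats + "!"
        for name, pattern in (("degenerate", DEGENERATE), ("linear", LINEAR)):
            matched, seconds = timed_fullmatch(pattern, text)
            print(f"{name:10s} repeats={repeats:2d} "
                  f"matched={matched} {seconds:.6f}s")
```

Both patterns accept and reject the same inputs, so swapping one for the other does not change what is detected. Only the amount of work done on a failed match differs.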
New language support
We were lucky to have a lot of volunteers working with us this month, which allowed us to make a lot of progress towards expanding support to more wikis. Both the Albanian and Romanian Wikipedias finished their labeling campaigns, so we'll be able to deploy advanced support to them soon (T163010, T156517). We now have some of the basic language assets for Tamil, so we should be able to build up basic support for that Wikipedia soon (T166052). We also implemented an article quality model for Turkish Wikipedia (T164671) thanks to lots of work by @Mavrikant. We developed a new strategy for cross-language badword/informal detection and addressed some linguistic overlap between English and Hungarian Wikipedia (T167231, T165872). We implemented a page-level OCR model for French Wikisource (somewhat like article quality, but more about the quality of machine reader transcriptions) (T167196). Finally, we deployed the ORES Review Tool to French Wikipedia (T165044).
- T163010: Complete Albanian Wikipedia editquality campaign
- T156517: Complete Romanian Wikipedia edit quality campaign
- T166052: Language assets for Tamil
- T164671: Implement wp10 model for trwiki
- T167231: Remove other non-badwords from huwiki model.
- T165872: Don't use "ha" as an informal in hungarian
- T167196: Implement page_level (OCR) model for frwikisource
- T165044: Deploy ORES review tool on French Wikipedia
Data release -- Monthly Article Quality predictions (English Wikipedia)
This was a long time coming. The data that allowed us to measure the coverage gap of articles about women scientists in Wikipedia is now hosted in labs (T146718). That means the table can be queried directly from Quarry. See this demo query.
New features for ORES/revscoring
Prompted by concerns raised by @Catrope from the Collaboration-Team-Triage, we have been working on a better way to represent information about a model (T162217): build environment, statistics, prediction thresholds, etc. We've even built a way to query a model's thresholds, which we refer to as "threshold optimizations". This refactoring gave us an opportunity to address some other outstanding wants with regard to revscoring, e.g. storing more information about the build environment (T160223) and cleaning up our "tune" utility (T163711).
- T162217: Implement "thresholds", deprecate "pile of tests_stats"
- T163711: Use our own scoring models in `tune` utility
- T169157: revscoring train_model dies without --observations
- T160223: Store the detailed system information inside of model files.
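As a rough illustration of what a "threshold optimization" could mean, the sketch below picks the score threshold that maximizes recall while keeping precision above a floor. This is a generic technique written from scratch for this example. The function names and the data are hypothetical, and nothing here reflects the actual revscoring 2.0 API.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when every score >= threshold is flagged."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for p, l in zip(predicted, labels) if p and l)
    fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
    fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
    # Conventions for empty denominators are a choice made for this sketch.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def max_recall_at_precision(scores, labels, min_precision):
    """Return (threshold, precision, recall) with the highest recall among
    thresholds whose precision is at least `min_precision`, or None."""
    best = None
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall_at(scores, labels, threshold)
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (threshold, precision, recall)
    return best


if __name__ == "__main__":
    # Hypothetical model scores and true labels (True = damaging edit).
    scores = [0.1, 0.4, 0.6, 0.8, 0.9]
    labels = [False, False, True, False, True]
    print(max_recall_at_precision(scores, labels, min_precision=0.6))
```

A tool that consumes scores could use a query like this to choose its own operating point instead of hard-coding a cutoff that has to be revisited whenever a model is rebuilt.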
Wikilabels UX improvement & maintenance
Thanks to feedback from @Jan_Dittrich and @Pginer-WMF, we've been working on addressing some user-experience issues. These were mostly wording fixes to make the functionality of the system clearer (T167079, T138736). We also brought Wikilabels down for a short period of time on Tuesday, July 11th for scheduled database maintenance (T169933).
- T169933: Notify Wikilabels users of short downtime on July 11 @ 1400 UTC
- T167061: Early June 2017 Wiki labels deploy
- T167079: Initial set of UX fixes for Wiki labels
- T138736: Rename "abandon" button to something less confusing
ORES Review Tool improvements
We finished up some patchsets that were blocked for a long time on some fixes to core MediaWiki. This allowed us to fix highlighting in Special:RecentChanges and Special:Watchlist (T155903, T155930).
General ORES maintenance
We've done a bunch of maintenance to ORES to solve a variety of issues that cropped up, e.g. improving tests (T168007), solving a regression in the basic ORES UI (T149117), fixing our new precaching system (T168674), and enabling it to work with the new EventStreams feed (T166046).
- T168007: Add API tests to ORES CI
- T149117: ORES UI is broken
- T168920: ORES 500's on integers that can't be processed
- T168674: ORES POST precaching always fails with 500
- T149118: ORES UI doesn't handle API errors
- T166046: Switch ores precache to use new EventStreams
- T162184: ORES swagger doc based API requests do not work
Misc operations work, versions and styling
- T169129: Remove custom apt repo from ores labs boxes
- T167612: Make names for Wiki-AI diffusion repos consistent
- T165716: No new data on ores_classification on beta labs since march memory issue
- T167604: upgrade pytz to 2017.2 for revscoring
- T167303: Update Travis CI from precise
- T167149: Test if ORES celery can use the unix socket
- T168904: Minor cleanup in Makefiles
- T169809: Set up larger ores-compute instance
- T169164: ORES puppet error on labs boxes, unable to set user to "deploy-service"
- T169473: Add flake8 to travis checks