Status update (September 14th, 2016)
(This post was copied from https://lists.wikimedia.org/pipermail/ai/2016-September/000088.html)
Hey,
This is the 21st weekly update from the revision scoring team that we have
sent to this mailing list.
New development
- We received a request to get moving on Spanish Wikibooks support, so we dug in:
  - We deployed a new Wiki labels campaign[1]
  - We fixed an issue in Wiki labels that prevented cross-origin (CORS) requests from *.wikibooks.org[2]
  - We trained a basic "revert" detection model that appears to be fairly effective[3]
- We also generated a dataset of article quality scores for English Wikipedia[4]. You can download it here: [5]
This week, we invested in some long-term tasks. If you review our
Phabricator board, you'll see substantial progress on improving our damage
detection models with hashing vectorization strategies[6, 7], implementing
a more robust model testing strategy[8], and implementing some advanced
natural language processing strategies[9, 10]. Stay tuned for the
completion of these activities in the coming weeks.
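Feature hashing[7] replaces a learned vocabulary with a hash function. Each token is hashed directly to a column index in a fixed-size vector, so memory stays bounded no matter how many distinct words appear in edits. Below is a minimal pure-Python sketch of the signed variant of the trick. It is not the team's implementation. The function names, the MD5-based hash, and the 1,024-column default are illustrative assumptions. For real use, scikit-learn's `HashingVectorizer` (the class investigated in [6]) provides a production version.

```python
import hashlib
from typing import Iterable, List


def _stable_hash(token: str) -> int:
    # Python's built-in hash() is salted per process for str, so use a
    # deterministic digest to get the same index on every run.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")  # 64-bit unsigned integer


def hash_features(tokens: Iterable[str], n_features: int = 2 ** 10) -> List[float]:
    """Map an arbitrary bag of tokens into a fixed-length vector (hypothetical helper)."""
    vector = [0.0] * n_features
    for token in tokens:
        h = _stable_hash(token)
        index = h % n_features
        # Use the top bit of the hash as a sign, so colliding tokens
        # cancel out in expectation instead of always accumulating.
        sign = 1.0 if (h >> 63) & 1 == 0 else -1.0
        vector[index] += sign
    return vector
```

For example, `hash_features(["vandal", "!!!", "vandal"])` returns a 1,024-element list with at most two non-zero entries, whatever words occur in the wider corpus. The price of that fixed size is that unrelated tokens can collide on the same column. Choosing a larger `n_features` makes collisions rarer at the cost of a wider feature vector.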
[1] https://phabricator.wikimedia.org/T143962 -- Add uniqueness constraints to ores_classification
[2] https://phabricator.wikimedia.org/T145406 -- Fix CORS for wikibooks
[3] https://phabricator.wikimedia.org/T145428 -- Train/test reverted model for Spanish Wikibooks
[4] https://phabricator.wikimedia.org/T135684 -- Generate recent article quality scores for English Wikipedia
[5] https://datasets.wikimedia.org/public-datasets/enwiki/article_quality/wp10-scores-enwiki-20160820.tsv.bz2
[6] https://phabricator.wikimedia.org/T128087 -- [Spike] Investigate HashingVectorizer
[7] https://en.wikipedia.org/wiki/Feature_hashing
[8] https://phabricator.wikimedia.org/T142953 -- Train on all data, Report test statistics on cross-validation
[9] https://phabricator.wikimedia.org/T144636 -- Implement PCFG features
[10] https://en.wikipedia.org/wiki/Stochastic_context-free_grammar
Sincerely,
Aaron from the Revision Scoring team