Overview
There is no textual user input. The only evidence that someone has used the system is that scores will be in the cache or that extractions will be in process within Celery.
A request takes the form http://ores.wmflabs.org/scores/enwiki/reverted/647810762/, where:
- enwiki is the instance of Wikipedia being queried
- reverted is the model being used
- 647810762 is the rev_id
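As an illustration of the URL layout above, the following minimal sketch splits a score URL into its three components. The names `ScoreRequest` and `parse_score_url` are hypothetical and are not part of ores; the sketch only assumes the `/scores/<wiki>/<model>/<rev_id>/` path shape shown in the example URL.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ScoreRequest(NamedTuple):
    """Hypothetical container for the three path components."""
    wiki: str     # e.g. "enwiki"
    model: str    # e.g. "reverted"
    rev_id: int   # e.g. 647810762


def parse_score_url(url: str) -> ScoreRequest:
    """Split a /scores/<wiki>/<model>/<rev_id>/ URL into its parts."""
    # Drop empty segments produced by leading and trailing slashes.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != 4 or parts[0] != "scores":
        raise ValueError("unexpected score URL: %r" % url)
    _, wiki, model, rev_id = parts
    return ScoreRequest(wiki, model, int(rev_id))


request = parse_score_url(
    "http://ores.wmflabs.org/scores/enwiki/reverted/647810762/")
```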
All models are versioned so that cache invalidation can be carried out when models are updated.
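One common way to get version-based invalidation is to make the model version part of the cache key, so that scores from an older model are simply never looked up again. The sketch below assumes that approach; the key layout, the version strings, and the `cache_key` helper are illustrative and not taken from the ores source.

```python
def cache_key(wiki: str, model: str, version: str, rev_id: int) -> str:
    """Build a cache key that changes whenever the model version changes."""
    return "%s:%s:%s:%d" % (wiki, model, version, rev_id)


# A plain dict stands in for the real cache; the stored value is a placeholder.
cache = {}
old_key = cache_key("enwiki", "reverted", "0.0.1", 647810762)
cache[old_key] = "placeholder score from model 0.0.1"

# After the model is updated, lookups use the new version and miss the
# stale entry, which forces the revision to be scored again.
new_key = cache_key("enwiki", "reverted", "0.0.2", 647810762)
stale_hit = new_key in cache
```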
The revscore team has opted not to build applications that consume output from the revscore system themselves. The impetus for moving the project into production is to make it available to product teams so that they can build consumer applications. @Legoktm has implemented one such application: a MediaWiki extension which consumes data from ores and maintains a mirror of the ores cache.
Request Processing
Celery is checked to determine whether the submitted revision is currently being evaluated. If not, Celery is asked to extract features from the data point and score it with the machine learning model.
Celery retrieves revision information using the public MediaWiki API.
- ores.wmflabs.org.yml, "extractors" section lists API locations
- ores is not logged in, and, therefore, cannot retrieve deleted revisions
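The check-then-submit flow described above can be sketched with in-memory stand-ins. This is not the ores implementation and does not use the Celery API: a thread pool plays the role of the workers, dictionaries play the role of the cache and the in-progress registry, and `ScoreProcessor` and `score_fn` are hypothetical names.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class ScoreProcessor:
    """Deduplicates scoring work: cache first, then in-progress, then submit."""

    def __init__(self, score_fn, executor):
        self.score_fn = score_fn    # extracts features and scores one key
        self.executor = executor    # stand-in for the worker pool
        self.cache = {}             # finished scores
        self.in_progress = {}       # key -> Future of a running evaluation
        self.lock = threading.Lock()

    def request(self, key):
        with self.lock:
            if key in self.cache:
                # Already scored: wrap the cached value in a finished Future.
                done = Future()
                done.set_result(self.cache[key])
                return done
            pending = self.in_progress.get(key)
            if pending is not None:
                # Currently being evaluated: share the running task.
                return pending
            future = self.executor.submit(self._run, key)
            self.in_progress[key] = future
            return future

    def _run(self, key):
        score = self.score_fn(key)
        with self.lock:
            self.cache[key] = score
            self.in_progress.pop(key, None)
        return score
```

Two requests for the same revision that arrive while the first is still running receive the same future, so the revision is fetched and scored only once.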
Libraries and Dependencies
- ores-wikimedia-config - WMF-specific configuration for the project
- ores - REST web service which handles execution of models using revscoring
- revscoring - generic library which uses models for revision scoring
- python-mwapi - thin wrapper of MediaWiki API
- deltas - generic library which implements diff algorithms based on recent research
- yamlconf - generic library which implements yaml config file reading and propagates defaults
Subsystems
revscoring - contains command line utilities
- [[https://pythonhosted.org/revscoring/revscoring.utilities.html?highlight=train_test#module-revscoring.utilities.extract_features|extract_features]] - extracts features from revisions, for submission to train_test
- [[https://pythonhosted.org/revscoring/revscoring.utilities.html?highlight=train_test#module-revscoring.utilities.train_test|train_test]] - trains and tests models
- train_test writes out a model file in Python pickle format
- ores-wikimedia-config contains these models
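To make the pickle step concrete, here is a minimal round trip using only the standard library. A plain dict stands in for a trained model, and the file name, `save_model`, and `load_model` are hypothetical; the actual revscoring model classes and file layout are not shown here.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; a real model would be a revscoring object.
model = {"name": "reverted", "version": "0.0.1", "weights": [0.25, 0.75]}


def save_model(obj, path):
    """Serialize a model object to a file in pickle format."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_model(path):
    """Deserialize a model file. Only load files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "enwiki.reverted.model")  # hypothetical name
    save_model(model, path)
    restored = load_model(path)
```

Because unpickling can execute arbitrary code, model files should only be loaded from a repository whose contents are trusted.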
See fabfile.py for information on deployment.
Infrastructure
ores-lb-02.ores.eqiad.wmflabs - load balancer (nginx)
ores-web-0[1-2] - web servers, running uwsgi on tcp/8080
ores-worker-0[1-4] - celery nodes; guessing redis runs here as well?
ores-staging-01 - runs all of the parts of the larger cluster; used for staging
ores-misc-01 - used for Debian packaging