Current structure
- score_processor(context_map, score_cache, metrics_collector) (/scores/enwiki/reverted/23456789)
- scoring_context (== wiki)
- scorer_model (revscoring)
- score_caches
- metrics_collectors
- wsgi -- (Flask framework)
- utilities -- (CLI utilities)
Problems:
- Cool thing: https://ores.wmflabs.org/v2/scores/enwiki/damaging/642345234/?features
- It gets messy when working with celery (multiprocessing)
- Performance improvement: https://phabricator.wikimedia.org/T134606
- Opportunity: Many models use the same features (reverted, damaging, goodfaith) & (wp10)
- Problem: the cache can't be modified in place under celery/multiprocessing (it gets pickled and sent to another process)
- Solution: change _process function to apply to multiple models
- Painful to do features output or cache injection with complex requests
- Web nodes load models into memory for no reason
- https://github.com/wiki-ai/ores/blob/master/ores/score_processors/celery.py#L35
- Solution: Don't load the models into memory on the web nodes -- celery needs to know how to get us model information
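The "apply _process to multiple models" idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: ToyModel, process_multi and fake_extract are hypothetical names, not ORES or revscoring APIs. The point is that the union of features is extracted once per revision inside a single task, so no shared cache has to be mutated across process boundaries.

```python
class ToyModel:
    """Hypothetical stand-in for a scorer model: feature names plus a scoring function."""

    def __init__(self, name, features, score_fn):
        self.name = name
        self.features = list(features)
        self.score_fn = score_fn

    def score(self, feature_values):
        # Pick out only the features this model needs, in its own order.
        return self.score_fn([feature_values[f] for f in self.features])


def process_multi(rev_id, models, extract):
    """Score one revision with several models, extracting each feature once."""
    # Order-preserving union of the features needed by all requested models.
    needed = list(dict.fromkeys(f for m in models for f in m.features))
    # One extraction pass; the values dict is local to this task.
    values = extract(rev_id, needed)
    return {m.name: m.score(values) for m in models}


# Usage with a fake extractor that records how often it is called.
calls = []

def fake_extract(rev_id, features):
    calls.append(list(features))
    return {f: len(f) for f in features}

damaging = ToyModel("damaging", ["chars_added", "is_anon"], sum)
goodfaith = ToyModel("goodfaith", ["chars_added", "is_anon"], max)
scores = process_multi(23456789, [damaging, goodfaith], fake_extract)
```

Both models are scored from a single call to the extractor, which is where the saving for (reverted, damaging, goodfaith) would come from.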
Proposed structure
Philosophy: Make the "scoring system" look like the ORES API -- wsgi (web nodes) should just be a thin wrapper
- scoring_systems(context_map, score_cache, metrics_collector)
- Methods:
- score(context, model_names, rev_ids) -- "/scores/context?models=<model_names>&revids=<rev_ids>" || "/scores/context/<model_name>/?revids=<rev_ids>"
- model_info(context, model_names) -- "/scores/context?model_info&models=<model_names>" || "/scores/context/<model_name>/?model_info"
- Notes:
- variants (simple, multiprocessing_pool, celery_queue, etc.)
- Methods:
- scoring_context
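A minimal sketch of the proposed scoring system interface, corresponding to the "simple" variant. Everything here is an assumption for illustration: the class name SimpleScoringSystem, the shape of context_map, and the cache key are hypothetical and not taken from the ORES code. A celery_queue variant would presumably expose the same two methods but hand the work to workers, so the web nodes would never need to load models.

```python
class SimpleScoringSystem:
    """Hypothetical in-process variant of the proposed scoring system."""

    def __init__(self, context_map, score_cache=None, metrics_collector=None):
        # Assumed shape: {context: {model_name: {"info": dict, "score": callable}}}
        self.context_map = context_map
        self.score_cache = {} if score_cache is None else score_cache
        self.metrics_collector = metrics_collector  # unused in this sketch

    def score(self, context, model_names, rev_ids):
        """Mirror of /scores/<context>?models=...&revids=..."""
        models = self.context_map[context]
        results = {}
        for rev_id in rev_ids:
            results[rev_id] = {}
            for name in model_names:
                key = (context, name, rev_id)
                if key not in self.score_cache:
                    # Cache miss: compute the score and remember it.
                    self.score_cache[key] = models[name]["score"](rev_id)
                results[rev_id][name] = self.score_cache[key]
        return results

    def model_info(self, context, model_names):
        """Mirror of /scores/<context>?model_info&models=..."""
        models = self.context_map[context]
        return {name: models[name]["info"] for name in model_names}


# Usage with a toy "damaging" model that flags even revision IDs.
system = SimpleScoringSystem({
    "enwiki": {
        "damaging": {
            "info": {"version": "0.0.1"},
            "score": lambda rev_id: rev_id % 2 == 0,
        },
    },
})
scores = system.score("enwiki", ["damaging"], [642345234])
info = system.model_info("enwiki", ["damaging"])
```

With this shape the wsgi layer only has to translate URL parameters into the two method calls and serialize what comes back.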