Score multiple models with the same cached dependencies
Closed, Resolved, Public

Description

OK. So feature extraction is a funny thing. It turns out that most of our models use a similar set of features and otherwise share many dependencies. In most cases, reverted, damaging, and goodfaith use the *exact same features*. But when you make a request to ORES that looks like this ...

https://ores.wmflabs.org/v2/scores/enwiki/?models=damaging|reverted&revids=1234

... features will be extracted independently for damaging and reverted. That's a waste. We could save a bunch of time by extracting features for one model and then passing the feature extraction cache to the next.
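To make the idea concrete, here is a rough sketch using a toy dependency solver. Everything in it (`Dependent`, this `solve`, `extract_all`, the two made-up features, and the `models` dict) is hypothetical and only stands in for the real thing; it is not the revscoring or ORES API. The point is just that one cache gets passed from model to model, so each shared dependency runs once.

```python
class Dependent:
    """A named value computed from other dependents (hypothetical)."""

    def __init__(self, name, process, depends_on=()):
        self.name = name
        self.process = process
        self.depends_on = tuple(depends_on)
        self.calls = 0  # how many times process() actually ran

    def __repr__(self):
        return "<{0}>".format(self.name)


def solve(dependent, cache):
    """Return the dependent's value, filling `cache` in place."""
    if dependent in cache:
        return cache[dependent]
    args = [solve(d, cache) for d in dependent.depends_on]
    dependent.calls += 1
    cache[dependent] = dependent.process(*args)
    return cache[dependent]


# Injected input, plus toy features shared by both toy models.
text = Dependent("revision.text", None)
words = Dependent("words", lambda t: t.split(), [text])
word_count = Dependent("word_count", len, [words])
template_count = Dependent("template_count", lambda t: t.count("{{"), [text])

# Hypothetical models: each is just the list of features it needs.
models = {
    "damaging": [word_count, template_count],
    "reverted": [word_count, template_count],
}


def extract_all(models, root_cache):
    """Extract features for every model from one shared cache."""
    cache = dict(root_cache)  # copy so the caller's dict is untouched
    return {name: [solve(f, cache) for f in features]
            for name, features in models.items()}


feature_values = extract_all(models, {text: "foo {{bar}} derp"})
print(feature_values)
print(words.calls, word_count.calls, template_count.calls)
```

In this sketch the second model's features come straight out of the cache, so every call counter stays at 1 even though two models were "scored".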

Event Timeline

$ python
Python 3.4.3 (default, Jul 28 2015, 18:20:59) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logger = logging.getLogger('revscoring')
>>> logger.setLevel(logging.DEBUG)
>>> import revscoring
>>> from revscoring.dependencies import solve
>>> from revscoring.features import wikitext
>>> from revscoring.datasources import revision_oriented as ro
>>> solve(wikitext.revision.templates, cache={ro.revision.text: "foo {{bar}} derp"})
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.wikicode (0 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.node_class_map (0 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.templates (0 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing feature.wikitext.revision.templates (0 calls so far).
1
>>> cache = {ro.revision.text: "foo {{bar}} derp"}
>>> solve(wikitext.revision.templates, cache=cache)
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.wikicode (1 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.node_class_map (1 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.templates (1 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing feature.wikitext.revision.templates (1 calls so far).
1
>>> cache
{<feature.wikitext.revision.templates>: 1, <datasource.wikitext.revision.templates>: ['{{bar}}'], <datasource.wikitext.revision.wikicode>: 'foo {{bar}} derp', <datasource.wikitext.revision.node_class_map>: {<class 'mwparserfromhell.nodes.text.Text'>: ['foo ', 'bar', ' derp'], <class 'mwparserfromhell.nodes.template.Template'>: ['{{bar}}']}, <datasource.revision.text>: 'foo {{bar}} derp'}
>>> solve(wikitext.revision.templates, cache=cache)
1

Made some substantial progress here (https://github.com/wiki-ai/ores/pull/144). It required deeper refactoring than expected and might result in a v3 spec for the response structure. We'll see.

I just switched to a new branch with the commits split up. See https://github.com/wiki-ai/ores/pull/155