Score multiple models with the same cached dependencies
Closed, Resolved · Public

Description

OK. So feature extraction is a funny thing. It turns out that most of our models use a similar set of features and otherwise share many dependencies. In most cases, reverted, damaging, and goodfaith use the *exact same features*. But when you make a request to ORES that looks like this ...

https://ores.wmflabs.org/v2/scores/enwiki/?models=damaging|reverted&revids=1234

... features will be extracted independently for damaging and reverted. That's a waste. We could save a bunch of time by extracting features for one model and then passing the feature extraction cache to the next.
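
To make the saving concrete, here is a minimal sketch in plain Python. It is not ORES or revscoring code: the feature functions, the `MODEL_FEATURES` table, and `extract_for_models` are hypothetical stand-ins, and the feature values are made up. It only illustrates that passing one cache from model to model halves the extraction work when two models use the same features.

```python
# Toy stand-ins only: none of these names come from ORES or revscoring.
extraction_calls = []  # records every real (uncached) feature extraction


def chars_added(rev_id):
    extraction_calls.append("chars_added")
    return 40 + rev_id % 7  # made-up value, stands in for expensive work


def badwords_added(rev_id):
    extraction_calls.append("badwords_added")
    return rev_id % 3  # made-up value


# Two hypothetical models that use the exact same features.
MODEL_FEATURES = {
    "damaging": [chars_added, badwords_added],
    "reverted": [chars_added, badwords_added],
}


def extract(features, rev_id, cache):
    """Return feature values, computing only those missing from `cache`."""
    values = []
    for feature in features:
        if feature not in cache:
            cache[feature] = feature(rev_id)
        values.append(cache[feature])
    return values


def extract_for_models(model_names, rev_id, share_cache):
    """Extract features per model, optionally reusing one cache for all."""
    shared = {}
    results = {}
    for name in model_names:
        cache = shared if share_cache else {}
        results[name] = extract(MODEL_FEATURES[name], rev_id, cache)
    return results


independent = extract_for_models(["damaging", "reverted"], 1234, share_cache=False)
calls_independent = len(extraction_calls)

extraction_calls.clear()
shared = extract_for_models(["damaging", "reverted"], 1234, share_cache=True)
calls_shared = len(extraction_calls)

print(calls_independent, calls_shared)  # 4 extractions vs. 2
```

Both runs return the same feature values; only the number of extractions differs.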

Event Timeline

Halfak created this task. May 6 2016, 10:06 PM
Restricted Application added subscribers: Zppix, Aklapper. May 6 2016, 10:06 PM
$ python
Python 3.4.3 (default, Jul 28 2015, 18:20:59) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logger = logging.getLogger('revscoring')
>>> logger.setLevel(logging.DEBUG)
>>> import revscoring
>>> from revscoring.dependencies import solve
>>> from revscoring.features import wikitext
>>> from revscoring.datasources import revision_oriented as ro
>>> solve(wikitext.revision.templates, cache={ro.revision.text: "foo {{bar}} derp"})
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.wikicode (0 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.node_class_map (0 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.templates (0 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing feature.wikitext.revision.templates (0 calls so far).
1
>>> cache = {ro.revision.text: "foo {{bar}} derp"}
>>> solve(wikitext.revision.templates, cache=cache)
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.wikicode (1 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.node_class_map (1 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.templates (1 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing feature.wikitext.revision.templates (1 calls so far).
1
>>> cache
{<feature.wikitext.revision.templates>: 1, <datasource.wikitext.revision.templates>: ['{{bar}}'], <datasource.wikitext.revision.wikicode>: 'foo {{bar}} derp', <datasource.wikitext.revision.node_class_map>: {<class 'mwparserfromhell.nodes.text.Text'>: ['foo ', 'bar', ' derp'], <class 'mwparserfromhell.nodes.template.Template'>: ['{{bar}}']}, <datasource.revision.text>: 'foo {{bar}} derp'}
>>> solve(wikitext.revision.templates, cache=cache)
1
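
The session above suggests that `solve()` writes every intermediate value into the dict passed as `cache`, so a later call with the same dict executes nothing. A rough sketch of that pattern without revscoring follows. The `Dependent` class and this `solve` are hypothetical stand-ins written for illustration, not revscoring's implementation, and the template parsing is a crude string split.

```python
class Dependent:
    """A named value computed from other Dependents (hypothetical stand-in)."""

    def __init__(self, name, process, depends_on=()):
        self.name = name
        self.process = process
        self.depends_on = list(depends_on)
        self.calls = 0  # how many times `process` has actually run

    def __repr__(self):
        return "<" + self.name + ">"


def solve(dependent, cache):
    """Return the value of `dependent`, storing each intermediate in `cache`."""
    if dependent in cache:
        return cache[dependent]
    args = [solve(dep, cache) for dep in dependent.depends_on]
    dependent.calls += 1
    cache[dependent] = dependent.process(*args)
    return cache[dependent]


def must_be_injected():
    # Root values have no way to compute themselves in this sketch.
    raise KeyError("revision.text must be supplied in the cache")


text = Dependent("revision.text", must_be_injected)
templates = Dependent(
    "wikitext.templates",
    # Crude parse: take whatever sits between each "{{" and the next "}}".
    lambda t: [chunk.split("}}")[0] for chunk in t.split("{{")[1:]],
    depends_on=[text],
)
template_count = Dependent("feature.templates", len, depends_on=[templates])

cache = {text: "foo {{bar}} derp"}
first = solve(template_count, cache)   # runs templates, then template_count
second = solve(template_count, cache)  # answered straight from the cache

print(first, second, templates.calls, template_count.calls, len(cache))
```

Because the caller owns the dict, the same `cache` can be handed to the next model's feature list, which is the reuse the task description asks for.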
Ladsgroup claimed this task. May 7 2016, 6:47 AM
Ladsgroup moved this task from Active to Backlog on the Scoring-platform-team (Current) board.
Halfak removed Ladsgroup as the assignee of this task. May 16 2016, 3:52 PM
Halfak moved this task from Untriaged to Ideas on the Scoring-platform-team board. May 16 2016, 3:58 PM
Halfak moved this task from Ideas to New development on the Scoring-platform-team board.
He7d3r added a subscriber: He7d3r. May 20 2016, 2:11 AM
Halfak claimed this task. Jun 29 2016, 3:32 PM

Made some substantial progress in https://github.com/wiki-ai/ores/pull/144, which required some deeper refactoring than expected and might result in a v3 spec for the response structure. We'll see.

I just switched to a new branch with the commits split up. See https://github.com/wiki-ai/ores/pull/155

Halfak closed this task as Resolved. Aug 2 2016, 9:47 PM