Score multiple models with the same cached dependencies
Closed, ResolvedPublic
Authored By

	Halfak
	May 6 2016, 10:06 PM

Description

OK. So feature extraction is a funny thing. It turns out that most of our models use a similar set of features and share many other dependencies as well. In most cases, reverted, damaging, and goodfaith use the *exact same features*. But when you make a request to ORES that looks like this ...

https://ores.wmflabs.org/v2/scores/enwiki/?models=damaging|reverted&revids=1234

... features will be extracted independently for damaging and reverted. That's a waste. We could save a bunch of time by extracting features for one model and then passing the feature extraction cache to the next.
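
The saving proposed here can be sketched with a toy dependency solver. This is not revscoring's implementation: `Dependent`, `solve`, `extract_all`, and the two toy features are hypothetical stand-ins. The only part taken from the description is the idea of handing one cache from model to model.

```python
class Dependent:
    """A named value computed from other Dependents (hypothetical stand-in)."""

    def __init__(self, name, process, depends_on=()):
        self.name = name
        self.process = process
        self.depends_on = tuple(depends_on)
        self.calls = 0  # how many times process() actually ran

    def __repr__(self):
        return "<{0}>".format(self.name)


def solve(dependent, cache):
    """Return the value of `dependent`, storing every intermediate in `cache`."""
    if dependent in cache:
        return cache[dependent]
    args = [solve(d, cache) for d in dependent.depends_on]
    dependent.calls += 1
    value = dependent.process(*args)
    cache[dependent] = value
    return value


# Root datasource: it has no process, so it must be injected through the cache.
text = Dependent("revision.text", process=None)
# Shared intermediate step that both toy features depend on.
words = Dependent("revision.words", lambda t: t.split(), [text])
word_count = Dependent("feature.word_count", len, [words])
longest_word = Dependent(
    "feature.longest_word", lambda ws: max(len(w) for w in ws), [words])

# Two toy "models" with overlapping feature lists.
model_features = {
    "damaging": [word_count, longest_word],
    "reverted": [word_count],
}


def extract_all(model_features, root_cache):
    """Extract features for each model in turn, reusing one cache throughout."""
    cache = dict(root_cache)
    return {model: [solve(f, cache) for f in features]
            for model, features in model_features.items()}


feature_values = extract_all(model_features, {text: "foo {{bar}} derp"})
```

With a single cache, `words.calls` and `word_count.calls` are both 1 after the run, even though two models asked for `word_count`. Giving each model a fresh cache would run the shared steps once per model.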

Related Objects

Status	Assigned	Task
Resolved	None	T139408 [Epic] ORES refactor: Scoring structure
Resolved	Halfak	T134606 Score multiple models with the same cached dependencies
Resolved	Halfak	T134781 Make cache be preserved (in place) when solving dependencies
Resolved	Halfak	T136875 [Spike] Implement & test dependent tasks in Celery

Event Timeline

Halfak created this task. May 6 2016, 10:06 PM

Restricted Application added subscribers: Zppix, Aklapper. May 6 2016, 10:06 PM

$ python
Python 3.4.3 (default, Jul 28 2015, 18:20:59) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logger = logging.getLogger('revscoring')
>>> logger.setLevel(logging.DEBUG)
>>> import revscoring
>>> from revscoring.dependencies import solve
>>> from revscoring.features import wikitext
>>> from revscoring.datasources import revision_oriented as ro
>>> solve(wikitext.revision.templates, cache={ro.revision.text: "foo {{bar}} derp"})
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.wikicode (0 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.node_class_map (0 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.templates (0 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing feature.wikitext.revision.templates (0 calls so far).
1
>>> cache = {ro.revision.text: "foo {{bar}} derp"}
>>> solve(wikitext.revision.templates, cache=cache)
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.wikicode (1 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.node_class_map (1 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing datasource.wikitext.revision.templates (1 calls so far).
DEBUG:revscoring.dependencies.dependent:Executing feature.wikitext.revision.templates (1 calls so far).
1
>>> cache
{<feature.wikitext.revision.templates>: 1, <datasource.wikitext.revision.templates>: ['{{bar}}'], <datasource.wikitext.revision.wikicode>: 'foo {{bar}} derp', <datasource.wikitext.revision.node_class_map>: {<class 'mwparserfromhell.nodes.text.Text'>: ['foo ', 'bar', ' derp'], <class 'mwparserfromhell.nodes.template.Template'>: ['{{bar}}']}, <datasource.revision.text>: 'foo {{bar}} derp'}
>>> solve(wikitext.revision.templates, cache=cache)
1
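
The third call in the session logs nothing because the second call left its intermediate values in `cache`. A minimal sketch of why in-place writes matter, using a hypothetical two-step pipeline (`parse`, `solve_copy`, and `solve_in_place` are illustrative names, not revscoring functions):

```python
calls = {"parse": 0}


def parse(text):
    """Stand-in for an expensive datasource; counts how often it runs."""
    calls["parse"] += 1
    return text.split()


def solve_copy(cache):
    """Works on a private copy, so the caller's dict never gains 'parsed'."""
    local = dict(cache)
    if "parsed" not in local:
        local["parsed"] = parse(local["text"])
    return len(local["parsed"])


def solve_in_place(cache):
    """Writes intermediates into the caller's dict so later calls reuse them."""
    if "parsed" not in cache:
        cache["parsed"] = parse(cache["text"])
    return len(cache["parsed"])


cache = {"text": "foo {{bar}} derp"}

solve_copy(cache)
solve_copy(cache)
copy_calls = calls["parse"]  # parse ran on both calls

solve_in_place(cache)
solve_in_place(cache)
in_place_calls = calls["parse"] - copy_calls  # parse ran only on the first call
```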

https://github.com/wiki-ai/ores/pull/142

Ladsgroup claimed this task. May 7 2016, 6:47 AM

Ladsgroup edited projects, added Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team.

Ladsgroup moved this task from Parked to Backlog on the Machine-Learning-Team (Active Tasks) board.

Halfak added a subtask: T134781: Make cache be preserved (in place) when solving dependencies. May 9 2016, 4:33 PM

Halfak closed subtask T134781: Make cache be preserved (in place) when solving dependencies as Resolved. May 10 2016, 8:44 PM

Halfak removed Ladsgroup as the assignee of this task. May 16 2016, 3:52 PM

Halfak edited projects, added Machine-Learning-Team; removed Machine-Learning-Team (Active Tasks).

Halfak moved this task from Unsorted to Ideas on the Machine-Learning-Team board. May 16 2016, 3:58 PM

Halfak moved this task from Ideas to New development on the Machine-Learning-Team board.

He7d3r subscribed. May 20 2016, 2:11 AM

Halfak added a subtask: T136875: [Spike] Implement & test dependent tasks in Celery. Jun 6 2016, 3:45 PM

schana subscribed. Jun 6 2016, 4:24 PM

Ladsgroup closed subtask T136875: [Spike] Implement & test dependent tasks in Celery as Resolved. Jun 7 2016, 9:48 PM

Halfak edited projects, added Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team. Jun 28 2016, 7:13 PM

Made some substantial progress here: https://github.com/wiki-ai/ores/pull/144
It required deeper refactoring than expected and might result in a v3 spec for the response structure. We'll see.

See notes here: https://etherpad.wikimedia.org/p/ores_refactor

Halfak added a parent task: T139408: [Epic] ORES refactor: Scoring structure. Jul 5 2016, 6:52 PM

I just switched to a new branch with the commits split up. See https://github.com/wiki-ai/ores/pull/155

Halfak moved this task from Review to Completed on the Machine-Learning-Team (Active Tasks) board. Jul 26 2016, 4:40 PM

Halfak closed this task as Resolved. Aug 2 2016, 9:47 PM
