
Modify revscoring extract utility to include root datasources
Open, Lowest, Public

Description

In normal operation, the extract utility stores only the final, calculated features. Some users may also want the raw extracted data, so we'll add a flag that makes the utility include the root datasources in the stored caches.
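Below is a minimal sketch of the proposed behaviour. It uses a toy dependency model rather than revscoring's actual classes. The hypothetical `Datasource` and `Feature` classes and the hypothetical `extract` function are illustrative only, and `include_roots` stands in for whatever the new flag ends up being called. By default only feature values are written to the cache. With the flag set, the raw root values are written as well.

```python
from typing import Callable, Dict, List


class Datasource:
    """Hypothetical stand-in for a root datasource: a named raw value."""

    def __init__(self, name: str) -> None:
        self.name = name


class Feature:
    """Hypothetical stand-in for a feature computed from root datasources."""

    def __init__(self, name: str, func: Callable[..., object],
                 depends_on: List[Datasource]) -> None:
        self.name = name
        self.func = func
        self.depends_on = depends_on


def extract(features: List[Feature], raw: Dict[str, object],
            include_roots: bool = False) -> Dict[str, object]:
    """Compute features from raw values; optionally keep the roots in the cache."""
    cache: Dict[str, object] = {}
    for feature in features:
        args = [raw[ds.name] for ds in feature.depends_on]
        cache[feature.name] = feature.func(*args)
        if include_roots:
            # Store the raw root values alongside the calculated feature.
            for ds in feature.depends_on:
                cache[ds.name] = raw[ds.name]
    return cache


text = Datasource("revision.text")
chars = Feature("revision.chars", lambda t: len(t), [text])
raw_values = {"revision.text": "Hello, world"}

print(extract([chars], raw_values))                      # {'revision.chars': 12}
print(extract([chars], raw_values, include_roots=True))  # adds 'revision.text'
```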

Event Timeline

awight created this task. Jan 25 2019, 8:17 PM
Restricted Application added a project: artificial-intelligence. Jan 25 2019, 8:17 PM
Restricted Application added a subscriber: Aklapper.

Looks good. Nice and straightforward :) The name --roots-only doesn't really make sense, since there's no circumstance where we'd include leaf dependencies and roots together. Maybe --just-roots or --extract-roots would make more sense.

Root datasources should never include a generator. In fact, no datasources should include a generator. They should always evaluate to a reusable data type. I think that maybe dig() itself is returning a generator, and you should use list(dig(...)) instead.
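Generators are a problem because they can be iterated only once, so a second consumer silently sees nothing. The sketch below uses a hypothetical `dig` that walks a toy dependency tree (a plain dict, not revscoring's own structures) and yields the nodes with no dependencies, i.e. the root datasources. Wrapping the call in `list()` produces a value that can be reused.

```python
from typing import Dict, Iterator, List

# Hypothetical dependency tree: each node maps to the nodes it depends on.
# Nodes with no dependencies are the root datasources.
DEPENDENCIES: Dict[str, List[str]] = {
    "feature.chars_added": ["datasource.diff"],
    "datasource.diff": ["datasource.parent_text", "datasource.text"],
    "datasource.parent_text": [],
    "datasource.text": [],
}


def dig(node: str) -> Iterator[str]:
    """Yield the root datasources reachable from `node` (this is a generator)."""
    children = DEPENDENCIES[node]
    if not children:
        yield node
    for child in children:
        yield from dig(child)


roots_gen = dig("feature.chars_added")
print(list(roots_gen))  # first pass consumes the generator
print(list(roots_gen))  # second pass: already exhausted, prints []

roots = list(dig("feature.chars_added"))
print(roots)
print(roots)  # a list can be iterated as often as needed
```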

Why would you want to extract features and roots using revscoring extract? You can always extract the features from the roots later. Is it intended for convenience?
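The argument that features can always be recomputed from stored roots can be sketched as follows. The example assumes a hypothetical cache that holds only root datasource values, plus hypothetical feature functions. It is not revscoring's actual API.

```python
from typing import Callable, Dict

# Hypothetical cache written by an earlier extraction run that kept only roots.
root_cache: Dict[str, str] = {
    "datasource.parent_text": "Hello",
    "datasource.text": "Hello, world",
}

# Hypothetical feature definitions, each a function of the cached roots.
FEATURES: Dict[str, Callable[[Dict[str, str]], int]] = {
    "feature.chars": lambda c: len(c["datasource.text"]),
    # Naive illustration: character-count difference between the two texts.
    "feature.chars_added": lambda c: (
        len(c["datasource.text"]) - len(c["datasource.parent_text"])
    ),
}


def features_from_roots(cache: Dict[str, str]) -> Dict[str, int]:
    """Recompute every feature from stored root values, without re-fetching data."""
    return {name: func(cache) for name, func in FEATURES.items()}


print(features_from_roots(root_cache))  # {'feature.chars': 12, 'feature.chars_added': 7}
```

Because the roots are the raw inputs, a feature defined after the extraction run can still be computed from the same cache.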

Harej triaged this task as Lowest priority. Mar 25 2019, 4:42 PM
Harej removed awight as the assignee of this task.