In normal operation, the extract utility stores only the final, calculated features. Some users may also want the raw extracted data, so we'll add a flag to the utility that includes the root datasources in the stored caches.
|Open||None||T209611 [Epic] Make ORES scores for wikidata available as a dump|
|Open||None||T211069 Decide whether we will include raw features|
|Open||None||T214723 Modify revscoring extract utility to include root datasources|
Looks good. Nice and straightforward :) --roots-only doesn't really make sense as a name, since there's no circumstance where we'd include leaf dependencies and roots together. Maybe --just-roots or --extract-roots would make more sense.
Root datasources should never include a generator. In fact, no datasource should include a generator. They should always evaluate to a re-usable data type. I suspect dig() itself is returning a generator, in which case you should use list(dig(...)) instead.
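To illustrate why a generator is a problem here, the following is a minimal sketch rather than revscoring's actual code. dig_roots() is a hypothetical stand-in for a dig()-style helper, and the dependency names in deps are made up for the example. It shows that a generator can only be consumed once, while wrapping it in list() gives a value that can be read repeatedly and stored.

```python
def dig_roots(dependency_map, start):
    """Hypothetical stand-in for a dig()-style helper.

    Lazily yields the root nodes (nodes with no dependencies of their own)
    reachable from `start`. Because it uses `yield`, calling it returns a
    generator, not a list.
    """
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        children = dependency_map.get(node, [])
        if not children:
            yield node  # no dependencies, so this is a root
        else:
            stack.extend(children)


# Made-up dependency graph, for illustration only.
deps = {
    "feature.chars_added": ["datasource.diff"],
    "datasource.diff": ["revision.text", "parent.text"],
}

# A generator is exhausted after the first full pass over it.
lazy_roots = dig_roots(deps, "feature.chars_added")
first_pass = sorted(lazy_roots)
second_pass = sorted(lazy_roots)  # nothing left to iterate

# Materializing with list() gives a re-usable value.
reusable_roots = list(dig_roots(deps, "feature.chars_added"))

print(first_pass)              # ['parent.text', 'revision.text']
print(second_pass)             # []
print(sorted(reusable_roots))  # ['parent.text', 'revision.text']
```

The second pass over lazy_roots comes back empty, which is the kind of silent failure that shows up when a generator ends up in a cache that is read more than once.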
Why would you want to extract features and roots using revscoring extract? You can always extract the features from the roots later. Is it intended for convenience?