Our dependency solving and extraction framework seems solid enough to be a valuable upstream contribution. This could take a few forms; the current thoughts are:
- Express extraction as a scikit-learn transformer and pipeline step.
- Keep all of our customizability with respect to context, feature injection, cache, and config.
- Decouple from scoring.model.
- Decouple from the rev_id -> MediaWiki assumption, so that the input and extractor are generalized.
- Include an example that does something simple with e.g. OSM.
Draft:
https://github.com/wiki-ai/revscoring/compare/sklearn_plugin?expand=1
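
To make the transformer idea concrete, here is a minimal sketch of what a generalized extractor could look like. It is not the code from the draft branch: `Feature`, `solve`, and `ExtractorTransformer` are hypothetical names, the OSM-like input is made-up toy data, and the class only follows the scikit-learn `fit` / `transform` convention by duck typing so the sketch stays dependency-free. A real plugin would presumably subclass `sklearn.base.BaseEstimator` and `TransformerMixin`.

```python
class Feature:
    """Hypothetical feature: a name, a function, and the things it depends on."""

    def __init__(self, name, process=None, depends_on=()):
        self.name = name
        self.process = process
        self.depends_on = tuple(depends_on)


def solve(feature, context, cache):
    """Resolve a feature recursively, preferring cached or injected values."""
    if feature.name in cache:
        return cache[feature.name]
    # The context may swap in a different implementation for this name.
    impl = context.get(feature.name, feature)
    args = [solve(dep, context, cache) for dep in impl.depends_on]
    value = impl.process(*args)
    cache[feature.name] = value
    return value


class ExtractorTransformer:
    """Duck-typed transformer: turns raw input items into feature rows."""

    def __init__(self, features, root, context=None):
        self.features = list(features)
        self.root = root              # the generalized input (no rev_id assumed)
        self.context = dict(context or {})

    def fit(self, X, y=None):
        return self                   # extraction is stateless

    def transform(self, X):
        rows = []
        for item in X:
            cache = {self.root.name: item}   # inject the input as the root value
            rows.append([solve(f, self.context, cache) for f in self.features])
        return rows

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Toy OSM-like elements (made-up data, not fetched from any API).
element = Feature("element")
tags = Feature("tags", lambda e: e.get("tags", {}), [element])
tag_count = Feature("tag_count", len, [tags])
has_name = Feature("has_name", lambda t: "name" in t, [tags])

X = [
    {"id": 1, "tags": {"name": "Cafe", "amenity": "cafe"}},
    {"id": 2, "tags": {}},
]
rows = ExtractorTransformer([tag_count, has_name], root=element).fit_transform(X)
```

Nothing here refers to scoring.model or MediaWiki: the root dependency is whatever item the caller passes in, and the context dict stands in for the feature-injection hook.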