https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language
Storing our models in this standard might facilitate using other evaluation backends.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | None | | T170650 [Investigate] Hadoop integration for ORES training
Declined | None | | T201047 Use Joblib for ORES model serialization
Open | None | | T173244 [Investigate] Use PMML for prediction model serialization
The toolchain isn't very mature. Writing PMML relies on a Java binary, which is acceptable for a compilation pipeline. However, there doesn't seem to be a way to read PMML back into our evaluator, so we would have to implement a reader from scratch. That's probably not a big deal: serializing the sklearn internals means mapping at most a dozen simple variables, and our additional chunks like model_info could be stored under custom keys. This could be a good step toward mainstreaming this sort of metadata and encouraging an emerging industry best practice of making it available to consumers.
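For concreteness, a minimal sketch of the writing side, assuming the third-party sklearn2pmml package (which shells out to a Java converter, matching the caveat above). The estimator and the "damaging.pmml" output path are illustrative, not our actual pipeline:

```python
# Minimal sketch: export a fitted sklearn estimator to PMML.
# Assumes the sklearn2pmml package is installed and a JRE is available,
# since the conversion is done by a bundled Java binary.
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True)

# PMMLPipeline wraps an ordinary sklearn pipeline for conversion.
pipeline = PMMLPipeline([("classifier", GradientBoostingClassifier())])
pipeline.fit(X, y)

# Writes the PMML document; this step invokes the Java converter.
sklearn2pmml(pipeline, "damaging.pmml")
```

Note the asymmetry this illustrates: the write path is one library call, while the read path into our evaluator is the part we would have to build ourselves.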
Since there's a nontrivial amount of work involved (two days?), we would have to weigh the tradeoffs, and the benefits are mostly theoretical for now. One tangible benefit is that we wouldn't have to rebuild models for minor, or perhaps even major, updates to sklearn, but I can't think of any others. We don't have plans to use more general software for either training or evaluation, so our current serialization is mostly transparent except at upgrade time.
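To make the upgrade-time pain concrete, a hedged sketch of the status quo that PMML would replace: a pickled model is only trusted under the sklearn version it was trained with. The MODEL_PATH and the model_info field recording the version are assumptions for illustration, not our actual loader:

```python
# Sketch of the coupling between a pickled model and the sklearn version.
import pickle

import sklearn

MODEL_PATH = "damaging.model"  # hypothetical path

with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)

# 'sklearn_version' in model_info is an assumed field; the point is that
# a version mismatch forces a rebuild, which PMML would avoid.
trained_with = model.model_info.get("sklearn_version")
if trained_with != sklearn.__version__:
    raise RuntimeError(
        "Model trained with sklearn %s, but %s is installed; rebuild needed."
        % (trained_with, sklearn.__version__)
    )
```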
Defining fields and mappings should be an interesting exercise.
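As one possible starting point, a sketch of that mapping, assuming we stash model_info keys under PMML's Extension element (the spec's escape hatch for vendor-specific data). The PMML 4.3 namespace, the helper name, and the sample values are all assumptions; schema-valid placement of the elements would still need checking:

```python
# Sketch: attach model_info keys to a PMML document as Extension elements.
import xml.etree.ElementTree as ET

# Assumption: the converter emits PMML 4.3; the namespace URI must match
# whatever version we actually target.
PMML_NS = "http://www.dmg.org/PMML-4_3"
ET.register_namespace("", PMML_NS)


def attach_model_info(pmml_path, model_info):
    """Append each model_info key/value as an Extension on the root element."""
    tree = ET.parse(pmml_path)
    root = tree.getroot()
    for key, value in model_info.items():
        ext = ET.SubElement(root, "{%s}Extension" % PMML_NS)
        ext.set("name", key)
        ext.set("value", str(value))
    tree.write(pmml_path, xml_declaration=True, encoding="utf-8")


# Illustrative values only.
attach_model_info("damaging.pmml", {"trained_date": "2018-08-01", "version": "0.3.0"})
```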
I'm going to deprioritize this, although it would be fun.