
[Investigate] Use PMML for prediction model serialization
Open, Low, Public

Description

https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language

Storing our models in this standard might facilitate using other evaluation backends.

Event Timeline

Halfak triaged this task as Medium priority. Aug 16 2017, 4:58 PM
Halfak moved this task from Unsorted to Ideas on the Machine-Learning-Team board.

The toolchain isn't very mature. Writing PMML relies on a Java binary, which is acceptable for a compilation pipeline. However, there doesn't seem to be a way to read PMML into our evaluator, so we would have to implement a reader from scratch. Serializing the sklearn internals is probably not a big deal: there are at most a dozen simple variables to map, and our additional chunks like model_info could be stored under custom keys. This might be a good step towards mainstreaming this sort of metadata and encouraging an emerging industry best practice of making this data available to consumers.

Since there's a nontrivial amount of work involved (2 days?), we would have to look at the tradeoffs, the benefits being mostly theoretical for now. One tangible benefit is that we wouldn't have to rebuild models for minor, or perhaps even major, updates to sklearn, but I can't think of any others. We don't have plans to use more general software for either training or evaluation, so our current serialization is mostly transparent except at upgrade time.
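
To make the size of the mapping concrete, here is a rough sketch of what writing and reading could look like. It is not our evaluator and not a full PMML implementation: it assumes PMML 4.3, uses a plain linear model as a stand-in for the real sklearn estimators, and the function names (`to_pmml`, `read_model_info`, `predict_from_pmml`), the feature names, and the model_info keys are all hypothetical. Only the Python standard library is used, so the Java writer is not involved.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_3"  # assumption: targeting PMML 4.3


def to_pmml(feature_names, coefficients, intercept, model_info):
    """Serialize a linear model's parameters into a minimal PMML-style document."""
    root = ET.Element("PMML", {"version": "4.3", "xmlns": PMML_NS})

    # Custom metadata (e.g. our model_info) goes into Extension elements.
    header = ET.SubElement(root, "Header")
    for key, value in sorted(model_info.items()):
        ET.SubElement(header, "Extension", {"name": key, "value": str(value)})

    # Declare every input field plus a hypothetical target field named "score".
    dictionary = ET.SubElement(root, "DataDictionary")
    for name in list(feature_names) + ["score"]:
        ET.SubElement(dictionary, "DataField",
                      {"name": name, "optype": "continuous", "dataType": "double"})

    model = ET.SubElement(root, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    for name in feature_names:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": "score", "usageType": "target"})

    # The learned parameters: one intercept and one coefficient per feature.
    table = ET.SubElement(model, "RegressionTable",
                          {"intercept": repr(float(intercept))})
    for name, coef in zip(feature_names, coefficients):
        ET.SubElement(table, "NumericPredictor",
                      {"name": name, "coefficient": repr(float(coef))})
    return ET.tostring(root, encoding="unicode")


def read_model_info(pmml_text):
    """Recover the custom metadata stored in the Header extensions."""
    ns = {"p": PMML_NS}
    root = ET.fromstring(pmml_text)
    return {ext.get("name"): ext.get("value")
            for ext in root.findall("p:Header/p:Extension", ns)}


def predict_from_pmml(pmml_text, features):
    """Evaluate the serialized linear model on a dict of feature values."""
    ns = {"p": PMML_NS}
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", ns)
    score = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", ns):
        score += float(predictor.get("coefficient")) * features[predictor.get("name")]
    return score


if __name__ == "__main__":
    doc = to_pmml(["chars_added", "is_anon"], [2.0, -1.0], 0.5,
                  {"model_version": "0.1"})
    print(read_model_info(doc))
    print(predict_from_pmml(doc, {"chars_added": 1.0, "is_anon": 3.0}))
```

A tree ensemble would need PMML's tree model elements rather than a regression table, so the real mapping would be larger than this, but the overall shape (parameters as attributes, model_info as extensions, a small hand-written reader) should carry over.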

awight lowered the priority of this task from Medium to Low. Aug 9 2018, 10:15 PM

Defining fields and mappings should be an interesting exercise.

I'm going to deprioritize this, although it would be fun.