[Investigate] Use PMML for prediction model serialization
Open, Low, Public

Assigned To

None

Authored By

	awight
	Aug 13 2017, 1:45 PM

Description

https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language

Storing our models in this standard might facilitate using other evaluation backends.

Related Objects

Status	Assigned	Task
Resolved	None	T170650 [Investigate] Hadoop integration for ORES training
Declined	None	T201047 Use Joblib for ORES model serialization
Open	None	T173244 [Investigate] Use PMML for prediction model serialization

Event Timeline

awight created this task.Aug 13 2017, 1:45 PM

Nuria moved this task from Incoming to Radar on the Analytics board.Aug 14 2017, 3:45 PM

Halfak triaged this task as Medium priority.Aug 16 2017, 4:58 PM

Halfak moved this task from Unsorted to Ideas on the Machine-Learning-Team board.

awight added a parent task: T201047: Use Joblib for ORES model serialization.Aug 9 2018, 6:13 PM

The toolchain isn't very mature. Writing PMML relies on a Java binary, which is acceptable for a compilation pipeline. However, there doesn't seem to be a way to read PMML into our evaluator, so we would have to implement that from scratch. Serializing the sklearn internals is probably not a big deal: there are at most a dozen simple variables to map, and our additional chunks such as model_info could be stored under custom keys. This might be a good step towards mainstreaming this sort of metadata and encouraging an emerging industry best practice of making it available to consumers.
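For a rough sense of what "implement from scratch" could involve, here is a minimal sketch of a hand-rolled writer and reader, using only the Python standard library. It makes several assumptions that are not from this task: the model is a plain linear regression (not the tree ensembles ORES actually ships), only a small subset of PMML's `RegressionModel` is handled, the PMML 4.4 namespace is used, and model_info is stored as JSON inside a `Header/Extension` element (PMML's general mechanism for custom content). The names `to_pmml`, `load_pmml` and `predict` are hypothetical, and the output is not meant to match what JPMML-based tools produce.

```python
import json
import xml.etree.ElementTree as ET

# Assumption: PMML 4.4 namespace; only a tiny RegressionModel subset is handled.
PMML_NS = "http://www.dmg.org/PMML-4_4"
NS = {"p": PMML_NS}
ET.register_namespace("", PMML_NS)  # serialize PMML_NS as the default xmlns


def _q(tag):
    """Qualify a tag name with the PMML namespace."""
    return f"{{{PMML_NS}}}{tag}"


def to_pmml(feature_names, coefficients, intercept, model_info):
    """Hypothetical writer: linear model plus model_info -> PMML string."""
    names = list(feature_names)
    root = ET.Element(_q("PMML"), version="4.4")
    header = ET.SubElement(root, _q("Header"))
    # Custom metadata (e.g. ORES model_info) goes into an Extension element as JSON.
    ET.SubElement(header, _q("Extension"), name="model_info",
                  value=json.dumps(model_info))
    dd = ET.SubElement(root, _q("DataDictionary"),
                       numberOfFields=str(len(names) + 1))
    for name in names + ["prediction"]:
        ET.SubElement(dd, _q("DataField"), name=name,
                      optype="continuous", dataType="double")
    model = ET.SubElement(root, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    for name in names:
        ET.SubElement(schema, _q("MiningField"), name=name)
    ET.SubElement(schema, _q("MiningField"), name="prediction",
                  usageType="predicted")
    # repr() of a float round-trips exactly through float().
    table = ET.SubElement(model, _q("RegressionTable"),
                          intercept=repr(float(intercept)))
    for name, coef in zip(names, coefficients):
        ET.SubElement(table, _q("NumericPredictor"), name=name,
                      coefficient=repr(float(coef)))
    return ET.tostring(root, encoding="unicode")


def load_pmml(text):
    """Hypothetical reader: PMML string -> (intercept, coefficients, model_info)."""
    root = ET.fromstring(text)
    ext = root.find("p:Header/p:Extension[@name='model_info']", NS)
    model_info = json.loads(ext.get("value"))
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefs = {p.get("name"): float(p.get("coefficient"))
             for p in table.findall("p:NumericPredictor", NS)}
    return intercept, coefs, model_info


def predict(intercept, coefs, features):
    """Evaluate the linear model: intercept + sum(coef * feature value)."""
    return intercept + sum(c * features[name] for name, c in coefs.items())


if __name__ == "__main__":
    # Hypothetical model_info content, for illustration only.
    xml_text = to_pmml(["x1", "x2"], [0.5, -2.0], 1.0, {"version": "0.1.0"})
    intercept, coefs, info = load_pmml(xml_text)
    print(predict(intercept, coefs, {"x1": 4.0, "x2": 0.25}))  # 2.5
```

Even this toy reader has to reimplement evaluation semantics that sklearn already provides, which is where most of the real effort for tree ensembles and other estimators would go.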

Since there's a nontrivial amount of work involved (2 days?), we would have to weigh the tradeoffs, with the benefits being mostly theoretical for now. One tangible benefit is that we wouldn't have to rebuild models for minor, or perhaps even major, updates to sklearn, but I can't think of any others. We don't have plans to use more general software for either training or evaluation, so our current serialization is mostly transparent except at upgrade time.

awight edited projects, added Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team.Aug 9 2018, 6:18 PM

awight moved this task from Parked to Review on the Machine-Learning-Team (Active Tasks) board.

awight claimed this task.Aug 9 2018, 6:42 PM