machine learning - How to save large sklearn RandomForestRegressor model for inference

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I trained a sklearn RandomForestRegressor model on 19 GB of training data. I would like to save it to disk so that I can use it later for inference. As recommended in other Stack Overflow questions, I tried the following:

Pickle

with open(filename, 'wb') as f:
    pickle.dump(model, f)

The model was saved successfully. Its size on disk was 1.9 GB.

with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)

Loading the model resulted in a MemoryError (despite 16 GB of RAM).

cPickle - the same result as Pickle

Joblib

joblib.dump(est, 'random_forest.joblib', compress=3)

Loading the file also ends with a MemoryError.

Klepto

d = klepto.archives.dir_archive('sklearn_models', cached=True, serialized=True)
d['sklearn_random_forest'] = est
d.dump()

The archive is created, but when I try to load it using the following code, I get KeyError: 'sklearn_random_forest'

d = klepto.archives.dir_archive('sklearn_models', cached=True, serialized=True)
d.load(model_params)
est = d[model_params]

I tried saving a dictionary object using the same code, and it worked, so the code itself seems correct. Apparently Klepto cannot persist sklearn models. I played with the cached and serialized parameters, but it didn't help.

Any hints on how to handle this would be much appreciated. Is it possible to save the model as JSON, XML, maybe HDFS, or some other format?

question from:https://stackoverflow.com/questions/65834680/how-to-save-large-sklearn-randomforestregressor-model-for-inference

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:36:02+0000

Try using joblib.dump().

This function takes a compress parameter, an integer between 0 and 9; the higher the value, the smaller the resulting file. A value of 3 is usually a good compromise.

The only downside is that higher compress values make writing and reading slower.
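
As a rough illustration of the compress setting, here is a minimal sketch that assumes joblib, NumPy and scikit-learn are installed. The synthetic data, the tiny forest and the file names are made up for the example and have nothing to do with the 19 GB model from the question. It saves the same model with compress=0 and compress=3, compares the file sizes, and checks that the reloaded model gives the same predictions.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic data set, used only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)

# A deliberately tiny forest so the example runs quickly.
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file names, chosen for this example.
    raw_path = os.path.join(tmp_dir, "rf_raw.joblib")
    packed_path = os.path.join(tmp_dir, "rf_compressed.joblib")

    joblib.dump(model, raw_path, compress=0)     # no compression
    joblib.dump(model, packed_path, compress=3)  # moderate compression

    raw_size = os.path.getsize(raw_path)
    packed_size = os.path.getsize(packed_path)

    # Load the compressed file back while the temporary directory still exists.
    loaded_model = joblib.load(packed_path)

# The reloaded model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X), loaded_model.predict(X))
print("uncompressed bytes:", raw_size)
print("compressed bytes:", packed_size)
print("same predictions:", same_predictions)
```

Note that compression only reduces the size of the file on disk. The model is fully decompressed when it is loaded, so this by itself should not be expected to lower the amount of RAM needed at load time.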
