
Model Packaging & Serialization

Master model formats including Pickle, Joblib, ONNX, SavedModel, and TorchScript. Learn to create production-ready artifacts with proper dependency management.

📅 Tutorial 3 📊 Beginner


🎯 The Serialization Challenge

You've trained an amazing model. It works perfectly in your Jupyter notebook. But how do you get it from development to production? How do you ensure that the model you trained last month still works today? How do you share it with teammates or deploy it across different servers?

The answer is model serialization - the process of converting your trained model into a format that can be saved, loaded, shared, and deployed. But choosing the wrong format can lead to:

⚠️ Common Problems Without Proper Serialization:

  • Models break when Python versions change
  • Can't deploy sklearn models in production Java services
  • Model file is 2GB because you saved training data accidentally
  • Model works on your laptop but fails on the server
  • "ModuleNotFoundError" when loading a model from 6 months ago

In this tutorial, we'll cover every major serialization format, when to use each one, and best practices for creating production-ready model packages.

📦 What is Model Serialization?

Definition: Model serialization is the process of converting a trained machine learning model (which exists in memory) into a format that can be saved to disk, transmitted over a network, and reconstructed later.

What Gets Serialized?

When you serialize a model, you're typically saving:

  • Model Architecture: The structure of your model (layers, neurons, connections)
  • Learned Parameters: Weights, biases, embeddings learned during training
  • Model Configuration: Hyperparameters, preprocessing settings
  • Metadata: Training date, framework version, performance metrics

Why Serialization Matters for MLOps

  1. Reproducibility: Load exact same model months later
  2. Deployment: Move models from training to production environments
  3. Versioning: Track and roll back model versions
  4. Sharing: Collaborate with team members
  5. Portability: Run models across different platforms/languages
  6. Efficiency: Skip expensive retraining

🥒 Pickle & Joblib: Python's Native Options

Pickle: Python's Built-in Serialization

Pickle is Python's standard serialization library. It can serialize almost any Python object, including ML models.

Basic Pickle Usage

import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Train a model
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Save model with pickle
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load model
with open('model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

# Test loaded model
predictions = loaded_model.predict(X[:5])
print(f"Predictions: {predictions}")
print(f"Accuracy: {loaded_model.score(X, y)}")

Joblib: Better for Large NumPy Arrays

Joblib is built on top of Pickle but optimized for large NumPy arrays (common in ML). It's the recommended choice for scikit-learn models.

import joblib
from sklearn.ensemble import GradientBoostingClassifier

# Train model
model = GradientBoostingClassifier(n_estimators=200)
model.fit(X, y)

# Save with joblib (more efficient than pickle for sklearn)
joblib.dump(model, 'model.joblib')

# Load model
loaded_model = joblib.load('model.joblib')

# Compression for smaller files
joblib.dump(model, 'model_compressed.joblib', compress=3)  # compression level 0-9
print(f"Model saved and loaded successfully!")

Pickle vs Joblib Comparison

Feature                 | Pickle                         | Joblib
Speed for large arrays  | Slower                         | ✅ Faster (10x+)
File size               | Larger                         | ✅ Smaller (with compression)
Scikit-learn official   | Supported                      | ✅ Recommended
Standard library        | ✅ Built-in                    | External package
Best for                | Small models, general objects  | ✅ Large sklearn/NumPy models

⚠️ Critical Security Warning:

Never unpickle data from untrusted sources! Pickle can execute arbitrary code during deserialization. Only load pickle/joblib files you created or from trusted sources. For production APIs, use ONNX or model-serving frameworks instead.
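
One practical mitigation (it also appears in the security checklist later in this tutorial) is to record a SHA256 checksum when you save a model and verify it before loading. A minimal sketch, assuming a model.joblib saved as above:

import hashlib
import joblib

def sha256_of_file(path, chunk_size=8192):
    """Compute the SHA256 hex digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

# At save time: record the digest alongside the model file
expected = sha256_of_file('model.joblib')

# At load time: refuse to deserialize anything that changed on disk
if sha256_of_file('model.joblib') != expected:
    raise ValueError("Checksum mismatch - refusing to load model file")
model = joblib.load('model.joblib')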

Limitations of Pickle/Joblib

  • Python-only: Can't load in Java, C++, JavaScript
  • Version dependent: Python/library version changes can break models
  • Security risks: Arbitrary code execution vulnerability
  • Not optimized: Slower inference than compiled formats
  • Hard to inspect: Binary format, can't view model structure easily

🌐 ONNX: Cross-Framework Interoperability

ONNX (Open Neural Network Exchange) is an open format to represent ML models. It enables models to be transferred between different frameworks (PyTorch → TensorFlow, sklearn → ONNX Runtime).

💡 Key Advantage: Train in PyTorch, deploy with ONNX Runtime (often 2-10x faster inference). Use in Python, C++, Java, JavaScript, C#, and more.

Converting Scikit-learn to ONNX

import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
import numpy as np

# Train model
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X, y)

# Define input type (4 features)
initial_type = [('float_input', FloatTensorType([None, 4]))]

# Convert to ONNX
onnx_model = convert_sklearn(model, initial_types=initial_type)

# Save ONNX model
with open("rf_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

print("✅ Model converted to ONNX!")

# Load and run with ONNX Runtime
session = ort.InferenceSession("rf_model.onnx")

# Prepare input
input_name = session.get_inputs()[0].name
X_test = X[:5].astype(np.float32)

# Run inference
result = session.run(None, {input_name: X_test})
predictions = result[0]
probabilities = result[1]  # by default a list of {class: probability} dicts (ZipMap)

print(f"Predictions: {predictions}")
print(f"Probabilities:\n{probabilities}")

Converting PyTorch to ONNX

import torch
import torch.nn as nn
import torch.onnx

# Define a simple PyTorch model
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(10, 50)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(50, 3)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Create and train model (simplified)
model = SimpleNN()
model.eval()

# Create dummy input for export
dummy_input = torch.randn(1, 10)

# Export to ONNX
torch.onnx.export(
    model,                      # model being run
    dummy_input,                # model input (or a tuple for multiple inputs)
    "pytorch_model.onnx",       # where to save the model
    export_params=True,         # store the trained parameter weights
    opset_version=14,           # ONNX version
    do_constant_folding=True,   # optimization
    input_names=['input'],      # model's input names
    output_names=['output'],    # model's output names
    dynamic_axes={              # variable length axes
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'}
    }
)

print("✅ PyTorch model exported to ONNX!")

# Verify the model
import onnx
onnx_model = onnx.load("pytorch_model.onnx")
onnx.checker.check_model(onnx_model)
print("✅ ONNX model is valid!")

ONNX Advantages

  • 🚀 Performance: ONNX Runtime can provide 2-10x faster inference than native frameworks
  • 🌍 Cross-Platform: Runs on Windows, Linux, macOS, mobile, browsers, and edge devices
  • 🔄 Interoperability: Train in one framework, deploy in another without retraining
  • 🛡️ Production-Ready: Used by Microsoft, AWS, and Facebook in production systems

🧠 TensorFlow SavedModel Format

SavedModel is TensorFlow's recommended serialization format. It's a complete, self-contained package including the model architecture, weights, and computation graph.

Saving a Keras Model

import tensorflow as tf
from tensorflow import keras
import numpy as np

# Create a simple Keras model
model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train model (simplified)
X_train = np.random.randn(1000, 20)
y_train = np.random.randint(0, 10, 1000)
model.fit(X_train, y_train, epochs=5, verbose=0)

# Save as SavedModel (directory-based format)
model.save('my_model')  # Creates my_model/ directory
# Note: with Keras 3 (bundled with TF 2.16+), model.save() expects a
# .keras file; use model.export('my_model') there to get a SavedModel.

print("✅ Model saved in SavedModel format!")

# Load model
loaded_model = keras.models.load_model('my_model')

# Make predictions
X_test = np.random.randn(5, 20)
predictions = loaded_model.predict(X_test)
print(f"Predictions shape: {predictions.shape}")

# Save in HDF5 format (older, single file)
model.save('my_model.h5')  # Single .h5 file
loaded_h5 = keras.models.load_model('my_model.h5')
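
TensorFlow also ships a command-line tool, saved_model_cli, for inspecting what a SavedModel exposes — handy before wiring it into TF Serving:

# Inspect the inputs/outputs of the serving signature
saved_model_cli show --dir my_model --tag_set serve --signature_def serving_default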

SavedModel vs HDF5

Feature             | SavedModel             | HDF5 (.h5)
TensorFlow Serving  | ✅ Fully supported     | Not directly supported
File structure      | Directory with assets  | Single file
Custom objects      | ✅ Better handling     | Requires custom_objects dict
TensorFlow.js       | ✅ Can convert         | Limited support
TFLite conversion   | ✅ Recommended         | Possible but not preferred
Recommendation      | ✅ Use for production  | Legacy, use for compatibility

SavedModel Structure

my_model/
├── assets/              # Additional files (vocabulary, etc.)
├── variables/           # Model weights
│   ├── variables.data-00000-of-00001
│   └── variables.index
└── saved_model.pb       # Model architecture and metadata
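
Outside Keras, the same directory can be loaded with the lower-level tf.saved_model API and called through its serving signature — a minimal sketch, assuming the my_model/ directory above:

import numpy as np
import tensorflow as tf

# Load the SavedModel directly (no Keras objects required)
loaded = tf.saved_model.load('my_model')
infer = loaded.signatures['serving_default']

# Signatures take keyword arguments; look up the input tensor's name
input_name = list(infer.structured_input_signature[1].keys())[0]
result = infer(**{input_name: tf.constant(np.random.randn(1, 20), dtype=tf.float32)})
print(result)  # dict mapping output names to tensors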

✅ Best Practice: Always use SavedModel format for TensorFlow/Keras production deployments. It's the only format fully supported by TensorFlow Serving, TFLite, and TensorFlow.js.

🔥 PyTorch TorchScript

TorchScript is PyTorch's way to create serializable, optimizable models that can run in production environments without Python.

Two Ways to Create TorchScript

1. Tracing (Easier)

import torch
import torch.nn as nn

# Define model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(20, 5)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

model = SimpleModel()
model.eval()

# Create example input
example_input = torch.randn(1, 10)

# Trace the model
traced_model = torch.jit.trace(model, example_input)

# Save traced model
traced_model.save('traced_model.pt')

# Load and use
loaded_model = torch.jit.load('traced_model.pt')
output = loaded_model(example_input)
print(f"Output shape: {output.shape}")

2. Scripting (More Flexible)

import torch
import torch.nn as nn

class ModelWithControlFlow(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.fc2 = nn.Linear(20, 5)
    
    def forward(self, x):
        x = self.fc1(x)
        # Control flow: tracing would silently record only the branch
        # taken by the example input!
        if x.sum() > 0:
            x = torch.relu(x)
        else:
            x = torch.sigmoid(x)
        x = self.fc2(x)
        return x

model = ModelWithControlFlow()

# Script the model (captures control flow)
scripted_model = torch.jit.script(model)

# Save scripted model
scripted_model.save('scripted_model.pt')

# Load in C++ (no Python needed!)
# Can also load in Python
loaded_model = torch.jit.load('scripted_model.pt')
test_input = torch.randn(1, 10)
output = loaded_model(test_input)
print(f"Output: {output}")

Tracing vs Scripting

Aspect        | Tracing                                  | Scripting
Usage         | torch.jit.trace()                        | torch.jit.script()
How it works  | Records operations on an example input   | Analyzes the Python code directly
Control flow  | ❌ Doesn't capture if/for/while          | ✅ Captures control flow
Ease of use   | ✅ Simpler                               | Requires TorchScript-compatible Python
Best for      | Feedforward networks                     | RNNs, complex logic

💡 When to Use TorchScript:

  • Deploying PyTorch models in C++ environments
  • Mobile deployment (Android/iOS)
  • Edge devices without Python
  • Optimizing inference performance
  • Protecting model IP (harder to reverse engineer)

📋 Creating Complete Model Artifacts

A production model isn't just a serialized file. It's a complete package with everything needed to use the model correctly.

Essential Components of a Model Artifact

  1. Model File: Serialized weights and architecture
  2. Preprocessing Code: How to transform inputs
  3. Metadata: Version, training date, metrics, hyperparameters
  4. Dependencies: Python packages and versions required
  5. Schema: Expected input/output format
  6. Example Usage: Code snippets showing how to use it
  7. Model Card: Documentation about the model

Complete Packaging Example

"""
Complete Model Packaging for Production
"""
import joblib
import json
import yaml
from datetime import datetime
from pathlib import Path
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

class ModelPackager:
    """Package ML models with all necessary artifacts"""
    
    def __init__(self, model_dir='model_package'):
        self.model_dir = Path(model_dir)
        self.model_dir.mkdir(exist_ok=True)
    
    def save_model(self, model, model_name='model'):
        """Save the trained model"""
        model_path = self.model_dir / f'{model_name}.joblib'
        joblib.dump(model, model_path, compress=3)
        return str(model_path)
    
    def save_preprocessor(self, preprocessor, name='preprocessor'):
        """Save preprocessing pipeline"""
        prep_path = self.model_dir / f'{name}.joblib'
        joblib.dump(preprocessor, prep_path)
        return str(prep_path)
    
    def save_metadata(self, metadata):
        """Save model metadata as JSON"""
        meta_path = self.model_dir / 'metadata.json'
        with open(meta_path, 'w') as f:
            json.dump(metadata, f, indent=2, default=str)
        return str(meta_path)
    
    def save_schema(self, input_schema, output_schema):
        """Save input/output schema"""
        schema = {
            'input': input_schema,
            'output': output_schema
        }
        schema_path = self.model_dir / 'schema.yaml'
        with open(schema_path, 'w') as f:
            yaml.dump(schema, f)
        return str(schema_path)
    
    def save_example(self, example_input, example_output):
        """Save example input/output"""
        example = {
            'input': example_input.tolist() if hasattr(example_input, 'tolist') else example_input,
            'output': example_output.tolist() if hasattr(example_output, 'tolist') else example_output
        }
        example_path = self.model_dir / 'example.json'
        with open(example_path, 'w') as f:
            json.dump(example, f, indent=2)
        return str(example_path)
    
    def create_requirements(self):
        """Generate requirements.txt"""
        # In practice, use pipreqs or similar
        requirements = [
            'scikit-learn>=1.0.0',
            'numpy>=1.21.0',
            'joblib>=1.1.0'
        ]
        req_path = self.model_dir / 'requirements.txt'
        with open(req_path, 'w') as f:
            f.write('\n'.join(requirements))
        return str(req_path)
    
    def create_model_card(self, card_info):
        """Create model card documentation"""
        card_path = self.model_dir / 'MODEL_CARD.md'
        with open(card_path, 'w') as f:
            f.write(f"# Model Card: {card_info['name']}\n\n")
            f.write(f"**Version:** {card_info['version']}\n")
            f.write(f"**Created:** {card_info['created_date']}\n\n")
            f.write(f"## Description\n{card_info['description']}\n\n")
            f.write(f"## Performance\n")
            for metric, value in card_info['metrics'].items():
                f.write(f"- {metric}: {value}\n")
            f.write(f"\n## Usage\n```python\n{card_info['usage_example']}\n```\n")
        return str(card_path)

# ===== USE THE PACKAGER =====

# 1. Train model and preprocessor
X, y = load_iris(return_X_y=True)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_scaled, y)

# 2. Create packager
packager = ModelPackager('iris_classifier_v1')

# 3. Save model and preprocessor
packager.save_model(model, 'iris_classifier')
packager.save_preprocessor(scaler, 'scaler')

# 4. Create metadata
metadata = {
    'model_name': 'iris_classifier',
    'version': '1.0.0',
    'framework': 'scikit-learn',
    'algorithm': 'RandomForestClassifier',
    'training_date': datetime.now().isoformat(),
    'hyperparameters': {
        'n_estimators': 100,
        'random_state': 42
    },
    'metrics': {
        'train_accuracy': float(model.score(X_scaled, y)),
        'n_features': X.shape[1],
        'n_classes': len(np.unique(y))
    },
    'feature_names': ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
    'target_names': ['setosa', 'versicolor', 'virginica']
}
packager.save_metadata(metadata)

# 5. Save schema
input_schema = {
    'type': 'array',
    'shape': [4],
    'features': ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
    'dtype': 'float32'
}
output_schema = {
    'type': 'integer',
    'range': [0, 2],
    'labels': ['setosa', 'versicolor', 'virginica']
}
packager.save_schema(input_schema, output_schema)

# 6. Save example
example_input = X[:1]
example_output = model.predict(scaler.transform(example_input))
packager.save_example(example_input, example_output)

# 7. Create requirements.txt
packager.create_requirements()

# 8. Create model card
card_info = {
    'name': 'Iris Species Classifier',
    'version': '1.0.0',
    'created_date': datetime.now().strftime('%Y-%m-%d'),
    'description': 'Random Forest classifier for predicting iris species from flower measurements.',
    'metrics': {
        'Training Accuracy': '100%',
        'Features': 4,
        'Classes': 3
    },
    'usage_example': '''
import joblib
model = joblib.load('iris_classifier.joblib')
scaler = joblib.load('scaler.joblib')
prediction = model.predict(scaler.transform([[5.1, 3.5, 1.4, 0.2]]))
'''
}
packager.create_model_card(card_info)

print("✅ Complete model package created!")
print(f"📦 Package location: {packager.model_dir}")

Final Package Structure

iris_classifier_v1/
├── iris_classifier.joblib    # Trained model
├── scaler.joblib              # Preprocessing pipeline
├── metadata.json              # Training metadata
├── schema.yaml                # Input/output schema
├── example.json               # Example inference
├── requirements.txt           # Dependencies
└── MODEL_CARD.md             # Documentation
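
The consuming side matters just as much — a minimal loader sketch that reads these artifacts back and runs a single prediction (paths match the structure above):

import json
import joblib
from pathlib import Path

def load_package(package_dir):
    """Load model, preprocessor, and metadata from a package directory."""
    package_dir = Path(package_dir)
    model = joblib.load(package_dir / 'iris_classifier.joblib')
    scaler = joblib.load(package_dir / 'scaler.joblib')
    with open(package_dir / 'metadata.json') as f:
        metadata = json.load(f)
    return model, scaler, metadata

model, scaler, metadata = load_package('iris_classifier_v1')
prediction = model.predict(scaler.transform([[5.1, 3.5, 1.4, 0.2]]))
print(f"Predicted species: {metadata['target_names'][prediction[0]]}")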

📦 Dependency Management

One of the biggest causes of "it worked on my machine" problems is dependency mismatch. Proper dependency management is critical for reproducibility.

Capturing Exact Dependencies

# Don't just use pip freeze (captures everything)
pip freeze > requirements.txt  # ❌ Includes all packages

# Better: Use pipreqs to detect actual imports
pip install pipreqs
pipreqs /path/to/project --force  # ✅ Only used packages

# Best: Use poetry or pipenv for proper dependency resolution
pip install poetry
poetry init
poetry add scikit-learn numpy pandas
poetry export -f requirements.txt --output requirements.txt

Version Pinning Strategy

# ❌ Too loose - can break with updates
scikit-learn
numpy

# ❌ Too strict - prevents security updates
scikit-learn==1.0.2
numpy==1.21.4

# ✅ Just right - compatible range
scikit-learn>=1.0.0,<2.0.0
numpy>=1.21.0,<2.0.0
python-dateutil>=2.8.0,<3.0.0
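
Another widely used option is pip-tools: keep the compatible ranges above in a requirements.in file and let pip-compile resolve them into fully pinned versions — a sketch:

# Keep loose ranges in requirements.in; compile exact pins from them
pip install pip-tools
pip-compile requirements.in   # writes a fully pinned requirements.txt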

Using MLflow for Dependency Tracking

import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load training data and set up the experiment
X_train, y_train = load_iris(return_X_y=True)
mlflow.set_experiment("iris-classifier")

with mlflow.start_run():
    # Train model
    model = RandomForestClassifier()
    model.fit(X_train, y_train)
    
    # MLflow automatically logs:
    # - Python version
    # - Library versions (sklearn, numpy, etc.)
    # - Conda environment
    # - pip requirements
    mlflow.sklearn.log_model(
        model,
        "model",
        conda_env={
            'dependencies': [
                'python=3.9',
                'pip',
                {
                    'pip': [
                        'scikit-learn==1.0.2',
                        'numpy==1.21.4'
                    ]
                }
            ]
        }
    )

# Later, reload with the exact same environment
model_uri = "runs:/<run_id>/model"  # substitute the run ID from the tracking UI
loaded_model = mlflow.sklearn.load_model(model_uri)

⚠️ Version Compatibility Issues to Watch:

  • Scikit-learn 0.24 → 1.0: Major API changes
  • Python 3.7 → 3.10: Pickle protocol differences
  • NumPy 1.19 → 1.20: Type system changes
  • TensorFlow 1.x → 2.x: Complete rewrite

✅ Serialization Best Practices

1. Choose the Right Format

Use case                             | Recommended format
Scikit-learn in Python production    | Joblib
Cross-framework/language deployment  | ONNX
TensorFlow Serving                   | SavedModel
PyTorch in C++/mobile                | TorchScript
Quick prototyping                    | Pickle
High-performance inference           | ONNX + ONNX Runtime

2. Always Test Deserialization

def test_model_serialization(model, X_test):
    """Verify model works after save/load"""
    import tempfile
    import joblib
    import numpy as np
    
    # Get predictions before saving
    predictions_before = model.predict(X_test)
    
    # Save and load
    with tempfile.NamedTemporaryFile(suffix='.joblib', delete=False) as f:
        joblib.dump(model, f.name)
        loaded_model = joblib.load(f.name)
    
    # Get predictions after loading
    predictions_after = loaded_model.predict(X_test)
    
    # Verify they match
    assert np.allclose(predictions_before, predictions_after), \
        "Predictions changed after serialization!"
    
    print("✅ Serialization test passed!")

# Run test
test_model_serialization(model, X_test)

3. Version Everything

import joblib
import json
from datetime import datetime
import sys
import sklearn

def save_model_with_metadata(model, filepath):
    """Save model with complete metadata"""
    metadata = {
        'timestamp': datetime.now().isoformat(),
        'python_version': sys.version,
        'sklearn_version': sklearn.__version__,
        'model_type': type(model).__name__,
        'model_params': model.get_params()
    }
    
    # Save model
    joblib.dump(model, filepath)
    
    # Save metadata alongside
    meta_path = filepath.replace('.joblib', '_metadata.json')
    with open(meta_path, 'w') as f:
        json.dump(metadata, f, indent=2)
    
    print(f"✅ Model and metadata saved to {filepath}")

4. Use Model Registries

For production environments, use a model registry like MLflow, DVC, or cloud-native solutions:

  • MLflow Model Registry: Track versions, stage transitions, lineage
  • DVC: Version control for large model files with Git
  • AWS SageMaker Model Registry: Integrated with SageMaker deployment
  • Azure ML Model Registry: Enterprise model management

5. Security Checklist

  • ✅ Never load pickle/joblib files from untrusted sources
  • ✅ Verify file integrity with checksums (SHA256)
  • ✅ Use ONNX for serving to external users (no code execution)
  • ✅ Scan model files for malware in CI/CD pipelines
  • ✅ Implement access controls on model storage
  • ✅ Log all model downloads/loads for audit trails

🎯 Summary

You've learned how to package and serialize ML models for production:

  • 🥒 Pickle/Joblib: Python-only, easy to use, best for scikit-learn models in Python environments
  • 🌐 ONNX: Cross-framework and cross-language, optimized inference, production-grade
  • 🧠 SavedModel: TensorFlow's standard, works with TF Serving, TFLite, and TF.js
  • 🔥 TorchScript: PyTorch production format, C++ deployment, mobile-ready

Key Takeaways

  1. Choose serialization format based on deployment target
  2. Always package models with metadata, schema, and dependencies
  3. Test deserialization before deploying to production
  4. Use model registries for version control and lineage
  5. Never load untrusted pickle files (security risk)
  6. Pin dependency versions for reproducibility

🚀 Next Steps:

Now that you can package models, the next tutorial will teach you how to build production-ready APIs around them using FastAPI. You'll learn request validation, async predictions, health checks, and rate limiting.

Test Your Knowledge

Q1: What is the main advantage of Joblib over Pickle for scikit-learn models?

  • It's more secure
  • It's much faster for large NumPy arrays and can compress files
  • It works in multiple languages
  • It doesn't require external libraries

Q2: Why is ONNX particularly useful for production ML systems?

  • It's the easiest format to use
  • It's the smallest file size
  • It enables cross-framework deployment and provides optimized inference across multiple languages
  • It's only for Python applications

Q3: What's the recommended TensorFlow format for production deployments?

  • Pickle
  • Checkpoint files
  • HDF5 (.h5)
  • SavedModel format

Q4: When should you use torch.jit.script() instead of torch.jit.trace()?

  • For all models
  • When your model has control flow (if/for/while statements)
  • When you want faster performance
  • For mobile deployment only

Q5: What's a critical security concern with Pickle/Joblib serialization?

  • It can execute arbitrary code during deserialization, so never load files from untrusted sources
  • The files are too large
  • It doesn't work with encryption
  • It's slower than other formats