
How to Convert Jupyter Notebooks to Production-Ready Python Code

The Problem

I built a machine learning model in Jupyter Notebook. It worked perfectly. Then my manager said: “Great, deploy it to production by Friday.”

I stared at my notebook. Twenty-three cells. Global variables everywhere. Hardcoded file paths. No error handling. How do I turn this into something that runs reliably in production?

notebook.ipynb (cell 1)
# My "working" prototype
import pandas as pd
df = pd.read_csv('/Users/me/Desktop/project/data.csv') # Hardcoded path
df = df.dropna() # Silent data loss
X = df[['feature1', 'feature2']]
y = df['target']

This is the gap between data science and engineering. Notebooks excel at exploration, but production demands reproducibility, testing, and scalability. Here’s how I bridged that gap.

First Attempt: Manual Copy-Paste

I tried copying code cell by cell into a Python file.

model.py (my first attempt)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
df = pd.read_csv('/Users/me/Desktop/project/data.csv')
df = df.dropna()
X = df[['feature1', 'feature2']]
y = df['target']
model = RandomForestClassifier()
model.fit(X, y)

Problems showed up immediately:

  • The hardcoded path failed on the server
  • No logging to track what happened
  • No error handling when data was missing
  • No way to run with different parameters

I needed a better approach.
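As a minimal sketch of what addressing those four problems could look like with only the standard library: the path and parameters come from the command line, progress is logged, and a missing file is reported explicitly. The flag names (`--data-path`, `--n-estimators`) and the `parse_args`/`main` helpers are my own illustrative choices, not part of the original notebook.

```python
import argparse
import logging
from pathlib import Path

logger = logging.getLogger("train")


def parse_args(argv=None):
    """Parse command-line options (flag names are illustrative)."""
    parser = argparse.ArgumentParser(description="Train the model.")
    parser.add_argument("--data-path", type=Path, required=True,
                        help="CSV file to train on (replaces the hardcoded path)")
    parser.add_argument("--n-estimators", type=int, default=100,
                        help="Number of trees in the forest")
    return parser.parse_args(argv)


def main(argv=None) -> int:
    """Return a process exit code: 0 on success, 1 if the data file is missing."""
    args = parse_args(argv)
    logging.basicConfig(level=logging.INFO)

    if not args.data_path.is_file():
        logger.error("Data file not found: %s", args.data_path)
        return 1

    logger.info("Training on %s with n_estimators=%d",
                args.data_path, args.n_estimators)
    # Loading the data and fitting the model would go here.
    return 0
```

In a real script, `sys.exit(main())` under an `if __name__ == "__main__":` guard would serve as the entry point, and passing a list such as `["--data-path", "data/production.csv"]` to `main` makes it easy to call from tests.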

Step 1: Convert with nbconvert

nbconvert is Jupyter’s built-in conversion tool. It extracts code from notebooks into Python scripts.

Terminal
# Convert single notebook
jupyter nbconvert --to script prototype.ipynb
# Clear outputs before conversion (cleaner result)
jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to notebook --inplace prototype.ipynb
jupyter nbconvert --to script prototype.ipynb

This gave me a .py file with all my code, but it was still messy: comments mixed in with the code, and no structure.
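To make the "extracts code" part concrete: an .ipynb file is JSON containing a list of cells, and the script is essentially the source of the code cells joined together. The sketch below illustrates that idea with only the standard library. It is a simplification, not nbconvert's implementation (nbconvert does more, such as turning markdown cells into comments), and the function name `notebook_to_script` is hypothetical.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Join the source of every code cell in a v4 notebook into one script."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # In the on-disk format, "source" may be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook standing in for prototype.ipynb
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["import pandas as pd\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["x = 1\n", "y = x + 1"]},
    ],
})

script = notebook_to_script(example)
print(script)
```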

For automation, I switched to programmatic execution with nbclient, which runs a notebook from Python rather than converting it:

convert_notebook.py
import nbformat
from nbclient import NotebookClient
from nbclient.exceptions import CellExecutionError

# Load the notebook
with open('prototype.ipynb', 'r') as f:
    nb = nbformat.read(f, as_version=4)

# Execute notebook programmatically
client = NotebookClient(
    nb,
    timeout=600,
    kernel_name='python3',
    resources={'metadata': {'path': 'notebooks/'}}
)

try:
    client.execute()
except CellExecutionError as e:
    print(f'Error executing notebook: {e}')
    raise
finally:
    nbformat.write(nb, 'executed_notebook.ipynb')

This approach lets me run notebooks in pipelines and catch errors programmatically.

Step 2: Parameterize with Papermill

My notebook had hardcoded parameters scattered everywhere. I needed to pass different values for different runs.

Papermill solves this. First, I added a “parameters” cell to my notebook:

prototype.ipynb (parameters cell)
# Tag this cell as "parameters" in Jupyter
alpha = 0.5
l1_ratio = 0.1
n_estimators = 100
data_path = "data/default.csv"

Then I could execute with different parameters:

Terminal
# Execute with custom parameters
papermill input.ipynb output.ipynb \
    -p alpha 0.6 \
    -p l1_ratio 0.1 \
    -p data_path "data/production.csv"

Or programmatically:

run_parameterized.py
import papermill as pm

pm.execute_notebook(
    'templates/model_training.ipynb',
    'outputs/training_run_001.ipynb',
    parameters=dict(
        alpha=0.6,
        l1_ratio=0.1,
        data_path='data/production.csv'
    )
)

Now I can run the same notebook with different configurations for dev, staging, and production.
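Conceptually, Papermill leaves the defaults in place and adds a new cell, tagged "injected-parameters", directly after the cell tagged "parameters", so the overrides simply reassign the variables before the rest of the notebook runs. The sketch below mimics that on a plain dict. It is a simplified illustration rather than Papermill's own code, and `inject_parameters` is a hypothetical name.

```python
import copy


def inject_parameters(nb: dict, parameters: dict) -> dict:
    """Return a copy of nb with an override cell placed after the 'parameters' cell."""
    nb = copy.deepcopy(nb)
    lines = [f"{name} = {value!r}" for name, value in parameters.items()]
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "execution_count": None,
        "outputs": [],
        "source": "\n".join(lines),
    }
    # Find the cell tagged "parameters"; fall back to the top of the notebook.
    insert_at = 0
    for index, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            insert_at = index + 1
            break
    nb["cells"].insert(insert_at, injected)
    return nb


notebook = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "source": 'alpha = 0.5\ndata_path = "data/default.csv"'},
        {"cell_type": "code", "metadata": {},
         "source": "print(alpha, data_path)"},
    ]
}

updated = inject_parameters(
    notebook, {"alpha": 0.6, "data_path": "data/production.csv"}
)
print(updated["cells"][1]["source"])
```

Because the injected cell runs after the defaults, any parameter that is not overridden keeps the value from the "parameters" cell.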

Step 3: Refactor for Production Quality

The converted script still had notebook-style code. I needed proper structure.

Before (Notebook Style)

before_refactor.py
# Messy notebook-style code
import pandas as pd
df = pd.read_csv('data.csv')
df = df.dropna()
X = df[['feature1', 'feature2']]
y = df['target']
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X, y)

After (Production-Ready)

src/data_processor.py
import logging
from typing import Tuple

import pandas as pd

logger = logging.getLogger(__name__)


def load_and_preprocess_data(
    filepath: str,
    features: list[str],
    target: str
) -> Tuple[pd.DataFrame, pd.Series]:
    """
    Load and preprocess data for model training.

    Args:
        filepath: Path to the CSV data file
        features: List of feature column names
        target: Target column name

    Returns:
        Tuple of (features DataFrame, target Series)

    Raises:
        FileNotFoundError: If data file doesn't exist
        ValueError: If required columns are missing
    """
    try:
        df = pd.read_csv(filepath)
    except FileNotFoundError:
        logger.error(f"Data file not found: {filepath}")
        raise

    required_columns = features + [target]
    missing = set(required_columns) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    df = df.dropna(subset=required_columns)
    X = df[features]
    y = df[target]
    logger.info(f"Loaded {len(df)} samples with {len(features)} features")
    return X, y

src/model.py
import logging

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

logger = logging.getLogger(__name__)


class ModelTrainer:
    def __init__(self, n_estimators: int = 100, random_state: int = 42):
        self.n_estimators = n_estimators
        self.random_state = random_state
        self.model = None

    def train(self, X, y) -> dict:
        """Train model and return metrics."""
        self.model = RandomForestClassifier(
            n_estimators=self.n_estimators,
            random_state=self.random_state
        )
        cv_scores = cross_val_score(self.model, X, y, cv=5)
        self.model.fit(X, y)
        metrics = {
            'cv_mean': np.mean(cv_scores),
            'cv_std': np.std(cv_scores)
        }
        logger.info(f"Model trained. CV accuracy: {metrics['cv_mean']:.3f} (+/- {metrics['cv_std']:.3f})")
        return metrics

    def predict(self, X):
        """Make predictions."""
        if self.model is None:
            raise RuntimeError("Model not trained. Call train() first.")
        return self.model.predict(X)

Key changes I made:

  • Type hints for IDE support and error catching
  • Logging instead of print statements
  • Error handling with specific exceptions
  • Docstrings for documentation
  • Classes to encapsulate state
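One detail worth spelling out: the modules above only create loggers with `logging.getLogger(__name__)`. Until the application configures logging, Python's default behavior shows only WARNING and above, so the `logger.info(...)` messages would not appear. A minimal sketch of an entry-point setup follows; the `configure_logging` helper and the format string are my own assumptions, not from the original project.

```python
import logging


def configure_logging(level: str = "INFO") -> None:
    """Send log records from all modules to stderr with a timestamped format."""
    logging.basicConfig(
        level=getattr(logging, level.upper()),
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        force=True,  # replace any handlers installed earlier (Python 3.8+)
    )


configure_logging("INFO")

# Any module-level logger now inherits the root configuration.
logger = logging.getLogger("src.data_processor")
logger.info("Loaded %d samples with %d features", 150, 2)
```

Calling this once at program start, rather than inside each module, keeps library-style code free of global logging side effects.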

Step 4: Deploy with MLflow

For model versioning and deployment, I used MLflow. It tracks experiments, versions models, and serves predictions.

src/train_and_log.py
import logging

import mlflow
import mlflow.sklearn
from mlflow.tracking import MlflowClient

# Assumes the project root is on sys.path so that src/model.py is importable
from src.model import ModelTrainer

logger = logging.getLogger(__name__)


def train_and_log_model(X, y, model_name: str = "production_model"):
    """Train model and log to MLflow."""
    with mlflow.start_run():
        # Log parameters
        mlflow.log_param("n_estimators", 100)
        mlflow.log_param("random_state", 42)

        # Train model
        trainer = ModelTrainer(n_estimators=100)
        metrics = trainer.train(X, y)

        # Log metrics
        mlflow.log_metric("cv_accuracy", metrics['cv_mean'])
        mlflow.log_metric("cv_std", metrics['cv_std'])

        # Log and register model
        mlflow.sklearn.log_model(
            trainer.model,
            "model",
            registered_model_name=model_name
        )

        # Read the run ID while the run is still active
        run_id = mlflow.active_run().info.run_id

    logger.info(f"Model logged with run_id: {run_id}")
    return run_id

Promoting to production:

src/promote_model.py
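import logging

from mlflow.tracking import MlflowClient

logger = logging.getLogger(__name__)

def promote_to_production(model_name: str, version: int):
    """Transition model version to Production stage."""
    # Note: MLflow 2.9+ deprecates stages in favor of model version aliases.
    client = MlflowClient()
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production"
    )
    logger.info(f"Model {model_name} v{version} promoted to Production")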

Serving the model:

Terminal
# Local serving
mlflow models serve \
    -m "models:/production_model/Production" \
    --host 0.0.0.0 --port 5000

Making predictions via REST API:

predict_client.py
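import requests

url = "http://127.0.0.1:5000/invocations"
data = {
    "dataframe_split": {
        "columns": ["feature1", "feature2"],
        "data": [[5.1, 3.5]]
    }
}

# A timeout avoids hanging forever if the server is unreachable
response = requests.post(url, json=data, timeout=10)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
predictions = response.json()
print(f"Predictions: {predictions}")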
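For callers that send many requests, it can help to build and validate the request body in one place. The sketch below does that with the standard library only (using `urllib` instead of `requests` is my choice to keep it dependency-free). The helper names `build_payload` and `predict` are hypothetical, and the payload shape is the same "dataframe_split" structure used in the client above.

```python
import json
import urllib.request


def build_payload(columns, rows):
    """Build the 'dataframe_split' request body, checking each row's length."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(
                f"Expected {len(columns)} values per row, got {len(row)}"
            )
    return {
        "dataframe_split": {
            "columns": list(columns),
            "data": [list(row) for row in rows],
        }
    }


def predict(url, columns, rows, timeout=10.0):
    """POST the payload and return the decoded JSON response."""
    body = json.dumps(build_payload(columns, rows)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses and
    # urllib.error.URLError when the server cannot be reached.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_payload(["feature1", "feature2"], [[5.1, 3.5]])
print(json.dumps(payload))
```

With the server from the previous step running, `predict("http://127.0.0.1:5000/invocations", ["feature1", "feature2"], [[5.1, 3.5]])` would send the same request as the client above.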

The Complete Workflow

I now follow this pipeline:

Jupyter Notebook (prototype)
|
v
nbconvert --to script
|
v
Refactor into modules
|
v
Add papermill parameters
|
v
Unit tests + CI/CD
|
v
MLflow Model Registry
|
v
Production Serve (REST API)

What I Learned

  1. Convert first, refactor second - nbconvert gives you a starting point, but you still need to restructure the code.

  2. Parameterize early - Papermill lets you run the same notebook with different configs without code changes.

  3. Structure matters - Separate data loading, processing, and model logic into distinct modules.

  4. Track everything - MLflow tracks parameters, metrics, and model versions. This saved me when I needed to reproduce a result from three months ago.

  5. Test before deploying - I write unit tests for each module before the code goes anywhere near production.
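As an illustration of the kind of unit test point 5 refers to, here is a sketch using the standard library's unittest. The helper `find_missing_columns` is hypothetical: it pulls the required-column check from load_and_preprocess_data into a pure function so it can be tested without pandas or a CSV file.

```python
import unittest


def find_missing_columns(available, features, target):
    """Return the required column names that are absent from `available`."""
    required = list(features) + [target]
    return sorted(set(required) - set(available))


class FindMissingColumnsTest(unittest.TestCase):
    def test_all_columns_present(self):
        missing = find_missing_columns(
            ["feature1", "feature2", "target"], ["feature1", "feature2"], "target"
        )
        self.assertEqual(missing, [])

    def test_reports_missing_feature_and_target(self):
        missing = find_missing_columns(
            ["feature1"], ["feature1", "feature2"], "target"
        )
        self.assertEqual(missing, ["feature2", "target"])


# Run the tests in-process without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FindMissingColumnsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping checks like this in small pure functions is one way to make the "Unit tests + CI/CD" stage of the pipeline cheap to run on every commit.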

Summary

In this post, I showed how to transition Jupyter Notebook prototypes to production-ready Python code. The process involves: converting with nbconvert, parameterizing with papermill, refactoring into modular components, and deploying with MLflow. This workflow bridges the gap between exploration and production, giving you both flexibility and reliability.

Final Words + More Resources

My intention with this article was to help others by sharing my knowledge and experience. If you want to get in touch, you can reach me by email.

Here are also the most important links from this article along with some further resources that will help you in this scope:

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!
