How to Monitor Your Machine Learning Models with Neptune (Step by Step)
We’re building a model monitoring system that really tracks the performance of your machine learning models — not just fluff metrics you see thrown around in tutorials.
Prerequisites
- Python 3.11+
- Neptune Client 0.15.0+
- Pandas 1.5.0+
- Scikit-learn 1.1.0+
- NumPy and Matplotlib (used in Steps 4 and 5)
Step 1: Setting Up Your Environment
To get started, you need to install the necessary packages. If you don’t have Neptune installed yet, you’re in for a treat. Seriously, it’s a must-have for monitoring models. Here’s how to set it up:
pip install neptune-client pandas scikit-learn matplotlib
Why does this matter? Because if you try to run any Neptune code without the client, the script fails at the first import with a ModuleNotFoundError, faster than my first attempt at deploying a model. And trust me, that was a disaster.
Step 2: Initialize Neptune in Your Script
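Now, let’s initialize Neptune. This is where you connect your script to a project. You’ll need a Neptune account for this, which you can create for free. The code in this tutorial uses the neptune.new API from the pre-1.0 client; in client 1.0 and later, the import becomes "import neptune" and the call becomes neptune.init_run(...). Here’s the code: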
import neptune.new as neptune
run = neptune.init(
    project='your_workspace/your_project',
    api_token='YOUR_API_TOKEN'  # Get it from https://app.neptune.ai
)
Make sure to replace ‘your_workspace’ and ‘your_project’ with your actual Neptune workspace and project names. If the API token is missing or wrong, initialization fails with an authentication error instead of creating a run, which is a real confidence booster. Seriously, double-check that token.
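Hard-coding the token works for a quick test, but it is easy to commit by accident. Here is a minimal sketch of an alternative, assuming you export the token and project name as environment variables first. NEPTUNE_API_TOKEN and NEPTUNE_PROJECT are the variable names the Neptune client documents for this; the helper load_neptune_settings is made up for illustration and is not part of Neptune.

```python
import os


def load_neptune_settings(env=None):
    """Read Neptune settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    token = env.get("NEPTUNE_API_TOKEN")
    project = env.get("NEPTUNE_PROJECT")

    # Collect every missing variable so the error names all of them at once
    missing = [
        name
        for name, value in (("NEPTUNE_API_TOKEN", token), ("NEPTUNE_PROJECT", project))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return {"project": project, "api_token": token}
```

You would then pass the result straight into the init call, for example run = neptune.init(**load_neptune_settings()).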
Step 3: Log Model Performance Metrics
Next, you need to log some metrics. This is crucial for tracking how well your model performs over time. Let’s assume you’re working with a simple model from Scikit-learn:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load dataset
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)
# Train model
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
# Log the metric
run['accuracy'] = accuracy
run['model/parameters'] = {'n_estimators': model.n_estimators}
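# Log one float per feature; a raw NumPy array is not a numeric field type
# and may be stored as a string or rejected, depending on the client version
run['model/feature_importance'] = {
    f'feature_{i}': float(score) for i, score in enumerate(model.feature_importances_)
}
# Keep the run open for the next steps; call run.stop() only after the last logging call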
Logging these metrics helps you spot trends that might indicate your model is degrading. Like when your friend insists pineapple belongs on pizza, you need to track that madness.
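Accuracy alone can hide a class that the model has quietly stopped predicting well. The sketch below computes accuracy plus macro-averaged precision and recall in plain Python, so you can see what goes into each number. The function name classification_summary is hypothetical; in practice scikit-learn's precision_score and recall_score with average='macro' are the usual way to get these values.

```python
def classification_summary(y_true, y_pred):
    """Return accuracy and macro-averaged precision/recall as plain floats."""
    labels = sorted(set(y_true) | set(y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    summary = {"accuracy": correct / len(y_true)}

    precisions, recalls = [], []
    for label in labels:
        true_pos = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)  # times the model chose this label
        actual = sum(t == label for t in y_true)     # times this label was correct
        # A class that is never predicted (or never present) scores 0.0 here
        precisions.append(true_pos / predicted if predicted else 0.0)
        recalls.append(true_pos / actual if actual else 0.0)

    summary["precision_macro"] = sum(precisions) / len(labels)
    summary["recall_macro"] = sum(recalls) / len(labels)
    return summary
```

Because the result is a flat dictionary of floats, it can be assigned to a single namespace, for example run['metrics'] = classification_summary(y_test.tolist(), predictions.tolist()).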
Step 4: Monitor for Data Drift
Data drift is a common issue that can seriously impact your model’s performance. You need to check if the statistical properties of your input data change over time. Here’s a basic way to monitor drift:
import numpy as np
# Log original dataset statistics
mean_original = np.mean(X_train, axis=0)
std_original = np.std(X_train, axis=0)
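# Store one float per feature; raw NumPy arrays are not a numeric Neptune field type
run['data/original_mean'] = {f'feature_{i}': float(m) for i, m in enumerate(mean_original)}
run['data/original_std'] = {f'feature_{i}': float(s) for i, s in enumerate(std_original)}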
# Run your prediction again after some time
new_data = ... # Load new data
mean_new = np.mean(new_data, axis=0)
std_new = np.std(new_data, axis=0)
# Check for drift
if np.abs(mean_new - mean_original).max() > 0.1:  # Example threshold
    run['data/drift'] = 'Detected'
else:
    run['data/drift'] = 'Not Detected'
Monitor for drift, so you don’t get blindsided. Like that time I didn’t check the data before a big presentation and ended up talking about random stuff. That was fun.
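One weakness of a fixed cutoff on the raw mean difference is that it depends on each feature's scale: 0.1 is a large shift for a feature that ranges from 0 to 1 and noise for one measured in thousands. A sketch of one possible refinement is to divide each feature's mean shift by its original standard deviation before comparing it to a threshold. The function name drift_report and the 0.5 default threshold are illustrative assumptions, not recommendations, and the code uses the standard library so it runs without NumPy.

```python
from statistics import mean, pstdev


def drift_report(reference, current, threshold=0.5):
    """Compare per-feature means, scaled by the reference standard deviation.

    reference and current are lists of rows, each row a list of feature values.
    """
    report = {}
    # zip(*rows) transposes rows into per-feature columns
    for i, (ref_col, cur_col) in enumerate(zip(zip(*reference), zip(*current))):
        ref_std = pstdev(ref_col)  # population std, same as np.std's default
        shift = abs(mean(cur_col) - mean(ref_col))
        if ref_std > 0:
            score = shift / ref_std
        elif shift > 0:
            score = float("inf")  # constant feature that moved: always flag it
        else:
            score = 0.0
        report[f"feature_{i}"] = {"score": score, "drifted": score > threshold}
    return report
```

With the arrays from this step you could call drift_report(X_train.tolist(), new_data.tolist()) and log the per-feature flags instead of a single 'Detected' string.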
Step 5: Visualize Your Metrics
Visualization helps you understand trends quickly. Neptune provides built-in tools for visualizing logged metrics. You can access your Neptune dashboard and see how the accuracy changes over different runs. But, if you want to create custom plots, you can do it using Matplotlib:
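import matplotlib.pyplot as plt
# Example of plotting model accuracy
# fetch_values() works only on series fields, so append accuracy to a series with .log()
run['accuracy_history'].log(accuracy)
run.wait()  # Make sure the logged value has reached Neptune before fetching
# fetch_values() returns a DataFrame with 'step' and 'value' columns
accuracy_history = run['accuracy_history'].fetch_values()
plt.plot(accuracy_history['step'], accuracy_history['value'], marker='o')
plt.title('Model Accuracy Over Time')
plt.xlabel('Step')
plt.ylabel('Accuracy')
plt.show()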
Seriously, visualizations are great. They let you brag about your model performance to your non-tech friends, even if you don’t fully understand the metrics yourself.
The Gotchas
- Overfitting: Don’t track metrics just on training data. Always validate on unseen data.
- API Limits: Neptune has limits on API calls. If you exceed them, your logs won’t show. Check the API limits section of Neptune’s documentation.
- Version Control: If you don’t track which model version you’re monitoring, you’ll lose context. Make a habit of logging every model version.
- Monitoring Latency: If you run your monitoring in production, be aware of additional latency introduced. Optimize your logging process to avoid slowdown.
- Drift Thresholds: Setting your drift detection thresholds too tight can lead to alerts that desensitize your team. Find the sweet spot.
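Besides tuning the threshold itself, one way to cut down on noisy alerts is to require several drifted checks in a row before notifying anyone. The sketch below shows that idea; the class name DriftAlertGate and the default patience of 3 are assumptions made for illustration.

```python
from collections import deque


class DriftAlertGate:
    """Fire an alert only after `patience` consecutive drifted checks."""

    def __init__(self, patience=3):
        self.patience = patience
        # Only the most recent `patience` results are kept
        self.recent = deque(maxlen=patience)

    def update(self, drifted):
        """Record one check result and return True if an alert should fire."""
        self.recent.append(bool(drifted))
        return len(self.recent) == self.patience and all(self.recent)
```

A single clean check resets the streak, so one-off spikes never reach your team, while sustained drift still triggers an alert after a short delay.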
Full Code
import neptune.new as neptune
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
# Initialize Neptune
run = neptune.init(
    project='your_workspace/your_project',
    api_token='YOUR_API_TOKEN'
)
# Load dataset
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)
# Train model
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)
# Log the metric
run['accuracy'] = accuracy
run['model/parameters'] = {'n_estimators': model.n_estimators}
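# Log one float per feature; a raw NumPy array is not a numeric field type
run['model/feature_importance'] = {
    f'feature_{i}': float(score) for i, score in enumerate(model.feature_importances_)
}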
# Log original dataset statistics for drift analysis
mean_original = np.mean(X_train, axis=0)
std_original = np.std(X_train, axis=0)
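# Store one float per feature; raw NumPy arrays are not a numeric Neptune field type
run['data/original_mean'] = {f'feature_{i}': float(m) for i, m in enumerate(mean_original)}
run['data/original_std'] = {f'feature_{i}': float(s) for i, s in enumerate(std_original)}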
# Example new data
new_data = ... # Load new data
mean_new = np.mean(new_data, axis=0)
std_new = np.std(new_data, axis=0)
# Check for drift
if np.abs(mean_new - mean_original).max() > 0.1:  # Example threshold
    run['data/drift'] = 'Detected'
else:
    run['data/drift'] = 'Not Detected'
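# Plot accuracy
# fetch_values() works only on series fields, so append accuracy to a series with .log()
run['accuracy_history'].log(accuracy)
run.wait()  # Make sure the logged value has reached Neptune before fetching
# fetch_values() returns a DataFrame with 'step' and 'value' columns
accuracy_history = run['accuracy_history'].fetch_values()
plt.plot(accuracy_history['step'], accuracy_history['value'], marker='o')
plt.title('Model Accuracy Over Time')
plt.xlabel('Step')
plt.ylabel('Accuracy')
plt.show()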
run.stop()
What’s Next?
Now that you have monitoring set up, consider implementing alerts for significant model degradation. Tools like Slack can be integrated to send notifications. This keeps your team in the loop and ready to take action.
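As a rough sketch of what such an alert could look like, the code below builds and sends a message through a Slack incoming webhook, which accepts a JSON body with a "text" field. The function names and the message wording are assumptions made for illustration, and the webhook URL is something you would create in your own Slack workspace.

```python
import json
import urllib.request


def build_slack_alert(model_name, metric, value, threshold):
    """Build the JSON payload for a Slack incoming webhook as UTF-8 bytes."""
    text = (
        f"Model '{model_name}': {metric}={value:.3f} "
        f"fell below threshold {threshold:.3f}"
    )
    return json.dumps({"text": text}).encode("utf-8")


def send_slack_alert(webhook_url, payload, timeout=5):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A typical call would check the metric first, for example sending the alert only when accuracy drops below a value you have agreed on with your team.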
FAQ
- How often should I log metrics? Log them after every training cycle or when you implement changes. It helps track performance effectively.
- Can I monitor multiple models? Yes, you can create separate runs in Neptune for each model, allowing you to track them independently.
- What data should I monitor? Focus on accuracy, precision, recall, and data drift. These metrics give a good picture of model health.
Data Sources
Last updated May 15, 2026. Data sourced from official docs and community benchmarks.