📊 Customer Churn Prediction

Build a classification system to predict which customers will leave and identify key retention factors

📊 Advanced ⏱️ 6 hours 💻 Python + XGBoost 🎯 Classification Project

🎯 Project Overview

Customer churn (when customers stop doing business with a company) costs businesses billions annually. In this project, you'll build a machine learning system to predict which customers are likely to churn, allowing companies to take preventive action.

Real-World Business Impact

  • Cost Savings: Acquiring a new customer is often cited as costing 5-25x more than retaining an existing one
  • Revenue Protection: A 5% improvement in retention is often cited as raising profits by 25-95%
  • Targeted Interventions: Focus retention efforts on high-risk customers
  • Actionable Insights: Identify key factors driving customer departures

What You'll Build

  • Imbalanced Classification System: Handle typical churn rates of 10-30%
  • Multiple ML Models: Compare Logistic Regression, Random Forest, and XGBoost
  • Feature Importance Analysis: Identify top churn drivers
  • Business Metrics: Estimate retention ROI from an assumed customer value and campaign cost
  • Risk Scoring: Assign churn probability to each customer
  • Intervention Strategy: Prioritize customers for retention campaigns

💼 High Business Value: This project directly translates ML skills into business impact - perfect for interviews at SaaS companies, telecom, banking, and e-commerce!

📋 Dataset & Setup

1 Install Dependencies

pip install pandas numpy scipy scikit-learn xgboost imbalanced-learn matplotlib seaborn

2 Load Telco Customer Churn Data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Download from Kaggle: https://www.kaggle.com/blastchar/telco-customer-churn
# Or create synthetic data:
from sklearn.datasets import make_classification

# Create synthetic churn dataset
X, y = make_classification(
    n_samples=7000,
    n_features=20,
    n_informative=15,
    n_redundant=5,
    n_classes=2,
    weights=[0.80, 0.20],  # 20% churn rate (imbalanced)
    random_state=42
)

# Create realistic feature names
feature_names = [
    'tenure_months', 'monthly_charges', 'total_charges', 
    'contract_type', 'payment_method', 'internet_service',
    'tech_support', 'online_security', 'online_backup',
    'device_protection', 'streaming_tv', 'streaming_movies',
    'paperless_billing', 'num_services', 'avg_call_duration',
    'customer_service_calls', 'late_payments', 'data_usage_gb',
    'age', 'dependents'
]

df = pd.DataFrame(X, columns=feature_names)
df['Churn'] = y

print(f"Dataset Shape: {df.shape}")
print(f"\nChurn Rate: {df['Churn'].mean():.1%}")
print(f"Churned Customers: {df['Churn'].sum()}")
print(f"Retained Customers: {(df['Churn']==0).sum()}")

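💡 Using Real Data: For a more realistic project, download the Kaggle Telco Churn dataset, which includes demographics, account info, and services. Its columns are mostly categorical and use different names from the synthetic features, so encode them as numbers, map Churn to 0/1, and adjust the column names used in the EDA first. After that, the rest of the code works for both synthetic and real data.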
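
The preparation step for the real file can be sketched as follows. This is a minimal sketch, assuming the column names of the public Telco CSV (`customerID`, `TotalCharges`, `Churn`); the helper name `prepare_telco` is hypothetical, treating blank `TotalCharges` as 0 is an assumption, and a tiny inline frame stands in for the downloaded file.

```python
import pandas as pd

def prepare_telco(raw: pd.DataFrame) -> pd.DataFrame:
    """Turn a raw Telco-style frame into an all-numeric frame with a 0/1 'Churn' column."""
    df = raw.copy()
    # The customer ID carries no predictive signal
    df = df.drop(columns=['customerID'], errors='ignore')
    # TotalCharges is read as text; blanks become NaN, then 0 (assumption: no charges yet)
    df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce').fillna(0.0)
    # Map the Yes/No target to 1/0
    df['Churn'] = (df['Churn'] == 'Yes').astype(int)
    # One-hot encode every remaining non-numeric column
    cat_cols = df.select_dtypes(exclude='number').columns.tolist()
    return pd.get_dummies(df, columns=cat_cols, drop_first=True, dtype=int)

# Tiny stand-in for the frame you would get from pd.read_csv on the downloaded file
raw_demo = pd.DataFrame({
    'customerID': ['A', 'B', 'C'],
    'tenure': [1, 0, 2],
    'Contract': ['Month-to-month', 'Two year', 'One year'],
    'MonthlyCharges': [29.85, 52.55, 53.85],
    'TotalCharges': ['29.85', ' ', '108.15'],
    'Churn': ['No', 'Yes', 'No'],
})
telco = prepare_telco(raw_demo)
print(telco.dtypes)
```

One-hot encoding with `drop_first=True` keeps the frame compatible with Logistic Regression; tree models would also work with other encodings.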

📊 Part 1: Exploratory Data Analysis

Class Imbalance Analysis

# Visualize class distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Count plot (sort_index puts class 0 first so the counts match the labels below)
churn_counts = df['Churn'].value_counts().sort_index()
axes[0].bar(['Retained', 'Churned'], churn_counts.values, color=['#10b981', '#ef4444'])
axes[0].set_ylabel('Number of Customers')
axes[0].set_title('Customer Distribution')
for i, v in enumerate(churn_counts.values):
    axes[0].text(i, v + 100, str(v), ha='center', fontweight='bold')

# Percentage pie chart
axes[1].pie(churn_counts.values, labels=['Retained (0)', 'Churned (1)'], 
            autopct='%1.1f%%', colors=['#10b981', '#ef4444'], startangle=90)
axes[1].set_title('Churn Rate Distribution')

plt.tight_layout()
plt.show()

print("\n⚠️ IMBALANCED DATASET DETECTED!")
print(f"Churn Rate: {df['Churn'].mean():.1%}")
print("We'll need to handle this imbalance during training!")

Feature Analysis by Churn Status

# Compare churned vs retained customers
churned = df[df['Churn'] == 1]
retained = df[df['Churn'] == 0]

# Key metrics comparison
comparison = pd.DataFrame({
    'Metric': ['Avg Tenure (months)', 'Avg Monthly Charges', 'Avg Total Charges', 'Avg Service Calls'],
    'Churned': [
        churned['tenure_months'].mean(),
        churned['monthly_charges'].mean(),
        churned['total_charges'].mean(),
        churned['customer_service_calls'].mean()
    ],
    'Retained': [
        retained['tenure_months'].mean(),
        retained['monthly_charges'].mean(),
        retained['total_charges'].mean(),
        retained['customer_service_calls'].mean()
    ]
})
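# Relative difference; unstable when the retained mean is near zero (possible with synthetic features)
comparison['Difference %'] = ((comparison['Churned'] - comparison['Retained']) / comparison['Retained'] * 100).round(1)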

print("\n📊 Churned vs Retained Comparison:")
print(comparison.to_string(index=False))

# Visualize key differences
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Tenure distribution
axes[0,0].hist([retained['tenure_months'], churned['tenure_months']], 
               bins=30, label=['Retained', 'Churned'], color=['#10b981', '#ef4444'], alpha=0.7)
axes[0,0].set_xlabel('Tenure (months)')
axes[0,0].set_ylabel('Frequency')
axes[0,0].set_title('Tenure Distribution')
axes[0,0].legend()

# Monthly charges
axes[0,1].hist([retained['monthly_charges'], churned['monthly_charges']], 
               bins=30, label=['Retained', 'Churned'], color=['#10b981', '#ef4444'], alpha=0.7)
axes[0,1].set_xlabel('Monthly Charges')
axes[0,1].set_title('Monthly Charges Distribution')
axes[0,1].legend()

# Customer service calls
axes[1,0].hist([retained['customer_service_calls'], churned['customer_service_calls']], 
               bins=20, label=['Retained', 'Churned'], color=['#10b981', '#ef4444'], alpha=0.7)
axes[1,0].set_xlabel('Customer Service Calls')
axes[1,0].set_title('Service Call Frequency')
axes[1,0].legend()

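# Bar chart of the 10 features most correlated with churn (by absolute value,
# so strong negative correlations are not dropped)
churn_corr = df.corr()['Churn'].drop('Churn')
correlation = churn_corr.reindex(churn_corr.abs().sort_values(ascending=False).index)[:10]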
axes[1,1].barh(range(len(correlation)), correlation.values)
axes[1,1].set_yticks(range(len(correlation)))
axes[1,1].set_yticklabels(correlation.index)
axes[1,1].set_xlabel('Correlation with Churn')
axes[1,1].set_title('Top 10 Churn Predictors')
axes[1,1].axvline(x=0, color='black', linestyle='--', alpha=0.3)

plt.tight_layout()
plt.show()

Statistical Significance Testing

from scipy.stats import ttest_ind

# T-test for key metrics
print("\n📊 Statistical Significance Tests:")
print("="*60)

for feature in ['tenure_months', 'monthly_charges', 'customer_service_calls']:
    churned_values = df[df['Churn']==1][feature]
    retained_values = df[df['Churn']==0][feature]
    
    # Welch's t-test: does not assume equal variances in the two unequal-sized groups
    t_stat, p_value = ttest_ind(churned_values, retained_values, equal_var=False)
    
    significance = "✅ SIGNIFICANT" if p_value < 0.05 else "❌ NOT SIGNIFICANT"
    print(f"{feature}: p-value = {p_value:.6f} - {significance}")
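
With several thousand customers, even a negligible difference in means can produce a p-value below 0.05, so it helps to report an effect size next to the test. The sketch below computes Cohen's d (the mean difference divided by the pooled standard deviation). It is an addition to the tutorial's analysis; `cohens_d` is a hypothetical helper and the two random samples are stand-ins for `churned_values` and `retained_values`.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference between two samples, using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Random stand-ins with a deliberately tiny true difference in means
rng = np.random.default_rng(42)
churned_demo = rng.normal(loc=0.05, scale=1.0, size=1400)
retained_demo = rng.normal(loc=0.0, scale=1.0, size=5600)
print(f"Cohen's d: {cohens_d(churned_demo, retained_demo):.3f}")
```

As a common rule of thumb, |d| around 0.2 is considered small, 0.5 medium, and 0.8 large.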

✅ Checkpoint 1: EDA Insights

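Key findings to look for:

  • Roughly 20% churn rate on the synthetic data (imbalanced dataset)
  • In real churn data, churned customers typically have shorter tenure
  • In real churn data, more customer service calls usually means higher churn risk
  • Monthly charges differ between groups

With the synthetic dataset, the feature names are arbitrary labels attached to generated columns, so the direction of each difference may not match these patterns.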

🔧 Part 2: Data Preprocessing

Train/Test Split (Stratified)

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Separate features and target
X = df.drop('Churn', axis=1)
y = df['Churn']

# Stratified split (preserves class distribution)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set: {len(X_train)} samples")
print(f"Test set: {len(X_test)} samples")
print(f"\nTrain churn rate: {y_train.mean():.1%}")
print(f"Test churn rate: {y_test.mean():.1%}")

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert back to DataFrame
X_train_scaled = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=X_test.columns, index=X_test.index)

Handle Class Imbalance with SMOTE

from imblearn.over_sampling import SMOTE

# Apply SMOTE (Synthetic Minority Over-sampling Technique)
smote = SMOTE(random_state=42)
X_train_balanced, y_train_balanced = smote.fit_resample(X_train_scaled, y_train)

print(f"\nBefore SMOTE:")
print(f"  Class 0 (Retained): {(y_train==0).sum()}")
print(f"  Class 1 (Churned): {(y_train==1).sum()}")

print(f"\nAfter SMOTE:")
print(f"  Class 0 (Retained): {(y_train_balanced==0).sum()}")
print(f"  Class 1 (Churned): {(y_train_balanced==1).sum()}")
print("\n✅ Classes are now balanced!")

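⚠️ Important: Apply SMOTE only to the training data, never to the test data! The test set must keep the real-world class distribution so that the metrics reflect how the model will behave on actual customers.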
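
To see why resampled rows belong only in the training set, it helps to look at what SMOTE does: each synthetic churner is an interpolation between a real minority-class row and one of its nearest minority-class neighbours. The NumPy sketch below is a simplified illustration of that idea, not imbalanced-learn's implementation; `smote_like` and its parameters are hypothetical names.

```python
import numpy as np

def smote_like(X_minority, n_new, k=5, seed=42):
    """Create n_new synthetic rows by interpolating between minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k, n - 1)
    # Brute-force pairwise Euclidean distances (fine for a demo, costly for large n)
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)                 # a row is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]   # indices of the k nearest rows

    base = rng.integers(0, n, size=n_new)                        # random minority rows
    picked = neighbours[base, rng.integers(0, k, size=n_new)]    # one neighbour for each
    gap = rng.random((n_new, 1))                                 # position along the segment
    return X_minority[base] + gap * (X_minority[picked] - X_minority[base])

# Random stand-in for the scaled minority-class rows of the training set
demo_rng = np.random.default_rng(0)
X_min_demo = demo_rng.normal(size=(50, 3))
synthetic = smote_like(X_min_demo, n_new=200)
print(synthetic.shape)
```

Every synthetic row lies on a line segment between two real churners, which is why these rows are useful for training but would distort any evaluation they leaked into.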

🤖 Part 3: Model Training & Comparison

Model 1: Logistic Regression (Baseline)

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve

# Train Logistic Regression
lr_model = LogisticRegression(max_iter=1000, random_state=42)
lr_model.fit(X_train_balanced, y_train_balanced)

# Predictions
y_pred_lr = lr_model.predict(X_test_scaled)
y_proba_lr = lr_model.predict_proba(X_test_scaled)[:, 1]

# Evaluation
print("="*60)
print("LOGISTIC REGRESSION RESULTS")
print("="*60)
print(classification_report(y_test, y_pred_lr, target_names=['Retained', 'Churned']))
print(f"\nROC-AUC Score: {roc_auc_score(y_test, y_proba_lr):.4f}")

Model 2: Random Forest

from sklearn.ensemble import RandomForestClassifier

# Train Random Forest
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=10,
    min_samples_leaf=5,
    class_weight='balanced',  # Handle imbalance
    random_state=42,
    n_jobs=-1
)
rf_model.fit(X_train_scaled, y_train)  # Use original training data with class_weight

# Predictions
y_pred_rf = rf_model.predict(X_test_scaled)
y_proba_rf = rf_model.predict_proba(X_test_scaled)[:, 1]

# Evaluation
print("\n" + "="*60)
print("RANDOM FOREST RESULTS")
print("="*60)
print(classification_report(y_test, y_pred_rf, target_names=['Retained', 'Churned']))
print(f"\nROC-AUC Score: {roc_auc_score(y_test, y_proba_rf):.4f}")

Model 3: XGBoost (Champion)

import xgboost as xgb

# Calculate scale_pos_weight for imbalance
scale_pos_weight = (y_train==0).sum() / (y_train==1).sum()

# Train XGBoost
xgb_model = xgb.XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,
    max_depth=6,
    min_child_weight=3,
    gamma=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
    scale_pos_weight=scale_pos_weight,  # Handle imbalance
    random_state=42,
    eval_metric='logloss'
)
xgb_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_xgb = xgb_model.predict(X_test_scaled)
y_proba_xgb = xgb_model.predict_proba(X_test_scaled)[:, 1]

# Evaluation
print("\n" + "="*60)
print("XGBOOST RESULTS")
print("="*60)
print(classification_report(y_test, y_pred_xgb, target_names=['Retained', 'Churned']))
print(f"\nROC-AUC Score: {roc_auc_score(y_test, y_proba_xgb):.4f}")

Model Comparison Dashboard

# Compare all models
from sklearn.metrics import precision_score, recall_score, f1_score

models_comparison = pd.DataFrame({
    'Model': ['Logistic Regression', 'Random Forest', 'XGBoost'],
    'Precision': [
        precision_score(y_test, y_pred_lr, pos_label=1),
        precision_score(y_test, y_pred_rf, pos_label=1),
        precision_score(y_test, y_pred_xgb, pos_label=1)
    ],
    'Recall': [
        recall_score(y_test, y_pred_lr, pos_label=1),
        recall_score(y_test, y_pred_rf, pos_label=1),
        recall_score(y_test, y_pred_xgb, pos_label=1)
    ],
    'F1-Score': [
        f1_score(y_test, y_pred_lr, pos_label=1),
        f1_score(y_test, y_pred_rf, pos_label=1),
        f1_score(y_test, y_pred_xgb, pos_label=1)
    ],
    'ROC-AUC': [
        roc_auc_score(y_test, y_proba_lr),
        roc_auc_score(y_test, y_proba_rf),
        roc_auc_score(y_test, y_proba_xgb)
    ]
})

print("\n" + "="*70)
print("MODEL COMPARISON")
print("="*70)
print(models_comparison.to_string(index=False))

# Visualize comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Metrics comparison
metrics = ['Precision', 'Recall', 'F1-Score', 'ROC-AUC']
x = np.arange(len(models_comparison))
width = 0.2

for i, metric in enumerate(metrics):
    axes[0].bar(x + i*width, models_comparison[metric], width, label=metric)

axes[0].set_xlabel('Models')
axes[0].set_ylabel('Score')
axes[0].set_title('Model Performance Comparison')
axes[0].set_xticks(x + width * 1.5)
axes[0].set_xticklabels(models_comparison['Model'], rotation=15)
axes[0].legend()
axes[0].set_ylim([0, 1])
axes[0].grid(axis='y', alpha=0.3)

# ROC Curves
fpr_lr, tpr_lr, _ = roc_curve(y_test, y_proba_lr)
fpr_rf, tpr_rf, _ = roc_curve(y_test, y_proba_rf)
fpr_xgb, tpr_xgb, _ = roc_curve(y_test, y_proba_xgb)

axes[1].plot(fpr_lr, tpr_lr, label=f'Logistic Reg (AUC={roc_auc_score(y_test, y_proba_lr):.3f})', linewidth=2)
axes[1].plot(fpr_rf, tpr_rf, label=f'Random Forest (AUC={roc_auc_score(y_test, y_proba_rf):.3f})', linewidth=2)
axes[1].plot(fpr_xgb, tpr_xgb, label=f'XGBoost (AUC={roc_auc_score(y_test, y_proba_xgb):.3f})', linewidth=2)
axes[1].plot([0, 1], [0, 1], 'k--', label='Random Classifier', alpha=0.3)
axes[1].set_xlabel('False Positive Rate')
axes[1].set_ylabel('True Positive Rate')
axes[1].set_title('ROC Curves Comparison')
axes[1].legend()
axes[1].grid(alpha=0.3)

plt.tight_layout()
plt.show()

# Best model
best_idx = models_comparison['ROC-AUC'].idxmax()
print(f"\n🏆 Best Model: {models_comparison.loc[best_idx, 'Model']}")

✅ Checkpoint 2: Models Trained

You've successfully:

  • Handled class imbalance with SMOTE, class_weight, and scale_pos_weight
  • Trained 3 different models
  • Compared precision, recall, F1, and ROC-AUC
  • Identified the best model by ROC-AUC (often XGBoost, typically around 0.85-0.90 on churn data; your score will vary)
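
The precision and recall figures in the comparison all use the default 0.5 decision threshold. Lowering the threshold flags more customers (higher recall, usually lower precision), and raising it does the opposite. A minimal NumPy sketch with made-up toy scores follows; `precision_recall_at` is a hypothetical helper, and in the project you would pass `y_test` and `y_proba_xgb` instead.

```python
import numpy as np

def precision_recall_at(y_true, y_score, threshold):
    """Precision and recall for the churn class when scores >= threshold count as churn."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_score) >= threshold
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return float(precision), float(recall)

# Toy labels and scores, invented for illustration only
y_demo = [0, 0, 1, 1, 0, 1, 0, 1]
score_demo = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.45]
for t in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(y_demo, score_demo, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

For churn, a missed churner usually costs more than an unnecessary retention offer, so a threshold below 0.5 is often worth evaluating.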

📊 Part 4: Business Impact Analysis

Confusion Matrix & Business Metrics

# Use XGBoost here; swap in y_pred_lr or y_pred_rf if another model won the comparison
cm = confusion_matrix(y_test, y_pred_xgb)

# Visualize confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Predicted Retained', 'Predicted Churned'],
            yticklabels=['Actually Retained', 'Actually Churned'])
plt.title('Confusion Matrix - XGBoost')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()

# Calculate business metrics
TN, FP, FN, TP = cm.ravel()

print("\n" + "="*60)
print("CONFUSION MATRIX BREAKDOWN")
print("="*60)
print(f"True Negatives (Correctly predicted retained):  {TN}")
print(f"False Positives (Incorrectly predicted churn):  {FP}")
print(f"False Negatives (Missed churn):                  {FN}")
print(f"True Positives (Correctly predicted churn):     {TP}")

# Business impact calculations
avg_customer_value = 1000  # Average annual revenue per customer
retention_cost = 100       # Cost of retention campaign per customer
churn_cost = avg_customer_value  # Lost revenue from churned customer

# Scenario: Without model (no intervention)
baseline_churn_cost = (TP + FN) * churn_cost

# Scenario: With model (intervene on predicted churners)
# Assume 60% success rate for retention campaigns
retention_success_rate = 0.60
customers_saved = TP * retention_success_rate
campaign_cost = (TP + FP) * retention_cost
remaining_churn_cost = (FN + TP * (1 - retention_success_rate)) * churn_cost
total_cost_with_model = campaign_cost + remaining_churn_cost

# ROI calculation
cost_savings = baseline_churn_cost - total_cost_with_model
roi = (cost_savings / campaign_cost) * 100

print("\n" + "="*60)
print("BUSINESS IMPACT ANALYSIS")
print("="*60)
print("\n📊 Without Model (Baseline):")
print(f"  Churned Customers: {TP + FN}")
print(f"  Total Revenue Loss: ${baseline_churn_cost:,}")

print("\n🎯 With ML Model:")
print(f"  Predicted Churners: {TP + FP}")
print(f"  Retention Campaigns Sent: {TP + FP}")
print(f"  Campaign Cost: ${campaign_cost:,}")
print(f"  Customers Saved (~{retention_success_rate:.0%} success): {int(customers_saved)}")
print(f"  Remaining Churn Cost: ${remaining_churn_cost:,.0f}")
print(f"  Total Cost: ${total_cost_with_model:,.0f}")

print("\n💰 ROI:")
print(f"  Net Savings: ${cost_savings:,.0f}")
print(f"  ROI: {roi:.1f}%")
print(f"\n✅ For every $1 spent on retention, net savings are ${roi/100:.2f}!")
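
The ROI figure depends heavily on the assumed customer value, campaign cost, and success rate, so it is worth wrapping the arithmetic in a function and trying other assumptions. The sketch below mirrors the calculation above; `retention_roi` is a hypothetical helper and the confusion-matrix counts are invented round numbers. With TP=200, FP=100, FN=80 and the default assumptions, the baseline loss is 280 × $1,000 = $280,000, the campaign costs 300 × $100 = $30,000, the remaining loss is (80 + 200 × 0.4) × $1,000 = $160,000, so net savings are $90,000 and ROI is 300%.

```python
def retention_roi(tp, fp, fn, customer_value=1000, retention_cost=100, success_rate=0.60):
    """Net savings and ROI (%) of contacting every predicted churner."""
    baseline_loss = (tp + fn) * customer_value                      # no intervention
    campaign_cost = (tp + fp) * retention_cost                      # everyone flagged is contacted
    remaining_loss = (fn + tp * (1 - success_rate)) * customer_value
    net_savings = baseline_loss - (campaign_cost + remaining_loss)
    roi_pct = net_savings / campaign_cost * 100 if campaign_cost else 0.0
    return net_savings, roi_pct

# Hypothetical counts, chosen for round numbers
savings_demo, roi_demo = retention_roi(tp=200, fp=100, fn=80)
print(f"Net savings: ${savings_demo:,.0f}  ROI: {roi_demo:.0f}%")

# Same predictions, but campaigns succeed only 10% of the time
savings_low, roi_low = retention_roi(tp=200, fp=100, fn=80, success_rate=0.10)
print(f"Net savings: ${savings_low:,.0f}  ROI: {roi_low:.0f}%")
```

In the second scenario the campaign loses money, which shows that the headline ROI reflects the business assumptions as much as the model.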

Feature Importance for Business Insights

# XGBoost feature importance
importance_df = pd.DataFrame({
    'Feature': X_train.columns,
    'Importance': xgb_model.feature_importances_
}).sort_values('Importance', ascending=False)

# Visualize
plt.figure(figsize=(10, 8))
plt.barh(importance_df['Feature'][:15], importance_df['Importance'][:15])
plt.xlabel('Importance Score')
plt.title('Top 15 Churn Drivers')
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

print("\n" + "="*60)
print("TOP 10 CHURN DRIVERS (Actionable Insights)")
print("="*60)
for i, row in importance_df.head(10).iterrows():
    print(f"{row['Feature']:.<40} {row['Importance']:.4f}")

Customer Risk Segmentation

# Assign churn risk scores to all test customers
risk_scores = xgb_model.predict_proba(X_test_scaled)[:, 1]

# Create risk segments
risk_df = pd.DataFrame({
    'Customer_ID': X_test.index,
    'Churn_Probability': risk_scores,
    'Actual_Churn': y_test.values
})

# Define risk categories
def categorize_risk(prob):
    if prob < 0.3:
        return 'Low Risk'
    elif prob < 0.6:
        return 'Medium Risk'
    else:
        return 'High Risk'

risk_df['Risk_Category'] = risk_df['Churn_Probability'].apply(categorize_risk)

# Risk segment analysis
segment_analysis = risk_df.groupby('Risk_Category').agg({
    'Customer_ID': 'count',
    'Actual_Churn': 'sum',
    'Churn_Probability': 'mean'
}).rename(columns={'Customer_ID': 'Customer_Count', 'Actual_Churn': 'Actual_Churns'})
segment_analysis['Churn_Rate'] = (segment_analysis['Actual_Churns'] / segment_analysis['Customer_Count'] * 100).round(1)

print("\n" + "="*60)
print("CUSTOMER RISK SEGMENTATION")
print("="*60)
print(segment_analysis)

# Visualize risk distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

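# Risk category distribution (fixed order so each colour matches its category)
risk_order = ['Low Risk', 'Medium Risk', 'High Risk']
segment_counts = risk_df['Risk_Category'].value_counts().reindex(risk_order, fill_value=0)
axes[0].bar(segment_counts.index, segment_counts.values, color=['#10b981', '#f59e0b', '#ef4444'])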
axes[0].set_ylabel('Number of Customers')
axes[0].set_title('Customer Risk Distribution')

# Churn probability histogram
axes[1].hist(risk_scores, bins=50, edgecolor='black', alpha=0.7)
axes[1].axvline(x=0.3, color='orange', linestyle='--', label='Low/Medium threshold')
axes[1].axvline(x=0.6, color='red', linestyle='--', label='Medium/High threshold')
axes[1].set_xlabel('Churn Probability')
axes[1].set_ylabel('Number of Customers')
axes[1].set_title('Churn Probability Distribution')
axes[1].legend()

plt.tight_layout()
plt.show()

# High-risk customers for immediate action
high_risk = risk_df[risk_df['Risk_Category'] == 'High Risk'].sort_values('Churn_Probability', ascending=False)
print(f"\n🚨 {len(high_risk)} High-Risk Customers Identified")
print("\nTop 10 Customers for Immediate Intervention:")
print(high_risk.head(10)[['Customer_ID', 'Churn_Probability', 'Actual_Churn']])
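
For large customer bases, the row-by-row `apply(categorize_risk)` can be replaced by a vectorized lookup. A minimal sketch, assuming the same 0.3 and 0.6 cut-offs; `categorize_risk_vectorized` and `RISK_LABELS` are hypothetical names.

```python
import numpy as np

RISK_LABELS = np.array(['Low Risk', 'Medium Risk', 'High Risk'])

def categorize_risk_vectorized(probs, cutoffs=(0.3, 0.6)):
    """Map churn probabilities to risk labels with the same cut-offs as categorize_risk."""
    # np.digitize returns 0 for p < 0.3, 1 for 0.3 <= p < 0.6, and 2 for p >= 0.6
    return RISK_LABELS[np.digitize(probs, cutoffs)]

demo_probs = np.array([0.10, 0.30, 0.59, 0.60, 0.95])
print(categorize_risk_vectorized(demo_probs))
```

Keeping the cut-offs in one argument also makes it easier to keep the segmentation and the prediction function consistent if the thresholds change.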

✅ Checkpoint 3: Business Analysis Complete

You've created:

  • ROI calculation showing value of ML model
  • Feature importance identifying top churn drivers
  • Risk segmentation (Low/Medium/High)
  • Prioritized list of customers for intervention

💾 Part 5: Model Deployment

import pickle
from datetime import datetime

# Save model and preprocessing objects
model_package = {
    'model': xgb_model,
    'scaler': scaler,
    'feature_names': X_train.columns.tolist(),
    'training_date': datetime.now(),
    'performance_metrics': {
        'roc_auc': roc_auc_score(y_test, y_proba_xgb),
        'precision': precision_score(y_test, y_pred_xgb, pos_label=1),
        'recall': recall_score(y_test, y_pred_xgb, pos_label=1),
        'f1_score': f1_score(y_test, y_pred_xgb, pos_label=1)
    },
    'business_metrics': {
        'roi_percentage': roi,
        'cost_savings': float(cost_savings)
    }
}

# Save
model_filename = f'churn_model_{datetime.now().strftime("%Y%m%d")}.pkl'
with open(model_filename, 'wb') as f:
    pickle.dump(model_package, f)

print(f"✅ Model saved as: {model_filename}")

# Prediction API function
def predict_churn_risk(customer_features):
    """
    Predict churn risk for a single customer
    
    Parameters:
    -----------
    customer_features : dict
        Dictionary with feature names and values
    
    Returns:
    --------
    dict with risk probability, category, and recommendation
    """
    # Load model
    with open(model_filename, 'rb') as f:
        package = pickle.load(f)
    
    model = package['model']
    scaler = package['scaler']
    
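    # Create DataFrame with columns in the same order used during training
    input_df = pd.DataFrame([customer_features])[package['feature_names']]
    
    # Scale (keep column names so the model sees the features it was trained on)
    input_scaled = pd.DataFrame(scaler.transform(input_df), columns=input_df.columns)
    
    # Predict (cast to a plain float so the result is easy to serialize)
    churn_prob = float(model.predict_proba(input_scaled)[0, 1])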
    
    # Risk category
    if churn_prob < 0.3:
        risk_category = 'Low Risk'
        recommendation = 'Standard engagement'
    elif churn_prob < 0.6:
        risk_category = 'Medium Risk'
        recommendation = 'Monitor closely, send satisfaction survey'
    else:
        risk_category = 'High Risk'
        recommendation = 'IMMEDIATE ACTION: Retention campaign, personal outreach'
    
    return {
        'churn_probability': round(churn_prob, 3),
        'risk_category': risk_category,
        'recommendation': recommendation
    }

# Example usage
sample_customer = {feature: X_test.iloc[0][feature] for feature in X_train.columns}
result = predict_churn_risk(sample_customer)

print("\n" + "="*60)
print("EXAMPLE PREDICTION")
print("="*60)
print(f"Churn Probability: {result['churn_probability']:.1%}")
print(f"Risk Category: {result['risk_category']}")
print(f"Recommendation: {result['recommendation']}")
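
A prediction function like `predict_churn_risk` is only as reliable as its input, so a small validation step before scaling can catch missing or misspelled feature names early. A minimal sketch; `build_feature_row` is a hypothetical helper, and the three feature names are a shortened stand-in for the saved `feature_names` list.

```python
def build_feature_row(customer_features, feature_names):
    """Return feature values in training order, failing loudly on missing or unknown keys."""
    missing = [name for name in feature_names if name not in customer_features]
    unknown = [name for name in customer_features if name not in feature_names]
    if missing or unknown:
        raise ValueError(f"missing features: {missing}; unknown features: {unknown}")
    return [float(customer_features[name]) for name in feature_names]

# Shortened stand-in for package['feature_names']
names_demo = ['tenure_months', 'monthly_charges', 'late_payments']
row_demo = build_feature_row(
    {'late_payments': 2, 'tenure_months': 14, 'monthly_charges': 70.5},
    names_demo,
)
print(row_demo)
```

The returned list follows the training order regardless of how the caller ordered the dictionary, which protects against silently swapped features.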

🎯 Project Summary

🎉 Outstanding Work!

You've built an end-to-end churn prediction system and connected its output to business impact!

📊 Key Achievements

  • ✅ Handled imbalanced data using SMOTE and class weights
  • ✅ Trained 3 models and compared them on ROC-AUC (typically ~0.85-0.90 on churn data; your score will vary)
  • ✅ Calculated business ROI for retention campaigns (e.g. a 3-5x return, depending on the assumed customer value, campaign cost, and success rate)
  • ✅ Identified churn drivers for actionable insights
  • ✅ Created risk segments for targeted interventions
  • ✅ Saved the model and wrapped it in a prediction function

🚀 Next Steps

  • Deploy as Flask API: Create REST endpoint for real-time predictions
  • Build Dashboard: Use Streamlit or Dash for interactive visualization
  • A/B Testing: Test intervention strategies on different risk segments
  • Feature Engineering: Add customer behavior sequences (RFM analysis)
  • Time-Series: Predict when a customer will churn (not just if)

💼 Interview Talking Points (substitute the figures you actually measured):

  • "Identified 80% of at-risk customers with 85% precision on held-out data"
  • "Estimated 400% ROI on retention campaigns using ML-driven targeting"
  • "Handled class imbalance using SMOTE and achieved 0.88 ROC-AUC with XGBoost"
  • "Created risk segmentation enabling prioritized customer interventions"