Client Churn Prediction

Can you predict when a customer is about to churn?

Acquiring a new customer is far more expensive than retaining an existing one. This system helps you focus retention spend where it matters by ranking the clients most at risk of churning.

GitHub Repo

Solution

I developed a complete supervised machine learning pipeline — from data preprocessing to model evaluation and interpretability — to predict churn with high precision and recall. The solution integrates classification modeling, business impact simulation, and explainability tools.

Precision‑first churn modeling with real‑world impact: up to 5.8× uplift in retention vs. random campaigns.

How It Works

  1. Data Preprocessing
    • Cleaned & transformed 5K customer records (null resolution, type standardisation, leakage removal).
    • Scaled numeric features; encoded categoricals (one‑hot / binary) → compact, sparse‑aware matrix.
    • Class imbalance: churn only 14% (vs 86% non‑churn). Used stratified splits & kept natural ratios (no naive oversampling) to preserve probability calibration; decision thresholds are calibrated instead (see the sketch below).
    Figure: class distribution (14% churn vs 86% non‑churn).
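    A minimal sketch of this preprocessing flow, under the choices above (the file and column names `customers.csv`, `churn`, and `customer_id` are illustrative, not the actual schema):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("customers.csv")            # hypothetical file name
df = df.dropna(subset=["churn"])             # resolve nulls in the target

# Drop the label and leakage-prone identifiers (illustrative column names)
X = df.drop(columns=["churn", "customer_id"])
y = df["churn"].astype(int)

numeric_cols = X.select_dtypes(include="number").columns
categorical_cols = X.select_dtypes(exclude="number").columns

# Scale numerics; one-hot encode categoricals into a sparse-aware matrix
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Stratified split preserves the natural 14% churn ratio in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```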
  2. Model Training & Evaluation
    • Benchmarked Logistic Regression, Random Forest & XGBoost on identical stratified folds (precision, recall, ROC‑AUC, F1).
    • XGBoost led on ROC‑AUC, recall & F1 while keeping precision on par with Random Forest; after tuning (step 3) it reaches ROC‑AUC 0.928, Precision 0.94, F1 0.848.
    Figure: model comparison (Logistic Regression vs Random Forest vs XGBoost); XGBoost leads on ROC‑AUC & F1 with strong precision.
    Validation metrics (identical stratified folds):

    | Metric    | LogReg | RandomForest | XGBoost |
    |-----------|--------|--------------|---------|
    | Accuracy  | 0.8710 | 0.9560       | 0.9590  |
    | Precision | 0.6304 | 0.9217       | 0.9167  |
    | Recall    | 0.2057 | 0.7518       | 0.7801  |
    | F1-score  | 0.3102 | 0.8281       | 0.8429  |
    | ROC-AUC   | 0.8257 | 0.9049       | 0.9215  |
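    A sketch of the benchmark loop, reusing `preprocess` and the split from step 1; the model settings shown are plausible defaults, not the exact configurations used:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "RandomForest": RandomForestClassifier(n_estimators=300, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
}

# The same stratified folds for every model keep the comparison fair
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

for name, model in models.items():
    pipe = Pipeline([("prep", preprocess), ("clf", model)])
    scores = cross_validate(pipe, X_train, y_train, cv=cv, scoring=scoring)
    print(name, {m: round(scores[f"test_{m}"].mean(), 4) for m in scoring})
```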
  3. Hyperparameter Tuning
    • Iterative tuning of tree depth, learning rate & regularisation (target: lift precision & AUC without harming recall).
    • Result: precision +0.023 with stable recall (‑0.007) and modest gains in AUC & F1.
    Untuned vs tuned XGBoost:

    | Metric    | Untuned | Tuned  | Δ       |
    |-----------|---------|--------|---------|
    | Accuracy  | 0.9590  | 0.9610 | +0.0020 |
    | Precision | 0.9167  | 0.9397 | +0.0230 |
    | Recall    | 0.7801  | 0.7730 | -0.0071 |
    | F1-score  | 0.8429  | 0.8482 | +0.0053 |
    | ROC-AUC   | 0.9215  | 0.9281 | +0.0066 |
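    A hedged sketch of one way to run this search (the search space and use of `RandomizedSearchCV` are illustrative; the project's actual iterative tuning schedule may differ), reusing `preprocess` and `cv` from earlier sketches:

```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

xgb_pipe = Pipeline([
    ("prep", preprocess),
    ("clf", XGBClassifier(eval_metric="logloss", random_state=42)),
])

# Illustrative search space over depth, learning rate and regularisation
param_dist = {
    "clf__max_depth": [3, 4, 5, 6],
    "clf__learning_rate": [0.03, 0.05, 0.07, 0.1],
    "clf__min_child_weight": [1, 3, 5],
    "clf__reg_alpha": [0.0, 0.5, 1.0],
    "clf__reg_lambda": [1.0, 3.0, 5.0],
}

# Optimise ROC-AUC on the same stratified folds; precision is inspected after
search = RandomizedSearchCV(xgb_pipe, param_dist, n_iter=30,
                            scoring="roc_auc", cv=cv, n_jobs=-1,
                            random_state=42)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 4))
```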
  4. Overfitting

    Three diagnostics were run before locking the regularised production configuration.

    1. Train vs Test Metrics – large drops on test indicate overfitting.
      Pre‑regularisation: train vs test performance (high‑capacity tuned model)

      | Metric    | Train  | Test   | Gap    |
      |-----------|--------|--------|--------|
      | Accuracy  | 0.9802 | 0.9610 | 0.0192 |
      | Precision | 0.9959 | 0.9397 | 0.0563 |
      | Recall    | 0.8640 | 0.7730 | 0.0909 |
      | F1        | 0.9253 | 0.8482 | 0.0770 |
      | ROC‑AUC   | 0.9956 | 0.9281 | 0.0675 |

      Max absolute gap 0.0909 (Recall) → notable overfitting risk.

    2. K‑Fold Cross Validation (5‑fold stratified) – instability surfaced via mean gaps.
      Pre‑regularisation: train vs CV means

      | Metric    | Train Mean | CV Mean | Mean Gap |
      |-----------|------------|---------|----------|
      | ROC‑AUC   | 0.9974     | 0.9250  | 0.0724   |
      | Precision | 0.9980     | 0.9308  | 0.0671   |
      | Recall    | 0.8710     | 0.7545  | 0.1165   |

      Largest mean gap 0.1165 (Recall) → capacity reduction & regularisation required.

    3. Learning Curve (ROC‑AUC) – widening stable gap (train near 1.0, validation lower & flat) confirms model capacity > signal.
      Figure: learning curve (ROC‑AUC); train saturates near 1.0 while validation plateaus lower, indicating overfitting.
      Train 0.9974 vs validation 0.9250 (gap 0.0724) prior to regularisation.

    Consistent multi‑metric gaps (>0.05) – especially recall & ROC‑AUC – showed the model was partially memorising minority churn patterns; this motivated the regularisation applied in the next step.
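    A sketch of how these diagnostics can be reproduced with scikit-learn, reusing `search`, `cv`, and the train/test split from the earlier sketches:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import learning_curve

model = search.best_estimator_               # tuned pipeline from step 3
model.fit(X_train, y_train)

# Diagnostic 1: train vs test gap on a single metric
train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"ROC-AUC gap (train - test): {train_auc - test_auc:.4f}")

# Diagnostics 2 & 3: cross-validated learning curve; a train score near
# 1.0 with a flat, lower validation score signals capacity > signal
sizes, train_scores, val_scores = learning_curve(
    model, X_train, y_train, cv=cv, scoring="roc_auc",
    train_sizes=np.linspace(0.1, 1.0, 8),
)
print("train:", train_scores.mean(axis=1).round(4))
print("valid:", val_scores.mean(axis=1).round(4))
```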

  5. Regularisation & Final Model
    • Applied depth/weight limits + subsampling + L1/L2 (max_depth=3, min_child_weight=3, subsample=0.7, colsample_bytree=0.7, reg_alpha=1.0, reg_lambda=5.0, learning_rate=0.07, early stopping ~200 trees).
    Final regularised model (train vs test):

    | Metric    | Train  | Test   | Gap    |
    |-----------|--------|--------|--------|
    | Accuracy  | 0.9710 | 0.9550 | 0.0160 |
    | Precision | 0.9828 | 0.9138 | 0.0690 |
    | Recall    | 0.8092 | 0.7518 | 0.0574 |
    | F1        | 0.8876 | 0.8249 | 0.0627 |
    | ROC‑AUC   | 0.9560 | 0.9205 | 0.0355 |

    Max gap 0.0690 (was 0.0909). Recall gap: 0.0909 → 0.0574; ROC‑AUC gap: 0.0675 → 0.0355.

    • The reduced gaps (max 0.091 → 0.069) lower overfitting risk; the model is ready for production scoring & churn‑risk ranking.
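    A sketch of this final configuration, using the hyperparameters listed above (the `n_estimators` cap and `early_stopping_rounds=50` patience are assumptions; the source only reports early stopping settling near ~200 trees):

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

final_clf = XGBClassifier(
    max_depth=3, min_child_weight=3,
    subsample=0.7, colsample_bytree=0.7,
    reg_alpha=1.0, reg_lambda=5.0,
    learning_rate=0.07,
    n_estimators=1000,             # upper bound; early stopping trims it
    early_stopping_rounds=50,      # assumed patience (xgboost >= 1.6 API)
    eval_metric="auc",
    random_state=42,
)

# Hold out a validation slice from the training data for early stopping,
# so the test set stays untouched
X_tr, X_val, y_tr, y_val = train_test_split(
    X_train, y_train, test_size=0.2, stratify=y_train, random_state=42
)
Xt_tr = preprocess.fit_transform(X_tr)
Xt_val = preprocess.transform(X_val)

final_clf.fit(Xt_tr, y_tr, eval_set=[(Xt_val, y_val)], verbose=False)
print("best iteration:", final_clf.best_iteration)   # lands near ~200 trees
```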
  6. Key Metrics
    Key metrics (final regularised model):

    | Metric    | Value | Description                                  |
    |-----------|-------|----------------------------------------------|
    | Accuracy  | 0.955 | Overall correct predictions                  |
    | Precision | 0.914 | % of predicted churners who actually churned |
    | Recall    | 0.752 | % of actual churners correctly identified    |
    | F1-score  | 0.825 | Balance between precision and recall         |
    | ROC-AUC   | 0.921 | Discrimination ability across thresholds     |
    • Balanced precision & recall with strong AUC enables confident high‑risk ranking without overwhelming retention capacity.
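    A minimal snippet for computing these test-set metrics from the fitted model (the 0.5 threshold is the default shown here; in practice the decision threshold can be tuned to retention capacity):

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

Xt_test = preprocess.transform(X_test)
proba = final_clf.predict_proba(Xt_test)[:, 1]   # churn probability
pred = (proba >= 0.5).astype(int)                # default decision threshold

print(f"Accuracy : {accuracy_score(y_test, pred):.3f}")
print(f"Precision: {precision_score(y_test, pred):.3f}")
print(f"Recall   : {recall_score(y_test, pred):.3f}")
print(f"F1-score : {f1_score(y_test, pred):.3f}")
print(f"ROC-AUC  : {roc_auc_score(y_test, proba):.3f}")
```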

Key Insights

Top Features Driving Churn

Figure (placeholder): feature importance; ranked drivers of predicted churn probability. Top features: customer service calls, international plan, daytime usage, international usage, voicemail plan.
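A short sketch for extracting these ranked importances from the fitted pipeline (assumes the `preprocess` and `final_clf` objects from the sketches above):

```python
import pandas as pd

# Map model importances back to the expanded (post-encoding) feature names
feature_names = preprocess.get_feature_names_out()
importances = pd.Series(final_clf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False).head(10))
```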

Strategic Recommendations

Business Impact (Targeted Outreach Simulation)

A simulation compared the impact of targeting the top 500 high-risk customers ranked by the model against contacting 500 customers at random:

| Scenario              | Contacts | Actual Churners Reached | Estimated Saves (30% save rate) | Retention Uplift |
|-----------------------|----------|-------------------------|---------------------------------|------------------|
| Random Targeting      | 500      | ~71                     | ~21                             | 1.0×             |
| Model-Based Targeting | 500      | 485                     | ~145                            | 5.8×             |

Churn reduction: ~20% projected reduction when targeting the top 500 customers with retention actions, compared to ~3% from random outreach.

Financial angle: assuming a $240 annual margin per retained customer, the ~145 model-driven saves represent ~$35K in protected annual margin per 500-contact cycle (≈145 × $240), versus ~$5K for random outreach.
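A simplified version of the outreach simulation (the 30% save rate and $240 margin come from the text; the scored population and random seed are illustrative, so exact counts will differ from the table above):

```python
import numpy as np

SAVE_RATE = 0.30      # retention-offer success rate from the simulation
MARGIN = 240          # assumed annual margin per retained customer ($)
N_CONTACTS = 500

# Score a labeled holdout population and rank by predicted churn risk
proba = final_clf.predict_proba(preprocess.transform(X_test))[:, 1]
labels = y_test.to_numpy()

# Model-based targeting: contact the N highest-risk customers
top_idx = np.argsort(proba)[::-1][:N_CONTACTS]
model_reached = labels[top_idx].sum()

# Random targeting: contact N customers drawn uniformly at random
rng = np.random.default_rng(42)
rand_idx = rng.choice(len(labels), size=N_CONTACTS, replace=False)
random_reached = labels[rand_idx].sum()

for name, reached in [("Random", random_reached), ("Model", model_reached)]:
    saves = reached * SAVE_RATE
    print(f"{name}: churners reached {reached}, ~{saves:.0f} saves, "
          f"~${saves * MARGIN:,.0f} protected margin")
```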

Tech Stack

Python · Pandas · NumPy · Scikit-Learn · XGBoost · Matplotlib · Seaborn · Imbalanced Data · Model Explainability · Feature Engineering · Cross-Validation · Hyperparameter Tuning · Classification Metrics

Code