
Logistic Regression

Learn the essential classification algorithm that predicts probabilities and categories

📅 Module 3 📊 Beginner


🎯 Welcome to Logistic Regression

Despite its name, Logistic Regression is actually a classification algorithm, not regression! It's one of the most popular algorithms for binary classification (yes/no, spam/not spam, disease/healthy).

In this tutorial, you'll learn how machines predict probabilities and make binary decisions using a brilliant mathematical function called the sigmoid.

📊 Classification vs Regression

| Aspect | Regression | Classification |
|---|---|---|
| Output | Continuous values (any number) | Discrete categories (Class A, B, etc.) |
| Examples | Predicting prices, temperature, distance | Email (spam/not spam), disease (yes/no) |
| Output Range | -∞ to +∞ | Class labels (0 or 1 for binary) |
| Best Metrics | MSE, R² Score | Accuracy, Precision, Recall |

💡 Why "Regression" in the name? Because it builds on linear regression by adding a special mathematical function (sigmoid) to convert continuous outputs to probabilities. Historical naming quirk!

⚙️ How Logistic Regression Works

The Sigmoid Function

The magic of Logistic Regression is the sigmoid function. It takes any number and squashes it into a probability (0 to 1):

σ(z) = 1 / (1 + e^(-z))

Where z = w₁x₁ + w₂x₂ + ... + b (same as linear regression!)

Key properties of sigmoid:

  • Input can be any number (negative or positive)
  • Output is always between 0 and 1
  • Output = probability of class 1 (positive class)
  • S-shaped curve makes smooth transitions

🏥 Example: Predicting if a patient has a disease:

  • Sigmoid output = 0.2 → 20% probability of disease
  • Sigmoid output = 0.9 → 90% probability of disease
  • Decision threshold = 0.5 (if output ≥ 0.5, predict "disease")
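A minimal NumPy sketch ties these ideas together (the z values below are made up for illustration):

import numpy as np

def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1 / (1 + np.exp(-z))

# Illustrative inputs: large negative, zero, moderately positive
z = np.array([-5.0, 0.0, 2.2])
probs = sigmoid(z)
print(probs)  # ~[0.007, 0.5, 0.9]

# Apply the 0.5 decision threshold to get hard class predictions
print((probs >= 0.5).astype(int))  # [0 1 1]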

Decision Boundary

Logistic Regression creates a decision boundary that separates the classes: points on one side are classified as Class 0, points on the other side as Class 1. The boundary lies exactly where σ(z) = 0.5, which happens when z = w₁x₁ + w₂x₂ + ... + b = 0.

💻 Logistic Regression in Python

# Spam Email Detection using Logistic Regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
import numpy as np

# Sample data: Features extracted from emails
# Features: [word_count, link_count, caps_ratio, exclamation_marks]
X_train = np.array([
    [100, 2, 0.05, 0],      # Legitimate
    [500, 15, 0.3, 5],      # Spam
    [150, 1, 0.02, 0],      # Legitimate
    [600, 20, 0.35, 8],     # Spam
    [200, 3, 0.08, 1],      # Legitimate
])

# Labels: 0 = Not spam, 1 = Spam
y_train = np.array([0, 1, 0, 1, 0])

# Create and train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
new_email = np.array([[120, 1, 0.03, 0]])
prediction = model.predict(new_email)
probability = model.predict_proba(new_email)

print(f"Prediction: {'Spam' if prediction[0] == 1 else 'Not Spam'}")
print(f"Probability: {probability[0][1]:.2%} chance of being spam")

# Evaluate model
y_pred = model.predict(X_train)
accuracy = accuracy_score(y_train, y_pred)
print(f"\nAccuracy: {accuracy:.2%}")

💡 predict_proba(): Returns probabilities for both classes. Use predict() for hard predictions (0 or 1).
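Because the model is linear in z, its learned weights are easy to inspect. Continuing from the spam model trained above (coef_ and intercept_ are standard scikit-learn attributes):

# The decision boundary is where w·x + b = 0
print("Weights (w):", model.coef_[0])        # one weight per feature
print("Intercept (b):", model.intercept_[0])

# A positive weight pushes an email toward "spam" (class 1),
# a negative weight toward "not spam" (class 0)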

🔀 Beyond Binary: Multiclass Classification

Logistic Regression naturally handles binary (2-class) problems. For problems with 3+ classes, there are two approaches:

1️⃣ One-vs-Rest (OvR): Train one binary classifier per class (Class A vs. all others, Class B vs. all others, etc.) and predict the class with the highest probability.

2️⃣ Multinomial: Train a single classifier that learns all classes directly using softmax (the multiclass extension of sigmoid).
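To see what softmax does, here is a minimal NumPy sketch (the class scores are made up for illustration):

import numpy as np

def softmax(z):
    """Convert raw class scores into probabilities that sum to 1."""
    exp_z = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])  # one raw score per class
print(softmax(scores))              # ~[0.66, 0.24, 0.10]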

# Multiclass example: Iris flower classification
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X, y = iris.data, iris.target

# Logistic Regression automatically handles multiclass
model = LogisticRegression(max_iter=200)
model.fit(X, y)

# Predict class and probabilities for a new flower
new_flower = [[5.1, 3.5, 1.4, 0.2]]  # sepal length/width, petal length/width (cm)
prediction = model.predict(new_flower)
probabilities = model.predict_proba(new_flower)

print(f"Predicted class: {iris.target_names[prediction[0]]}")
for i, prob in enumerate(probabilities[0]):
    print(f"{iris.target_names[i]}: {prob:.2%}")

📈 Key Evaluation Metrics

For classification, we use different metrics than for regression:

| Metric | Formula / Description | When to Use |
|---|---|---|
| Accuracy | % of correct predictions | Balanced datasets |
| Precision | Of predicted positives, how many are correct? | When false positives are costly |
| Recall | Of actual positives, how many did we find? | When false negatives are costly |
| F1-Score | Harmonic mean of precision and recall | Imbalanced datasets |
| AUC-ROC | Area under the ROC curve | Comprehensive performance metric |

🏥 Disease Detection Example:

  • Precision matters: If test says you have disease, how sure are we? (avoid false alarms)
  • Recall matters: Did we catch all actual disease cases? (avoid missing real cases)
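As a minimal sketch, here is how scikit-learn computes these metrics (the labels and predictions below are made up for illustration):

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical ground truth and predictions (1 = disease, 0 = healthy)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")   # 0.62
print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 0.60
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # 0.75
print(f"F1-score:  {f1_score(y_true, y_pred):.2f}")         # 0.67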

✅ Strengths & ❌ Limitations

✅ Strengths:

  • Probabilistic Output: Returns probabilities, not just class labels
  • Fast & Efficient: Trains quickly even on large datasets
  • Interpretable: Feature weights show how much each feature contributes to predictions

❌ Limitations:

  • Linear Boundary Only: Can't handle complex non-linear decision boundaries on its own
  • Requires Feature Scaling: Performance improves when features are normalized
  • Struggles with Complex Data: Tree-based methods often outperform it when feature relationships are highly non-linear

🎯 When to Use Logistic Regression

  • Binary or multiclass classification - Predicting categories, not numbers
  • You need probabilities - Not just class predictions
  • Interpretability matters - You need to explain decisions to others
  • Fast inference required - Real-time predictions needed
  • Classes are roughly linearly separable - The decision boundary is approximately a line or plane in feature space
  • Baseline classifier - Before trying complex algorithms

⚠️ Common Misconception: "Logistic Regression doesn't work with non-linear data." You can add polynomial features or interactions to handle non-linearity!
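As a minimal sketch of that idea (using scikit-learn's make_moons to generate synthetic non-linear data), PolynomialFeatures expands the inputs before the classifier, letting a linear model draw a curved boundary; on this data the polynomial version typically scores noticeably higher:

# Non-linear data that a straight-line boundary can't separate well
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

# Plain Logistic Regression vs. one with added polynomial features
linear_model = LogisticRegression().fit(X, y)
poly_model = make_pipeline(
    PolynomialFeatures(degree=3),      # adds x1², x1·x2, x2³, etc.
    LogisticRegression(max_iter=1000),
).fit(X, y)

print(f"Linear features:     {linear_model.score(X, y):.2%}")
print(f"Polynomial features: {poly_model.score(X, y):.2%}")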

📚 Key Concepts Summary

  • Classification: Predicting discrete categories, not continuous values
  • Binary Classification: Two-class problem (yes/no, spam/not spam)
  • Sigmoid Function: Converts linear output to probability (0-1)
  • Decision Boundary: The line/plane that separates classes
  • Threshold: Usually 0.5 for probability → class conversion
  • Accuracy/Precision/Recall: Different metrics for different needs

📋 Summary

What You've Learned:

  • Logistic Regression is for classification (despite the name!)
  • Uses sigmoid function to convert outputs to probabilities
  • Outputs probabilities and class predictions
  • Works for both binary and multiclass problems
  • Fast, interpretable, but assumes linear separability

What's Next?

In the next tutorial, Decision Trees, we'll learn how to build tree-based models that can handle non-linear relationships and are much more flexible than Logistic Regression. Get ready for a different approach to classification!

🎉 Great Progress! You now know two fundamental algorithms: Linear Regression for continuous predictions and Logistic Regression for categories. These form the foundation of ML mastery!

📝 Knowledge Check

Test your understanding of Logistic Regression!

1. What type of problems is logistic regression used for?

A) Predicting continuous values
B) Binary and multi-class classification
C) Clustering data points
D) Time series forecasting

2. What function does logistic regression use to convert predictions to probabilities?

A) ReLU function
B) Linear function
C) Sigmoid (logistic) function
D) Softmax function

3. What range do probabilities output by logistic regression fall into?

A) 0 to 1
B) -1 to 1
C) -∞ to +∞
D) 0 to 100

4. What is the decision threshold in logistic regression typically set at?

A) 0.0
B) 0.25
C) 0.75
D) 0.5 (can be adjusted)

5. What does regularization in logistic regression help prevent?

A) Underfitting
B) Overfitting by penalizing large coefficients
C) Data imbalance
D) Missing values