Naive Bayes Classification

Probabilistic Machine Learning Algorithm

What is Naive Bayes Classification?

Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem with a "naive" assumption of conditional independence between features.

Key Characteristics:

  • Probabilistic: Calculates probability of each class
  • Fast: Simple calculations, quick training and prediction
  • Effective: Works well with small datasets
  • "Naive": Assumes features are independent (often not true in reality)

Real-World Applications:

  • Email spam detection
  • Text classification
  • Sentiment analysis
  • Medical diagnosis
  • Weather prediction

Bayes' Theorem Foundation

Naive Bayes is built on Bayes' theorem, which calculates conditional probability:

P(A|B) = P(B|A) × P(A) / P(B)

In Classification Context:

P(Class|Features) = P(Features|Class) × P(Class) / P(Features)
  • P(Class|Features): Probability of class given features (what we want)
  • P(Features|Class): Likelihood of features given class
  • P(Class): Prior probability of class
  • P(Features): Evidence (normalizing constant)
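
To make this concrete, here is a toy spam-filter calculation with made-up numbers: suppose 30% of all email is spam, the word "FREE" appears in 60% of spam, and "FREE" appears in 20% of all email. Then:

P(Spam|"FREE") = P("FREE"|Spam) × P(Spam) / P("FREE") = 0.6 × 0.3 / 0.2 = 0.9

Under these assumed numbers, an email containing "FREE" has a 90% posterior probability of being spam.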

The "Naive" Assumption

The algorithm assumes that all features are conditionally independent given the class label.

What This Means:

For features X₁, X₂, X₃... given class C:

P(X₁, X₂, X₃|C) = P(X₁|C) × P(X₂|C) × P(X₃|C)

Why "Naive"? In reality, features often depend on each other, but this assumption simplifies calculations significantly.

Example of Independence Assumption:

In email spam detection:

  • Presence of word "FREE" is independent of word "URGENT"
  • Email length is independent of sender domain
  • Time sent is independent of number of exclamation marks

Note: These may actually be related, but Naive Bayes treats them as independent.

How Naive Bayes Works

Step-by-Step Process:

  1. Calculate Prior Probabilities: P(Class) for each class
  2. Calculate Likelihoods: P(Feature|Class) for each feature
  3. Apply Bayes' Theorem: Multiply prior × likelihoods
  4. Predict: Choose class with highest probability

P(Class|X₁,X₂,...,Xₙ) ∝ P(Class) × ∏ᵢ P(Xᵢ|Class)
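
A minimal from-scratch sketch of these four steps for categorical features (function names are illustrative; no smoothing is applied, so unseen feature values get probability zero):

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Steps 1-2: estimate priors P(Class) and likelihoods P(Feature|Class)."""
    n = len(labels)
    class_totals = Counter(labels)
    priors = {c: count / n for c, count in class_totals.items()}
    # counts[class][feature_index][feature_value] = number of occurrences
    counts = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(rows, labels):
        for i, value in enumerate(row):
            counts[c][i][value] += 1
    likelihoods = {
        c: {i: {v: cnt / class_totals[c] for v, cnt in vals.items()}
            for i, vals in feats.items()}
        for c, feats in counts.items()
    }
    return priors, likelihoods

def predict_nb(priors, likelihoods, row):
    """Steps 3-4: multiply prior by per-feature likelihoods, take the argmax."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(row):
            score *= likelihoods[c][i].get(value, 0.0)  # unseen value -> 0
        scores[c] = score
    return max(scores, key=scores.get), scores
```

With the tennis data below loaded as rows of (Outlook, Temperature, Humidity, Wind), calling predict_nb(priors, likelihoods, ("Sunny", "Cool", "High", "Strong")) reproduces the worked answer in the next section.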

Tennis Playing Example - Step by Step

Let's predict whether to play tennis based on weather conditions:

Training Data:

Outlook    Temperature  Humidity  Wind    Play Tennis?
Sunny      Hot          High      Weak    No
Sunny      Hot          High      Strong  No
Overcast   Hot          High      Weak    Yes
Rain       Mild         High      Weak    Yes
Rain       Cool         Normal    Weak    Yes
Rain       Cool         Normal    Strong  No
Overcast   Cool         Normal    Strong  Yes
Sunny      Mild         High      Weak    No

🎯 Worked Example

Question: Play tennis when Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong?
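
Worked solution using the eight training rows above:

Priors: P(Yes) = 4/8, P(No) = 4/8

Likelihoods for the query conditions:

  • P(Sunny|Yes) = 0/4, P(Sunny|No) = 3/4
  • P(Cool|Yes) = 2/4, P(Cool|No) = 1/4
  • P(High|Yes) = 2/4, P(High|No) = 3/4
  • P(Strong|Yes) = 1/4, P(Strong|No) = 2/4

Score(Yes) ∝ 4/8 × 0/4 × 2/4 × 2/4 × 1/4 = 0
Score(No) ∝ 4/8 × 3/4 × 1/4 × 3/4 × 2/4 ≈ 0.035

Prediction: No. Notice that P(Sunny|Yes) = 0 collapses the entire Yes score to zero; this is the zero-probability problem discussed under Disadvantages, commonly fixed with Laplace smoothing.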

Gaussian Naive Bayes

When dealing with continuous numerical features, Gaussian Naive Bayes models each feature within a class as a Gaussian (normal) distribution:

Gaussian Distribution Formula:

P(x|class) = (1/√(2πσ²)) × e^(-(x-μ)²/(2σ²))
  • μ (mu): Mean of feature values for the class
  • σ (sigma): Standard deviation for the class
  • x: Feature value we want to evaluate

Example: Height and Weight Classification

Classifying gender based on height and weight:

  • Male: Height μ=175cm, σ=10cm; Weight μ=70kg, σ=15kg
  • Female: Height μ=162cm, σ=8cm; Weight μ=55kg, σ=12kg

For a person with height=170cm, weight=65kg, calculate P(height|Male), P(weight|Male), etc.
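
A minimal Python sketch of that calculation (assuming equal priors of 0.5 for each class, since the example does not state them):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of x under a Normal(mu, sigma) distribution."""
    coeff = 1.0 / math.sqrt(2 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Class-conditional (mu, sigma) parameters from the example above
params = {
    "Male":   {"height": (175, 10), "weight": (70, 15)},
    "Female": {"height": (162, 8),  "weight": (55, 12)},
}
priors = {"Male": 0.5, "Female": 0.5}  # assumed equal priors

person = {"height": 170, "weight": 65}

for label in params:
    score = priors[label]
    for feature, value in person.items():
        mu, sigma = params[label][feature]
        score *= gaussian_pdf(value, mu, sigma)
    print(f"{label}: unnormalized posterior = {score:.6f}")
```

Running this gives a slightly higher score for Male, mostly because a height of 170cm sits closer to the Male height mean.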

Advantages & Disadvantages

✅ Advantages:

  • Fast: Quick training and prediction
  • Simple: Easy to understand and implement
  • Small Data: Works well with limited training data
  • Multi-class: Naturally handles multiple classes
  • Probabilistic: Provides probability estimates
  • Robust: Less prone to overfitting than more flexible models, thanks to its small number of parameters

❌ Disadvantages:

  • Independence Assumption: Features are rarely truly independent
  • Continuous Inputs: Requires discretization or a distributional assumption (e.g., Gaussian) to handle continuous features
  • Zero Probability: If a feature value never appears with a class in training, the whole product becomes zero (commonly fixed with Laplace smoothing)
  • Limited Expressiveness: Cannot capture complex relationships

When to Use Naive Bayes:

  • Text classification (spam detection, sentiment analysis)
  • Small to medium datasets
  • When you need a fast, simple baseline model
  • When features are relatively independent
  • Multi-class classification problems
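
As a concrete baseline, here is a minimal text-classification sketch using scikit-learn (the four training texts are made up for illustration only):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus for illustration only
texts = [
    "win a FREE prize now",           # spam
    "URGENT offer click here",        # spam
    "meeting rescheduled to Monday",  # ham
    "see you at lunch tomorrow",      # ham
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()            # bag-of-words counts
X = vectorizer.fit_transform(texts)

model = MultinomialNB()                   # suited to word-count features
model.fit(X, labels)

test = vectorizer.transform(["FREE offer click now"])
print(model.predict(test))                # -> ['spam'] on this toy data
print(model.predict_proba(test))          # per-class probability estimates
```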