Naive Bayes Classification
Probabilistic Machine Learning Algorithm
What is Naive Bayes Classification?
Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem with a "naive" assumption of conditional independence between features.
Key Characteristics:
- Probabilistic: Calculates probability of each class
- Fast: Simple calculations, quick training and prediction
- Effective: Works well with small datasets
- "Naive": Assumes features are independent (often not true in reality)
Real-World Applications:
- Email spam detection
- Text classification
- Sentiment analysis
- Medical diagnosis
- Weather prediction
Bayes' Theorem Foundation
Naive Bayes is built on Bayes' theorem, which calculates conditional probability:

P(Class|Features) = P(Features|Class) × P(Class) / P(Features)
In Classification Context:
- P(Class|Features): Probability of class given features (what we want)
- P(Features|Class): Likelihood of features given class
- P(Class): Prior probability of class
- P(Features): Evidence (normalizing constant)
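As a quick numeric illustration (the numbers are invented for this example): suppose 20% of all email is spam, the word "FREE" appears in 60% of spam, and "FREE" appears in 5% of legitimate mail. For an email containing "FREE":

P(Spam|"FREE") = P("FREE"|Spam) × P(Spam) / P("FREE")
= (0.6 × 0.2) / (0.6 × 0.2 + 0.05 × 0.8)
= 0.12 / 0.16 = 0.75

The evidence term in the denominator is the same quantity summed over both classes, which is why it acts as a normalizing constant.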
The "Naive" Assumption
The algorithm assumes that all features are conditionally independent given the class label.
What This Means:
For features X₁, X₂, ..., Xₙ given class C:

P(X₁, X₂, ..., Xₙ|C) = P(X₁|C) × P(X₂|C) × ... × P(Xₙ|C)
Why "Naive"? In reality, features often depend on each other, but this assumption simplifies calculations significantly.
Example of Independence Assumption:
In email spam detection:
- Presence of the word "FREE" is independent of the word "URGENT"
- Email length is independent of the sender's domain
- Time sent is independent of the number of exclamation marks
Note: These may actually be related, but Naive Bayes treats them as independent.
How Naive Bayes Works
Step-by-Step Process:
- Calculate Prior Probabilities: P(Class) for each class
- Calculate Likelihoods: P(Feature|Class) for each feature
- Apply Bayes' Theorem: Multiply prior × likelihoods
- Predict: Choose the class with the highest probability (a minimal code sketch follows this list)
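As a concrete illustration of these four steps, here is a minimal sketch for categorical features in Python. The `fit`/`predict` helpers and the dictionary layout are one possible design for this walkthrough, not a standard library API:

```python
from collections import Counter, defaultdict

def fit(data):
    """Steps 1 and 2: count priors and per-class feature-value frequencies.

    data is a list of (features_dict, label) pairs.
    """
    class_counts = Counter(label for _, label in data)
    # feature_counts[label][feature][value] -> how often value co-occurs with label
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in data:
        for feature, value in features.items():
            feature_counts[label][feature][value] += 1
    return class_counts, feature_counts

def predict(features, class_counts, feature_counts):
    """Steps 3 and 4: score each class as prior x product of likelihoods."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # prior P(Class)
        for feature, value in features.items():
            # P(value|Class); 0 if the value never co-occurred with this class
            score *= feature_counts[label][feature][value] / n_label
        scores[label] = score
    return max(scores, key=scores.get)
```

Fitting this on the tennis table below and querying the conditions from the worked problem reproduces the hand calculation there, including the zero score for Yes.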
Tennis Playing Example - Step by Step
Let's predict whether to play tennis based on weather conditions:
Training Data:
| Outlook | Temperature | Humidity | Wind | Play Tennis? |
|---|---|---|---|---|
| Sunny | Hot | High | Weak | No |
| Sunny | Hot | High | Strong | No |
| Overcast | Hot | High | Weak | Yes |
| Rain | Mild | High | Weak | Yes |
| Rain | Cool | Normal | Weak | Yes |
| Rain | Cool | Normal | Strong | No |
| Overcast | Cool | Normal | Strong | Yes |
| Sunny | Mild | High | Weak | No |
🎯 Worked Problem
Question: Should we play tennis when Outlook=Sunny, Temperature=Cool, Humidity=High, and Wind=Strong?
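Solution, using counts from the training table (4 Yes rows, 4 No rows):
- Priors: P(Yes) = 4/8 = 0.5, P(No) = 4/8 = 0.5
- Likelihoods: P(Sunny|Yes) = 0/4, P(Sunny|No) = 3/4; P(Cool|Yes) = 2/4, P(Cool|No) = 1/4; P(High|Yes) = 2/4, P(High|No) = 3/4; P(Strong|Yes) = 1/4, P(Strong|No) = 2/4
- Score(Yes) = 0.5 × 0 × 0.5 × 0.5 × 0.25 = 0
- Score(No) = 0.5 × 0.75 × 0.25 × 0.75 × 0.5 ≈ 0.035

Prediction: No. Notice that no Sunny day in this training set has Play Tennis = Yes, so P(Sunny|Yes) = 0 and the entire Yes product collapses to zero. This is the zero-probability problem listed under Disadvantages, which Laplace smoothing (sketched later) addresses.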
Gaussian Naive Bayes
When dealing with continuous numerical features, we model each feature's class-conditional likelihood with a Gaussian (normal) distribution:
Gaussian Distribution Formula:

P(x|C) = (1 / (σ√(2π))) × exp(−(x − μ)² / (2σ²))
- μ (mu): Mean of feature values for the class
- σ (sigma): Standard deviation for the class
- x: Feature value we want to evaluate
Example: Height and Weight Classification
Classifying gender based on height and weight:
- Male: Height μ=175cm, σ=10cm; Weight μ=70kg, σ=15kg
- Female: Height μ=162cm, σ=8cm; Weight μ=55kg, σ=12kg
For a person with height=170cm and weight=65kg, plug each value into the Gaussian formula for each class, multiply by the prior, and compare, as in the sketch below.
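Here is a minimal sketch of that calculation in Python, assuming equal priors for the two classes (the example above does not state them):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of x under a normal distribution with mean mu and std dev sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Class-conditional parameters from the example above: (mean, std dev)
params = {
    "Male":   {"height": (175, 10), "weight": (70, 15)},
    "Female": {"height": (162, 8),  "weight": (55, 12)},
}
person = {"height": 170, "weight": 65}

for label, feats in params.items():
    score = 0.5  # assumed equal prior P(Class)
    for feature, x in person.items():
        mu, sigma = feats[feature]
        score *= gaussian_pdf(x, mu, sigma)  # P(feature value|Class)
    print(label, score)
# Male scores slightly higher (about 4.4e-4 vs 3.6e-4), so the prediction is Male.
```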
Advantages & Disadvantages
✅ Advantages:
- Fast: Quick training and prediction
- Simple: Easy to understand and implement
- Small Data: Works well with limited training data
- Multi-class: Naturally handles multiple classes
- Probabilistic: Provides probability estimates
- Robust: Less prone to overfitting than more flexible models
❌ Disadvantages:
- Independence Assumption: Features are rarely truly independent
- Continuous Features: Require a distributional assumption (e.g., Gaussian) or discretization
- Zero Probability: If a feature value never appears with a class in training, its estimated likelihood is 0 and wipes out the entire product (mitigated by Laplace smoothing, sketched below)
- Limited Expressiveness: Cannot capture complex relationships
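The zero-probability problem has a standard remedy, Laplace (add-one) smoothing: add a pseudo-count to every feature value so no likelihood is ever exactly zero. A minimal sketch, reusing the count layout from the earlier code (names illustrative):

```python
def smoothed_likelihood(feature_counts, label, feature, value,
                        class_count, n_values, alpha=1.0):
    """P(value|label) with Laplace smoothing.

    alpha pseudo-counts are added to every value; n_values is the number
    of distinct values the feature can take, which keeps the smoothed
    probabilities summing to 1.
    """
    count = feature_counts[label][feature][value]
    return (count + alpha) / (class_count + alpha * n_values)
```

In the tennis example, P(Sunny|Yes) goes from 0/4 to (0 + 1) / (4 + 1 × 3) = 1/7 (Outlook takes 3 distinct values), so the Yes score is no longer forced to zero.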
When to Use Naive Bayes:
- Text classification (spam detection, sentiment analysis)
- Small to medium datasets
- When you need a fast, simple baseline model
- When features are relatively independent
- Multi-class classification problems