🎯 What is Linear Regression?
Linear regression fits the straight line that minimizes the sum of squared differences between predicted and actual values, then uses that line to predict continuous values.
Formula: y = mx + b
- y = predicted value
- m = slope
- x = input feature
- b = y-intercept
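To make the formula concrete, here is a minimal sketch that computes m and b by hand using the ordinary least squares closed form for a single feature. The data (hours studied vs. exam score) and the variable names are assumptions chosen for illustration, and the code is plain Python so the arithmetic stays visible.

```python
# Illustrative data, assumed for this sketch: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [50, 60, 70, 80, 90]

x_mean = sum(hours) / len(hours)
y_mean = sum(scores) / len(scores)

# Ordinary least squares for one feature:
#   m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   b = y_mean - m * x_mean
numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(hours, scores))
denominator = sum((x - x_mean) ** 2 for x in hours)
m = numerator / denominator
b = y_mean - m * x_mean

print(f"m = {m:.1f}, b = {b:.1f}")        # m = 10.0, b = 40.0
print(f"y at x = 6: {m * 6 + b:.1f}")     # y at x = 6: 100.0
```

Each extra hour adds 10 points (the slope), and the line crosses the y-axis at 40 (the intercept).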
💻 Quick Implementation
import numpy as np
from sklearn.linear_model import LinearRegression
# Sample data: Hours studied vs Exam score
X = np.array([[1], [2], [3], [4], [5]]) # Hours
y = np.array([50, 60, 70, 80, 90]) # Scores
# Create and train model
model = LinearRegression()
model.fit(X, y)
# Make prediction
prediction = model.predict([[6]]) # 6 hours
print(f"Predicted score: {prediction[0]:.1f}")
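The fitted model stores the slope and intercept from the formula y = mx + b in its `coef_` and `intercept_` attributes. The sketch below refits the same hours/scores data and rebuilds the 6-hour prediction by hand; the names `slope`, `intercept`, and `manual_prediction` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same sample data as above: hours studied vs. exam score
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([50, 60, 70, 80, 90])

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # m in y = mx + b
intercept = model.intercept_  # b in y = mx + b

# Reproduce the prediction for 6 hours directly from the formula
manual_prediction = slope * 6 + intercept
print(f"m = {slope:.1f}, b = {intercept:.1f}")
print(f"Manual prediction: {manual_prediction:.1f}")
```

The manual result should match what `model.predict([[6]])` returns, since `predict` applies the same line.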
📊 Real Example
Problem: Predict house prices based on size
# House data
sizes = np.array([[1000], [1500], [2000], [2500]]) # sq ft
prices = np.array([200000, 300000, 400000, 500000]) # dollars
# Train model
house_model = LinearRegression()
house_model.fit(sizes, prices)
# Predict price for 1800 sq ft house
new_price = house_model.predict([[1800]])
print(f"1800 sq ft house price: ${new_price[0]:,.0f}")
# Output: 1800 sq ft house price: $360,000
⚡ Key Points
✅ When to Use:
- Predicting continuous values
- Linear relationships
- Simple baseline model
❌ Limitations:
- Only linear relationships
- Sensitive to outliers
- Assumes normally distributed errors (residuals) for reliable confidence intervals; the data itself need not be normal
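The outlier sensitivity is easy to see in a small experiment. The sketch below fits the same line twice, once on clean data and once with a single made-up outlier (the last score changed from 90 to 30); all values and names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

hours = np.array([[1], [2], [3], [4], [5]])
clean_scores = np.array([50, 60, 70, 80, 90])
# Hypothetical outlier: the last student scored 30 instead of 90
outlier_scores = np.array([50, 60, 70, 80, 30])

clean_slope = LinearRegression().fit(hours, clean_scores).coef_[0]
outlier_slope = LinearRegression().fit(hours, outlier_scores).coef_[0]

print(f"Slope without outlier: {clean_slope:.1f}")
print(f"Slope with outlier:    {outlier_slope:.1f}")
```

One bad point out of five is enough to flip the slope from positive to negative here, because squared errors give large deviations a disproportionate pull on the fit.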
🚀 Complete Working Example
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
# Create sample dataset
data = {
'experience': [1, 2, 3, 4, 5, 6, 7, 8],
'salary': [30000, 35000, 40000, 45000, 50000, 55000, 60000, 65000]
}
df = pd.DataFrame(data)
# Split data
X = df[['experience']]
y = df['salary']
# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train and evaluate
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(f"R² Score: {r2_score(y_test, predictions):.2f}")
print(f"MSE: {mean_squared_error(y_test, predictions):.2f}")
print(f"Equation: Salary = {model.coef_[0]:.0f} * Experience + {model.intercept_:.0f}")
🎓 Summary
Linear Regression is one of the simplest ML algorithms: it fits a line through data to make predictions. It is a good starting point for beginners and often a surprisingly strong baseline!
Next Steps: Try polynomial regression, multiple features, or regularization techniques.
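As a taste of the first next step, here is a minimal sketch of polynomial regression, assuming scikit-learn's `PolynomialFeatures` and `make_pipeline`. The data is made up to follow y = x² exactly, so the gap between a straight line and a degree-2 fit is obvious.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up curved data for illustration: y = x^2
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 4, 9, 16, 25])

# Plain linear regression: a straight line through curved data
linear_model = LinearRegression().fit(X, y)

# Polynomial regression: add an x^2 feature, then fit a linear model on top
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(X, y)

linear_pred = linear_model.predict([[6]])[0]
poly_pred = poly_model.predict([[6]])[0]
print(f"Linear prediction at x = 6:     {linear_pred:.1f}")
print(f"Polynomial prediction at x = 6: {poly_pred:.1f}")
```

The true value at x = 6 is 36. The straight line undershoots it, while the degree-2 model recovers it, and under the hood it is still linear regression applied to transformed features.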