Multiple Linear Regression
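Multiple Linear Regression predicts a continuous target using two or more input features. It assumes a linear relationship between inputs and output:
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$
where:
- $\theta_0$ = intercept
- $\theta_1, \dots, \theta_n$ = coefficients
- $\hat{y}$ = predicted value
For this example with two features:
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$
The model learns the best coefficients by minimizing the sum of squared errors:
$$\text{SSE} = \sum_{i=1}^{m} \left(y_i - \hat{y}_i\right)^2$$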
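As a minimal sketch of the squared-error objective, the snippet below evaluates the sum of squared errors for two candidate parameter sets on a tiny made-up dataset. The helper name `sse` and the data values are illustrative assumptions, not part of the original example.

```python
import numpy as np

def sse(theta, X, y):
    """Sum of squared errors for theta = [intercept, coef_1, coef_2]."""
    y_hat = theta[0] + X @ theta[1:]
    return float(np.sum((y - y_hat) ** 2))

# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

good = np.array([1.0, 2.0, 3.0])  # the generating parameters
bad = np.array([0.0, 1.0, 1.0])   # an arbitrary worse guess

sse_good = sse(good, X, y)
sse_bad = sse(bad, X, y)
print("SSE (generating parameters):", sse_good)
print("SSE (arbitrary guess):", sse_bad)
```

The parameter set with the smaller sum of squared errors is the better fit, which is the criterion the model optimizes.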
Intuition
Simple Linear Regression fits a line. Multiple Linear Regression with two features fits a plane. With more than two features, it fits a hyperplane.
So here:
- $x_1$ affects $y$
- $x_2$ affects $y$
- the model combines both effects into one equation
Main Equation
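If the dataset has two input features:
$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ \vdots & \vdots \\ x_{m1} & x_{m2} \end{bmatrix}$$
then after adding the bias column:
$$X_b = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ \vdots & \vdots & \vdots \\ 1 & x_{m1} & x_{m2} \end{bmatrix}$$
The Normal Equation is:
$$\theta = (X_b^T X_b)^{-1} X_b^T y$$
where:
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}$$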
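The Normal Equation can be evaluated in more than one way. The sketch below compares the explicit inverse with two standard NumPy routines, `np.linalg.solve` and `np.linalg.lstsq`, which avoid forming the inverse and are generally preferred numerically. The synthetic data and variable names are assumptions made for illustration.

```python
import numpy as np

# Synthetic data (an assumption for this sketch): y = 1 + 2*x1 + 3*x2 + noise
rng = np.random.default_rng(0)
X = rng.integers(1, 11, size=(15, 2)).astype(float)
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 2, 15)

# Prepend the bias column of ones
Xb = np.c_[np.ones((len(X), 1)), X]

# 1) Explicit inverse, as written in the Normal Equation
theta_inv = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y

# 2) Solve the linear system (Xb^T Xb) theta = Xb^T y without inverting
theta_solve = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# 3) Least-squares solver working directly on Xb
theta_lstsq, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print(theta_inv)
print(theta_solve)
print(theta_lstsq)
```

On well-conditioned data like this, all three should agree to within floating-point error; the difference matters mainly when features are nearly collinear.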
Short and Clean Code
import numpy as np
import matplotlib.pyplot as plt


class MultipleLinearRegression:
    def __init__(self):
        self.intercept_ = 0.0
        self.coef_ = None
        self.r2_ = 0.0

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(-1, 1)

        # Add a column of ones so the intercept is learned with the slopes
        Xb = np.c_[np.ones((len(X), 1)), X]

        # Normal Equation (assumes Xb.T @ Xb is invertible)
        theta = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y
        self.intercept_ = theta[0, 0]
        self.coef_ = theta[1:, 0]

        # R^2 on the training data
        y_pred = Xb @ theta
        ss_res = np.sum((y - y_pred) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        self.r2_ = 1 - ss_res / ss_tot
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return self.intercept_ + X @ self.coef_


# Synthetic dataset: y = 1 + 2*X1 + 3*X2 + noise
np.random.seed(0)
X1 = np.random.randint(1, 11, 15)
X2 = np.random.randint(1, 11, 15)
X = np.column_stack((X1, X2))
y = 1 + 2 * X1 + 3 * X2 + np.random.randn(15) * 2

model = MultipleLinearRegression().fit(X, y)
y_pred = model.predict(X)

print("Intercept:", round(model.intercept_, 4))
print("Coefficients:", np.round(model.coef_, 4))
print("R^2:", round(model.r2_, 4))

# 3D plot: data points and the fitted regression plane
fig = plt.figure(figsize=(9, 6))
ax = fig.add_subplot(111, projection="3d")
ax.scatter(X[:, 0], X[:, 1], y, label="Data")

x1_grid, x2_grid = np.meshgrid(
    np.linspace(X[:, 0].min(), X[:, 0].max(), 20),
    np.linspace(X[:, 1].min(), X[:, 1].max(), 20)
)
grid_points = np.c_[x1_grid.ravel(), x2_grid.ravel()]
z_grid = model.predict(grid_points).reshape(x1_grid.shape)
ax.plot_surface(x1_grid, x2_grid, z_grid, alpha=0.5)

ax.set_xlabel("X1")
ax.set_ylabel("X2")
ax.set_zlabel("y")
ax.set_title("Multiple Linear Regression Plane")
plt.show()
Dataset Used
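The target was generated using:
$$y = 1 + 2x_1 + 3x_2 + \varepsilon$$
Code:
y = 1 + 2 * X1 + 3 * X2 + np.random.randn(15) * 2
Concept:
- true intercept is 1
- true coefficient of $x_1$ is 2
- true coefficient of $x_2$ is 3
- Gaussian noise $\varepsilon$ with standard deviation 2 is added to make it realistic
So the model should learn values close to (but, because of the noise, not exactly equal to):
$$\theta_0 \approx 1, \quad \theta_1 \approx 2, \quad \theta_2 \approx 3$$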
Step-by-Step Algorithm
Step 1: Prepare the data
We create two input features:
X1 = np.random.randint(1, 11, 15)
X2 = np.random.randint(1, 11, 15)
X = np.column_stack((X1, X2))
Concept: Each sample now has two features: $(x_1, x_2)$.
So a row may look like: $[6, 8]$, meaning:
- first feature = 6
- second feature = 8
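To make the layout of the feature matrix concrete, the short sketch below rebuilds `X` the same way and inspects its shape and first row. Only the `print` calls are additions; the data-generation lines mirror the ones above.

```python
import numpy as np

np.random.seed(0)
X1 = np.random.randint(1, 11, 15)  # 15 integers from 1 to 10 inclusive
X2 = np.random.randint(1, 11, 15)
X = np.column_stack((X1, X2))      # one row per sample, one column per feature

print(X.shape)  # (15, 2)
print(X[0])     # first sample as [x1, x2]
```

Each row is one observation and each column is one feature, which is the layout the later matrix operations rely on.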
Step 2: Add the bias column
To learn the intercept together with slopes, add a column of ones:
$$X_b = \begin{bmatrix} \mathbf{1} & X \end{bmatrix}$$
Code:
Xb = np.c_[np.ones((len(X), 1)), X]
Concept:
- first column handles the intercept
- remaining columns are the actual features
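The effect of the bias column can be checked on a tiny matrix. In this sketch the three rows of `X` and the parameter vector `theta` are hypothetical values chosen for illustration.

```python
import numpy as np

X = np.array([[6.0, 8.0],
              [2.0, 5.0],
              [9.0, 1.0]])

# Prepend a column of ones
Xb = np.c_[np.ones((len(X), 1)), X]
print(Xb)

theta = np.array([1.0, 2.0, 3.0])  # hypothetical [intercept, coef_1, coef_2]

# The column of ones multiplies theta[0], so Xb @ theta adds the intercept to every row
same = np.allclose(Xb @ theta, theta[0] + X @ theta[1:])
print(same)
```

This is why a single matrix product can handle the intercept and the slopes together.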
Step 3: Compute coefficients using the Normal Equation
Equation:
$$\theta = (X_b^T X_b)^{-1} X_b^T y$$
Code:
theta = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y
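Concept: This directly computes the best-fit coefficients in closed form, without iterative optimization. It requires $X_b^T X_b$ to be invertible.
The parameter vector is:
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}$$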
Code mapping:
self.intercept_ = theta[0, 0]
self.coef_ = theta[1:, 0]
So:
- `intercept_` stores $\theta_0$
- `coef_[0]` stores $\theta_1$
- `coef_[1]` stores $\theta_2$
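One way to check the mapping from the parameter vector to the intercept and slopes is to fit noise-free data, where the Normal Equation should recover the generating values up to rounding error. The five data rows below are made up for this sketch.

```python
import numpy as np

# Hypothetical noise-free data from y = 1 + 2*x1 + 3*x2
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 7]], dtype=float)
y = (1 + 2 * X[:, 0] + 3 * X[:, 1]).reshape(-1, 1)

Xb = np.c_[np.ones((len(X), 1)), X]
theta = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y  # shape (3, 1)

intercept = theta[0, 0]  # first entry: intercept
coef = theta[1:, 0]      # remaining entries: one slope per feature

print(np.round(intercept, 6), np.round(coef, 6))
```

With noisy data the recovered values would only be close to the generating ones rather than equal to them.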
Step 4: Form the regression equation
Once the parameters are learned, the model becomes:
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$
Code:
return self.intercept_ + X @ self.coef_
Concept: For each new row:
- multiply each feature by its coefficient
- add the intercept
- get the predicted output
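The three steps listed above can be written out row by row and compared with the vectorized expression. The intercept, coefficients, and input rows in this sketch are hypothetical values, not fitted results.

```python
import numpy as np

intercept = 1.0              # hypothetical learned intercept
coef = np.array([2.0, 3.0])  # hypothetical learned coefficients

X_new = np.array([[6.0, 8.0],
                  [2.0, 5.0]])

# Row by row: multiply each feature by its coefficient, then add the intercept
manual = np.array([intercept + row[0] * coef[0] + row[1] * coef[1] for row in X_new])

# Vectorized form, the same expression used in predict()
vectorized = intercept + X_new @ coef

print(manual)
print(vectorized)
```

Both forms give the same numbers; the matrix product simply applies the rule to all rows at once.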
Step 5: Compute predictions
Predictions for training data:
$$\hat{y} = X_b \theta$$
Code:
y_pred = Xb @ theta
Concept: This gives the fitted values of the regression plane on the training samples.
Step 6: Evaluate with $R^2$
The coefficient of determination is:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
Code:
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
self.r2_ = 1 - ss_res / ss_tot
Meaning:
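- $SS_{res} = \sum_i (y_i - \hat{y}_i)^2$ = residual sum of squares
- $SS_{tot} = \sum_i (y_i - \bar{y})^2$ = total sum of squares
Interpretation:
- $R^2 = 1$ means perfect fit
- larger $R^2$ means better fit
- $R^2$ close to 0 means poor explanatory power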
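The interpretation above can be checked on a few hand-picked predictions. The helper name `r_squared` and the target values are assumptions made for this sketch.

```python
import numpy as np

def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

y = np.array([3.0, 5.0, 7.0, 9.0])

r2_perfect = r_squared(y, y.copy())                      # predictions equal the targets
r2_mean = r_squared(y, np.full_like(y, y.mean()))        # always predict the mean
r2_close = r_squared(y, np.array([3.5, 4.5, 7.5, 8.5]))  # small errors on each point

print(r2_perfect, r2_mean, r2_close)
```

Predicting the mean for every sample gives $R^2 = 0$, which is why values near 0 indicate that the features explain little of the variation in the target.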
Code Explanation: Concept -> Equation -> Code
1. Store model parameters
Concept: The model needs to store intercept, coefficients, and score.
Code:
def __init__(self):
    self.intercept_ = 0.0
    self.coef_ = None
    self.r2_ = 0.0
Meaning:
- `intercept_` = $\theta_0$ (intercept)
- `coef_` = slopes
- `r2_` = goodness-of-fit measure ($R^2$ on the training data)
2. Convert input shapes properly
Concept: Matrix operations need correctly shaped arrays.
Code:
X = np.asarray(X, dtype=float)
y = np.asarray(y, dtype=float).reshape(-1, 1)
Meaning:
- `X` becomes a float array (expected to be 2D: one row per sample, one column per feature)
- `y` becomes a column vector
3. Add intercept term
Concept: We include the intercept in matrix multiplication by adding a bias column.
Equation:
$$X_b = \begin{bmatrix} \mathbf{1} & X \end{bmatrix}$$
Code:
Xb = np.c_[np.ones((len(X), 1)), X]
4. Learn best coefficients
Concept: Find coefficients that minimize squared error.
Equation:
$$\theta = (X_b^T X_b)^{-1} X_b^T y$$
Code:
theta = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y
5. Separate intercept and slopes
Concept: The first value of $\theta$ is the intercept; the rest are the feature coefficients.
Code:
self.intercept_ = theta[0, 0]
self.coef_ = theta[1:, 0]
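If:
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix}$$
then the model is:
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$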
6. Predict outputs
Concept: Use the learned plane to estimate new values.
Equation:
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$
Code:
return self.intercept_ + X @ self.coef_
7. Evaluate goodness of fit
Concept: Check how much variance in $y$ is explained by the model.
Equation:
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$
Code:
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
self.r2_ = 1 - ss_res / ss_tot
Worked Example
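Suppose the learned model is approximately:
$$\hat{y} = 1 + 2x_1 + 3x_2$$
Take one sample:
$$x_1 = 6, \quad x_2 = 8$$
Prediction:
$$\hat{y} = 1 + 2(6) + 3(8) = 1 + 12 + 24 = 37$$
So for that point, the model predicts:
$$\hat{y} = 37$$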
This is exactly what the predict() method is doing for every row.
Why This Algorithm Works
Multiple Linear Regression assumes:
- the target depends linearly on the features
- each feature contributes additively
- the best model is the one with minimum squared error
So it finds a plane:
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$
that stays as close as possible to the observed data points.
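The minimum-squared-error property can be checked numerically: any parameter vector other than the least-squares solution should give a larger sum of squared errors. The sketch below uses synthetic data and random perturbations, both of which are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(1, 10, size=(30, 2))
y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 2, 30)

Xb = np.c_[np.ones((len(X), 1)), X]
theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)  # least-squares parameters

def sse(t):
    """Sum of squared errors of the plane defined by t on the training data."""
    return float(np.sum((y - Xb @ t) ** 2))

best = sse(theta)
# Nudge the fitted parameters in random directions and re-evaluate the error
perturbed = [sse(theta + rng.normal(0, 0.1, 3)) for _ in range(5)]

print(best)
print(perturbed)
```

Every perturbed plane fits the data worse than the least-squares plane, which is what "as close as possible" means here.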
Understanding the 3D Plot
With two features, the model can be visualized in 3D:
- horizontal axis 1 = $x_1$
- horizontal axis 2 = $x_2$
- vertical axis = $y$
The blue points are actual data. The surface is the regression plane.
Code:
ax.scatter(X[:, 0], X[:, 1], y, label="Data")
ax.plot_surface(x1_grid, x2_grid, z_grid, alpha=0.5)
Concept:
- scatter points show real observations
- plane shows model predictions
- a good model has points lying near the plane
Practical Notes
1. Multiple features
Unlike simple linear regression, multiple linear regression uses several inputs $x_1, x_2, \dots, x_n$, so it can capture the effect of several variables together.
2. Coefficient meaning
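If:
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$
then:
- $\theta_1$ = change in $\hat{y}$ for a 1 unit increase in $x_1$, keeping $x_2$ fixed
- $\theta_2$ = change in $\hat{y}$ for a 1 unit increase in $x_2$, keeping $x_1$ fixed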
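This "one unit, other feature fixed" reading can be verified directly. The intercept and coefficients below are hypothetical, and `predict` is a stand-alone helper written for this sketch rather than the class method.

```python
import numpy as np

intercept, coef = 1.0, np.array([2.0, 3.0])  # hypothetical fitted values

def predict(X):
    return intercept + np.asarray(X, dtype=float) @ coef

base = predict([[4.0, 5.0]])[0]
x1_plus_one = predict([[5.0, 5.0]])[0]  # x1 increased by 1, x2 unchanged
x2_plus_one = predict([[4.0, 6.0]])[0]  # x2 increased by 1, x1 unchanged

delta_x1 = x1_plus_one - base
delta_x2 = x2_plus_one - base
print(delta_x1, delta_x2)
```

The two differences equal the two coefficients, regardless of which base point is chosen.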
3. Multicollinearity
If input features are highly correlated, coefficient estimates can become unstable.
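One common way to see this instability is through the condition number of $X_b^T X_b$, which grows sharply when features are nearly collinear. The sketch below compares an independent second feature with one that is almost a copy of the first; the data and the helper name `gram_condition_number` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(1, 10, 50)

x2_independent = rng.uniform(1, 10, 50)      # unrelated to x1
x2_collinear = x1 + rng.normal(0, 1e-3, 50)  # almost identical to x1

def gram_condition_number(x1, x2):
    """Condition number of Xb^T Xb for features x1, x2 plus a bias column."""
    Xb = np.c_[np.ones(len(x1)), x1, x2]
    return np.linalg.cond(Xb.T @ Xb)

cond_ok = gram_condition_number(x1, x2_independent)
cond_bad = gram_condition_number(x1, x2_collinear)

print(cond_ok)
print(cond_bad)
```

A very large condition number means small changes in the data can cause large changes in the solved coefficients, which is the instability described above.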
4. Linear assumption
If the true relationship is non-linear, this model may underfit.
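A deliberately extreme illustration: the target below depends only on the squares of the features, over a grid that is symmetric around zero, so a plane cannot capture any of the variation. The grid and target are assumptions chosen to make the effect obvious.

```python
import numpy as np

# Symmetric grid of feature values
g = np.linspace(-3, 3, 7)
x1, x2 = np.meshgrid(g, g)
X = np.column_stack((x1.ravel(), x2.ravel()))

# Purely quadratic target: no linear trend in either feature
y = X[:, 0] ** 2 + X[:, 1] ** 2

Xb = np.c_[np.ones((len(X), 1)), X]
theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)

y_pred = Xb @ theta
r2 = 1 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)

print(np.round(theta, 6))
print(round(float(r2), 6))
```

The fitted slopes are essentially zero and $R^2$ is essentially zero, even though the target is a deterministic function of the features. Adding squared features would be one way to address this, but that goes beyond the plain linear model discussed here.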
Exam-Oriented Summary
Definition
Multiple Linear Regression predicts a continuous output using two or more input features.
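Model Equation
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$
Normal Equation
$$\theta = (X_b^T X_b)^{-1} X_b^T y$$
Performance Metric
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$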
Steps
- arrange input matrix
- add bias column
- compute coefficients using Normal Equation
- predict outputs
- evaluate using $R^2$
Very Short Revision
- add bias column
- apply Normal Equation
- get intercept and coefficients
- form regression plane
- predict target values
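- evaluate using $R^2$
Main formulas:
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$
$$\theta = (X_b^T X_b)^{-1} X_b^T y$$
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$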
Final Takeaway
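Multiple Linear Regression extends simple linear regression to multiple input features. Instead of fitting a line, it fits a plane or hyperplane. It learns coefficients using the Normal Equation and predicts continuous values using:
$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$
For the sample dataset, it should learn values close to the true generating rule:
$$y = 1 + 2x_1 + 3x_2 + \varepsilon$$
so the fitted coefficients should be close to:
$$\theta_0 \approx 1, \quad \theta_1 \approx 2, \quad \theta_2 \approx 3$$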