
K-Nearest Neighbor

KNN is an instance-based or lazy learning algorithm. It does not learn an explicit equation during training. Instead, it:

  • stores the training data
  • waits until a new test point comes
  • finds the nearest stored examples
  • predicts using those neighbors

There are two common versions:

  • KNN Classification -> predict the most common class
  • KNN Regression -> predict the mean of neighbor target values

Main Idea

Each sample is a point in n-dimensional feature space. To predict for a new point, KNN measures the distance from that point to all training points.

The most common distance used is Euclidean distance:

    d(x, z) = sqrt( (x1 - z1)^2 + (x2 - z2)^2 + ... + (xn - zn)^2 )

Then:

  • in classification, choose the majority class among the k nearest neighbors
  • in regression, choose the average target among the k nearest neighbors
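As a minimal illustration of the distance computation (the two points here are made up for the example):

```python
import numpy as np

# Two made-up points in 2-D feature space
a = np.array([3.0, 2.0])
b = np.array([1.0, 1.0])

# Euclidean distance: square root of the sum of squared coordinate differences
dist = np.sqrt(np.sum((a - b) ** 2))
print(round(float(dist), 3))  # sqrt(5) ≈ 2.236
```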

Part A: KNN Classification

What It Does

Given a new test point:

  1. compute its distance to every training point
  2. sort distances
  3. pick the k nearest
  4. return the majority class

So the prediction rule is:

    ŷ = mode{ y_i : x_i is among the k nearest neighbors of x }

Short and Clean KNN Classification Code

import numpy as np
from collections import Counter

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k
        self.X_train = None
        self.y_train = None

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)

    def _distance(self, a, b):
        # Euclidean distance between two feature vectors
        return np.sqrt(np.sum((a - b) ** 2))

    def _predict_one(self, x):
        # distance from x to every stored training point
        distances = [self._distance(x, x_train) for x_train in self.X_train]
        # indices of the k smallest distances
        k_idx = np.argsort(distances)[:self.k]
        k_labels = self.y_train[k_idx]
        # majority vote among the k nearest labels
        return Counter(k_labels).most_common(1)[0][0]

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.array([self._predict_one(x) for x in X])

X_train = np.array([
    [1, 1],
    [2, 1],
    [4, 3],
    [5, 4],
    [6, 5]
], dtype=float)

y_train = np.array([0, 0, 1, 1, 1])

X_test = np.array([
    [3, 2],
    [5, 5]
], dtype=float)

clf = KNNClassifier(k=3)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

print("Predictions:", pred)

Small Classification Example

Training data:

    (1, 1) -> class 0
    (2, 1) -> class 0
    (4, 3) -> class 1
    (5, 4) -> class 1
    (6, 5) -> class 1

Test point:

    x = (3, 2)

Take:

    k = 3

Solving the Classification Example Step by Step

We compute the distance from (3, 2) to every training point.

Distance to (1, 1):

    d = sqrt((3 - 1)^2 + (2 - 1)^2) = sqrt(5) ≈ 2.236

Distance to (2, 1):

    d = sqrt((3 - 2)^2 + (2 - 1)^2) = sqrt(2) ≈ 1.414

Distance to (4, 3):

    d = sqrt((3 - 4)^2 + (2 - 3)^2) = sqrt(2) ≈ 1.414

Distance to (5, 4):

    d = sqrt((3 - 5)^2 + (2 - 4)^2) = sqrt(8) ≈ 2.828

Distance to (6, 5):

    d = sqrt((3 - 6)^2 + (2 - 5)^2) = sqrt(18) ≈ 4.243

Now sort the distances:

    Point     Label   Distance
    (2, 1)    0       1.414
    (4, 3)    1       1.414
    (1, 1)    0       2.236
    (5, 4)    1       2.828
    (6, 5)    1       4.243

Nearest 3 labels: {0, 1, 0}

Majority class: 0

So:

    ŷ = 0

That is exactly how KNN classification works.
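The walkthrough can be checked numerically with the same training data as the code above, using vectorized distances instead of a loop:

```python
import numpy as np
from collections import Counter

X_train = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [6, 5]], dtype=float)
y_train = np.array([0, 0, 1, 1, 1])
x_test = np.array([3.0, 2.0])

# distances from the test point to every training point, in one broadcast
distances = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
print(np.round(distances, 3))  # [2.236 1.414 1.414 2.828 4.243]

# indices of the 3 smallest distances, then a majority vote
k_idx = np.argsort(distances)[:3]
k_labels = y_train[k_idx]
pred = Counter(k_labels.tolist()).most_common(1)[0][0]
print(pred)  # 0
```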


KNN Classification Algorithm

Step 1: Store training data

Unlike linear models, KNN does not compute weights during training.

Code:

def fit(self, X, y):
    self.X_train = np.asarray(X, dtype=float)
    self.y_train = np.asarray(y)

Concept: Training in KNN simply means memorizing the dataset.


Step 2: Compute distance

Equation:

    d(x, z) = sqrt( (x1 - z1)^2 + ... + (xn - zn)^2 )

Code:

def _distance(self, a, b):
    return np.sqrt(np.sum((a - b) ** 2))

Concept: This measures how close a training sample is to the test sample.

Smaller distance means more similar.


Step 3: Rank neighbors

Code:

distances = [self._distance(x, x_train) for x_train in self.X_train]
k_idx = np.argsort(distances)[:self.k]

Concept:

  • compute all distances
  • sort them
  • keep the indices of the k nearest

Step 4: Vote by majority

Code:

k_labels = self.y_train[k_idx]
return Counter(k_labels).most_common(1)[0][0]

Concept: Among the nearest neighbors, whichever class occurs most is chosen.

Equation:

    ŷ = mode{ y_i : x_i is among the k nearest neighbors of x }

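As a tiny check of the voting step using only the standard library (the label list is made up; note that most_common breaks ties by first-encounter order):

```python
from collections import Counter

# Labels of the k = 3 nearest neighbors (made-up example)
k_labels = [0, 1, 0]

# most_common(1) returns [(label, count)]; take just the label
winner = Counter(k_labels).most_common(1)[0][0]
print(winner)  # 0, which wins the vote 2 to 1
```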

Concept -> Equation -> Code Mapping for Classification

1. Represent samples as points

Concept: Each row is a point in feature space.

Equation:

    x_i = (x_i1, x_i2, ..., x_in), a point in n-dimensional space

Code:

self.X_train = np.asarray(X, dtype=float)

2. Measure closeness

Concept: Similarity is measured using Euclidean distance.

Equation:

    d(x, z) = sqrt( (x1 - z1)^2 + ... + (xn - zn)^2 )

Code:

def _distance(self, a, b):
    return np.sqrt(np.sum((a - b) ** 2))

3. Select nearest neighbors

Concept: Prediction depends only on nearby training samples.

Code:

k_idx = np.argsort(distances)[:self.k]

4. Use majority vote

Concept: Classification chooses the most frequent class.

Equation:

    ŷ = mode{ y_i : x_i is among the k nearest neighbors of x }

Code:

return Counter(k_labels).most_common(1)[0][0]

Why KNN Classification Works

The assumption is:

points that are close in feature space tend to have the same class

So instead of learning a global rule, KNN makes a local decision around the test point.

That is why it is called instance-based learning.


Part B: KNN Regression

What It Does

KNN regression follows the same steps as classification:

  1. compute distance to all training points
  2. choose the nearest
  3. average their target values

Prediction rule:

    ŷ = (y_(1) + y_(2) + ... + y_(k)) / k

where y_(1), ..., y_(k) are the target values of the k nearest neighbors.


Short and Clean KNN Regression Code

import numpy as np

class KNNRegressor:
    def __init__(self, k=3):
        self.k = k
        self.X_train = None
        self.y_train = None

    def fit(self, X, y):
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y, dtype=float)

    def _distance(self, a, b):
        return np.sqrt(np.sum((a - b) ** 2))

    def _predict_one(self, x):
        distances = [self._distance(x, x_train) for x_train in self.X_train]
        k_idx = np.argsort(distances)[:self.k]
        k_values = self.y_train[k_idx]
        return np.mean(k_values)

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return np.array([self._predict_one(x) for x in X])

X_train = np.array([[1], [2], [3], [6], [7]], dtype=float)
y_train = np.array([30000, 35000, 40000, 70000, 75000], dtype=float)

X_test = np.array([[4]], dtype=float)

reg = KNNRegressor(k=3)
reg.fit(X_train, y_train)
pred = reg.predict(X_test)

print("Prediction:", pred)

Small Regression Example

Suppose training data is:

    YearsExperience   Salary
    1                 30000
    2                 35000
    3                 40000
    6                 70000
    7                 75000

Test point:

    x = 4

Take:

    k = 3


Solving the Regression Example Step by Step

We compute the distance from x = 4 to each training point.

Distance to 1:  |4 - 1| = 3

Distance to 2:  |4 - 2| = 2

Distance to 3:  |4 - 3| = 1

Distance to 6:  |4 - 6| = 2

Distance to 7:  |4 - 7| = 3

Sorted distances:

    Point   Target   Distance
    3       40000    1
    2       35000    2
    6       70000    2
    1       30000    3
    7       75000    3

Nearest 3 target values: 40000, 35000, 70000

Prediction is their mean:

    ŷ = (40000 + 35000 + 70000) / 3 = 145000 / 3 ≈ 48333.33

So the predicted salary is about 48333.33.

KNN Regression Algorithm

Step 1: Store training data

Code:

def fit(self, X, y):
    self.X_train = np.asarray(X, dtype=float)
    self.y_train = np.asarray(y, dtype=float)

Concept: Just store examples and target values.


Step 2: Compute all distances

Equation:

    d(x, z) = sqrt( (x1 - z1)^2 + ... + (xn - zn)^2 )

Code:

distances = [self._distance(x, x_train) for x_train in self.X_train]

Concept: Measure how close the new point is to all training samples.


Step 3: Pick nearest

Code:

k_idx = np.argsort(distances)[:self.k]

Concept: Keep the closest neighbors only.


Step 4: Average their target values

Equation:

    ŷ = (y_(1) + y_(2) + ... + y_(k)) / k

Code:

k_values = self.y_train[k_idx]
return np.mean(k_values)

Concept: Regression prediction is the average of nearby outputs.


Concept -> Equation -> Code Mapping for Regression

1. Use the same distance idea

Concept: Near points should have similar target values.

Equation:

    d(x, z) = sqrt( (x1 - z1)^2 + ... + (xn - zn)^2 )

Code:

def _distance(self, a, b):
    return np.sqrt(np.sum((a - b) ** 2))

2. Find local neighborhood

Concept: Prediction uses local samples rather than a global fitted line.

Code:

k_idx = np.argsort(distances)[:self.k]

3. Average local outputs

Concept: Regression uses the mean of neighbor target values.

Equation:

    ŷ = (y_(1) + y_(2) + ... + y_(k)) / k

Code:

return np.mean(k_values)

KNN Classification vs KNN Regression

    Aspect        KNN Classification        KNN Regression
    Output type   class label               continuous value
    Final rule    majority vote             mean of neighbors
    Formula       mode of nearest labels    average of nearest targets

Unified Intuition

KNN does not learn a formula like:

    y = w1*x1 + ... + wn*xn + b        (a linear model)

or:

    p = sigmoid(w1*x1 + ... + wn*xn + b)

Instead, it says:

for this new point, let me look around nearby training points and decide locally

So every test sample gets its own local prediction rule.

That is why KNN is called:

  • lazy learning
  • instance-based learning

Choosing the Value of k

The value of k controls smoothness.

Small k

  • sensitive to noise
  • more flexible
  • may overfit

Large k

  • smoother decision
  • may underfit
  • local details may be lost

A common rough idea: k ≈ sqrt(N), where N is the number of training samples.

But in practice, k is usually chosen using validation.
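A minimal sketch of choosing k on a held-out validation set. The data here is synthetic (two made-up Gaussian blobs), and the inline `knn_predict` helper simply repeats the classifier logic from above:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Synthetic two-class data: two Gaussian blobs (made-up example data)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(3, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

# Shuffle, then split into train and validation sets
idx = rng.permutation(len(X))
X, y = X[idx], y[idx]
X_tr, y_tr, X_val, y_val = X[:60], y[:60], X[60:], y[60:]

def knn_predict(X_tr, y_tr, X_new, k):
    preds = []
    for x in X_new:
        d = np.sqrt(np.sum((X_tr - x) ** 2, axis=1))
        k_labels = y_tr[np.argsort(d)[:k]]
        preds.append(Counter(k_labels.tolist()).most_common(1)[0][0])
    return np.array(preds)

# Try several k values and keep the one with the best validation accuracy
scores = {k: float(np.mean(knn_predict(X_tr, y_tr, X_val, k) == y_val))
          for k in [1, 3, 5, 7, 9]}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```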


Why Feature Scaling Matters

KNN depends entirely on distance. So features with larger numeric ranges dominate the distance.

Example:

  • Age may range from 20 to 80
  • Glucose may range from 0 to 200

Then Glucose can dominate Euclidean distance.

So scaling is often important, for example standardization:

    x' = (x - mean) / std

Without scaling, nearest neighbors may be misleading.
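A small demonstration of this effect (the patient numbers are made up): with raw features, the glucose gap of 50 swamps the age gap of 3, so the query is matched to the much older patient; after standardizing each column, the nearest neighbor flips.

```python
import numpy as np

# Two patients (made-up numbers): columns are [Age, Glucose]
X = np.array([[25.0, 150.0],
              [70.0, 100.0]])
q = np.array([22.0, 100.0])  # new patient

def nearest(X, q):
    # index of the training point closest to q in Euclidean distance
    return int(np.argmin(np.sqrt(np.sum((X - q) ** 2, axis=1))))

print(nearest(X, q))  # 1: raw glucose range dominates, picks the 70-year-old

# Standardize each column: x' = (x - mean) / std
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xs, qs = (X - mu) / sigma, (q - mu) / sigma

print(nearest(Xs, qs))  # 0: age now counts equally, and the answer flips
```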


Time Cost of KNN

KNN is cheap during training but expensive during prediction.

Training

  • just store the data

Prediction

For every test point:

  • compute distance to all training points
  • sort them
  • then predict

So prediction can be costly when the training set is large: with N training samples and n features, each query needs O(N·n) work for the distances plus the cost of sorting them.

This is one of the main disadvantages of KNN.


Advantages of KNN

  • very simple
  • no training optimization needed
  • easy to understand
  • works for both classification and regression
  • naturally handles multi-class classification

Limitations of KNN

  • prediction is slow on large datasets
  • sensitive to feature scale
  • sensitive to irrelevant features
  • choice of k matters a lot
  • can perform poorly in very high dimensions

This last issue is related to the curse of dimensionality.


Exam-Oriented Summary

Definition

KNN is an instance-based supervised learning algorithm that predicts a new sample using the nearest stored training samples.

Distance Formula

    d(x, z) = sqrt( (x1 - z1)^2 + ... + (xn - zn)^2 )

Classification Rule

    ŷ = mode{ y_i : x_i is among the k nearest neighbors of x }

Regression Rule

    ŷ = (y_(1) + y_(2) + ... + y_(k)) / k

Steps

  1. store training data
  2. compute distance from test point to all training points
  3. sort by distance
  4. select the k nearest
  5. classify by majority vote or regress by mean

Very Short Revision

KNN Classification

  • find the k nearest
  • take majority class
  • output class label

KNN Regression

  • find the k nearest
  • take mean of target values
  • output continuous value

Main formula:

    d(x, z) = sqrt( (x1 - z1)^2 + ... + (xn - zn)^2 )


Final Takeaway

KNN is one of the simplest machine learning algorithms. It does not build a model during training, but predicts by comparing a new point to stored training examples. For classification, it uses the majority class of the nearest neighbors. For regression, it uses the mean target value of the nearest neighbors. Its simplicity makes it excellent for understanding local learning, but its prediction cost and sensitivity to scaling are important limitations.