Recall that we had derived the following equation for non-parametric density estimation:

$$p(x) = \frac{k}{nV}$$

where:
- $k$ is the number of data points in region $R$
- $n$ is the total number of data points in the dataset $D$
- $V$ is the volume of the region $R$

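In the K-Nearest Neighbour (KNN) method, we fix $k$ and let the volume $V$ adapt to the data: the region $R$ around a point $x$ is grown until it contains exactly $k$ data points, and $V$ is the volume of that region.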
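
As a rough illustration of the estimate $k/(nV)$, the sketch below grows a ball around $x$ until it reaches the $k$-th nearest data point and uses the volume of that ball as $V$. The function name `knn_density`, the choice of Euclidean distance, and the use of a ball as the region $R$ are assumptions made for this example, not something fixed by the derivation.

```python
import math

def knn_density(x, data, k):
    """Estimate p(x) = k / (n * V) from the k nearest data points.

    x    : tuple of floats, the query point
    data : list of tuples with the same length as x (the dataset D)
    k    : number of neighbours the region R must contain
    """
    n = len(data)
    d = len(x)
    # Euclidean distance from x to every data point, smallest first.
    dists = sorted(math.dist(x, p) for p in data)
    # Radius of the smallest ball centred at x that holds k points.
    r = dists[k - 1]
    # Volume of a d-dimensional ball of radius r.
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    volume = unit_ball * r ** d
    # Note: if the k-th neighbour coincides with x, volume is 0 and this
    # raises ZeroDivisionError; a real implementation would guard against it.
    return k / (n * volume)

# Five points on a line; the 3 nearest to x = 2 lie within radius 1,
# so V = 2 and the estimate is 3 / (5 * 2).
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
print(knn_density((2.0,), points, k=3))
```

In one dimension the "ball" is just the interval $[x - r, x + r]$, which is why the volume in the usage example is $2r$.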
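
Given a set of $n$ data points, of which $n_c$ belong to class $c$, suppose the region $R$ of volume $V$ around a new point $x$ contains $k$ points, $k_c$ of them from class $c$. The class-conditional density is then $p(x \mid c) = \frac{k_c}{n_c V}$, the unconditional density is $p(x) = \frac{k}{nV}$, and the class prior is $p(c) = \frac{n_c}{n}$.

The posterior probability of class $c$ follows from Bayes' theorem:

$$p(c \mid x) = \frac{p(x \mid c)\, p(c)}{p(x)} = \frac{k_c}{k}$$

Now we can use the Bayes classification rule to assign a label to the new data point $x$: choose the class with the largest posterior $k_c / k$, which is the class most common among the $k$ nearest neighbours. The procedure is:
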
- Choose a value for $k$
- For each test data point $x_i$, find the $k$ nearest neighbours in the training data $D$
- Assign the class label to $x_i$ based on the majority class label of the $k$ nearest neighbours
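
The steps above can be sketched in a few lines of Python. This is a minimal brute-force version that assumes Euclidean distance and small datasets; the name `knn_predict` and the toy data are hypothetical, and ties in the vote are left to whatever label `Counter` reports first.

```python
import math
from collections import Counter

def knn_predict(x, train_points, train_labels, k):
    """Return the majority label among the k training points nearest to x."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(x, train_points[i]),
    )
    # Labels of the k nearest neighbours.
    nearest_labels = [train_labels[i] for i in order[:k]]
    # Majority vote over those labels.
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy training set: two well-separated clusters.
train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

for x_i in [(0.5, 0.5), (5.5, 5.5)]:
    print(x_i, knn_predict(x_i, train_points, train_labels, k=3))
```

Sorting all distances costs $O(n \log n)$ per query, which is fine for illustration; larger datasets usually call for a spatial index or a library implementation.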