# K-Nearest Neighbor (KNN) | Python

The K-Nearest Neighbor (KNN) algorithm is a versatile supervised learning technique used for both **classification and regression** tasks, although it is more commonly applied to classification problems. KNN is popular in real-life scenarios because of its non-parametric nature: it does not assume any specific distribution of the data. This flexibility makes KNN well suited to applications where the underlying data distribution is unknown or hard to characterize.

## How does K-Nearest Neighbor work?

The **K-Nearest Neighbor (KNN)** algorithm operates on the principle of feature similarity. When classifying a new data point, KNN considers how closely its features resemble those of the training instances. In classification tasks, the output class is determined by the majority class among the K most similar instances from the training set: each neighboring instance effectively casts a "vote" for its class, and the class with the most votes becomes the prediction for the new data point. This voting mechanism makes KNN a simple yet effective classification algorithm.
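The voting scheme described above can be sketched in a few lines of plain Python. The helper `knn_predict` and the toy points below are illustrative, not part of the original tutorial:

```python
import math
from collections import Counter

def knn_predict(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the new point
    nearest = sorted(train, key=lambda item: math.dist(item[0], new_point))[:k]
    # Each of the k nearest neighbors casts one vote for its class
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: class 'A' clusters near the origin, class 'B' near (5, 5)
train = [((0, 0), 'A'), ((1, 0), 'A'), ((0, 1), 'A'),
         ((5, 5), 'B'), ((6, 5), 'B'), ((5, 6), 'B')]
print(knn_predict(train, (0.5, 0.5), k=3))  # all 3 nearest neighbors vote 'A'
```

The same function handles any number of classes, since the vote count is taken over whatever labels appear among the neighbors.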

### Example

In the example dataset, the **K-Nearest Neighbor (KNN) algorithm** is used to predict the class of the red data point. It starts by calculating the distance between the red data point and every other data point in the dataset. Here K=3 is specified, so the algorithm selects the 3 data points with the smallest distance to the red point; in this example, all 3 are yellow.

Since all three nearest neighbors are yellow points belonging to 'Category-A', the KNN algorithm predicts that the red data point also belongs to 'Category-A'. The prediction is simply the majority class among the K nearest neighbors in the feature space, which makes KNN a simple yet effective classification method.

## Python implementation of the KNN algorithm

### Importing Libraries
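The original code listing is not shown here; a typical set of imports for this tutorial, assuming NumPy, pandas, and scikit-learn are used throughout, would be:

```python
# Libraries assumed for the rest of this tutorial
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report
```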

### Importing the Dataset (Iris data)

The Iris dataset records the sepal-length, sepal-width, petal-length, and petal-width of three species of iris flower: Iris setosa, Iris virginica, and Iris versicolor. The task is to predict the **"Class"** to which each plant belongs. To import the dataset and load it into a **pandas dataframe**, execute the following code:

### Assign column names to the dataset

Read the dataset into a **pandas dataframe**.
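A sketch of these two steps follows. The tutorial presumably reads the UCI CSV with something like `pd.read_csv(url, names=names)`; to keep this snippet runnable offline, the same dataframe is rebuilt here from scikit-learn's bundled copy of the iris data, with class labels in the UCI style:

```python
import pandas as pd
from sklearn.datasets import load_iris

# Column names assumed to follow the UCI iris file layout
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Stand-in for pd.read_csv(url, names=names): rebuild the dataframe
# from scikit-learn's bundled iris data so the snippet runs offline
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=names[:-1])
dataset['Class'] = ['Iris-' + iris.target_names[t] for t in iris.target]

print(dataset.head())
print(dataset.shape)  # (150, 5)
```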

### Sample dataset

### Data Pre-Processing

The X variable contains the first four columns of the **dataset**, while y contains the labels.
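A sketch of this step, reusing the dataframe layout from the loading step (with scikit-learn's bundled iris data standing in for the original CSV):

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
dataset = pd.DataFrame(
    iris.data,
    columns=['sepal-length', 'sepal-width', 'petal-length', 'petal-width'])
dataset['Class'] = iris.target_names[iris.target]

# The first four columns are the features; the fifth column holds the labels
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
print(X.shape, y.shape)  # (150, 4) (150,)
```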

### Train Test Split
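The original listing is not shown; a sketch of this step using scikit-learn's `train_test_split` (the `random_state` value is an assumption, chosen only to make the shuffle reproducible) might look like:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size=0.40 holds out 40% of the rows for testing;
# random_state=0 is an assumed value that fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=0)
print(len(X_train), len(X_test))  # 90 60
```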

The above code splits the dataset into 60% **train data** and 40% **test data**.

### Scale the Features

Before making any actual **predictions**, it is good practice to scale the features so that all of them are evaluated on a uniform scale; KNN's distance calculations are otherwise dominated by features with large numeric ranges.
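A sketch of this step with `StandardScaler` (assuming the 60/40 split and `random_state=0` from above):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=0)

# Fit the scaler on the training data only, then apply the same
# transformation to both splits to avoid leaking test-set statistics
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```

After scaling, each training feature has zero mean and unit variance.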

### Fitting K-NN classifier to the Training data

The next step is to fit the **K-NN classifier** to the training data.

**n_neighbors:** the number of neighbors the algorithm considers. Here it is set to 8.
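A sketch of the fitting step with `KNeighborsClassifier`, reusing the split and scaling assumed above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# n_neighbors=8 matches the value described in the text
classifier = KNeighborsClassifier(n_neighbors=8)
classifier.fit(X_train, y_train)
```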

### Predicting the Test Result

The final step is to make **predictions** on the test data and store them in the y_pred vector.
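A sketch of the prediction step, with the preceding pipeline repeated so the snippet runs on its own (split and `random_state` values are the assumptions used throughout):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
classifier = KNeighborsClassifier(n_neighbors=8).fit(X_train, y_train)

# Predict a class for every row of the test set
y_pred = classifier.predict(X_test)
print(y_pred.shape)  # (60,)
```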

## Confusion Matrix and Classification Report

Create the **Confusion Matrix and Classification Report** for your K-NN model to see how accurate the classifier is.
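A sketch of the evaluation step with scikit-learn's metrics, again with the full pipeline repeated so it runs standalone (exact numbers depend on the assumed `random_state`):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
classifier = KNeighborsClassifier(n_neighbors=8).fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Rows of the confusion matrix are true classes, columns are predictions;
# the diagonal counts the correctly classified records
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```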

The above results show that the **KNN algorithm** classified the 60 records in the test set with roughly 96% accuracy. Although the algorithm performed very well on this **dataset**, don't expect the same results in every application.

### Full Source | Python
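A complete script assembled from the steps above. Scikit-learn's bundled iris copy stands in for the original CSV download, and `random_state=0` is an assumed value, so the exact printed numbers may differ from the tutorial's:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import (confusion_matrix, classification_report,
                             accuracy_score)

# Load the iris data into a dataframe (offline stand-in for the UCI CSV)
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
iris = load_iris()
dataset = pd.DataFrame(iris.data, columns=names[:-1])
dataset['Class'] = iris.target_names[iris.target]

# Features are the first four columns; labels are the fifth
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# 60% train / 40% test split (random_state=0 is an assumed value)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=0)

# Scale features using training-set statistics only
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Fit the classifier with 8 neighbors and predict the test set
classifier = KNeighborsClassifier(n_neighbors=8)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print('Accuracy:', accuracy_score(y_test, y_pred))
```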

### Conclusion

K-Nearest Neighbor (KNN) is a versatile supervised learning algorithm used for both classification and regression tasks. It operates based on the principle of feature similarity, making predictions by finding the K-most similar data points from the training set and assigning the class based on majority voting (for classification) or averaging (for regression). KNN is a **non-parametric method** that does not assume underlying data distributions, making it widely used and applicable to various real-life scenarios.
