K-Nearest Neighbor (KNN) | Python
The K-Nearest Neighbor (KNN) algorithm belongs to the supervised learning category and can be used for both classification and regression, although it is more widely used for classification problems. KNN is popular in real-life scenarios partly because it is non-parametric: it makes no assumptions about the underlying distribution of the data.
How does K-Nearest Neighbor work?
The K-Nearest Neighbor algorithm works on the basis of feature similarity: a new (out-of-sample) data point is classified according to how closely its features resemble those of the points in the training set. For classification, the algorithm finds the K training instances closest to the new point, measured with a distance metric such as Euclidean distance, and predicts the class that occurs most often among them. In essence, each of the K neighbors votes for its own class, and the class with the most votes is taken as the prediction.
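To make the voting procedure concrete, here is a minimal from-scratch sketch in NumPy. The function name knn_predict and the toy points are hypothetical, chosen only for illustration. It uses Euclidean distance, a brute-force search over all training points, and a simple majority vote; scikit-learn can also use tree-based neighbor searches, so this is not its internal implementation.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Hypothetical helper: classify one point by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query point to every training point
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k closest training points, nearest first
    nearest = np.argsort(distances)[:k]
    # Each neighbor votes for its label; Counter keeps first-seen order on ties,
    # so a tie goes to the class whose member is closest to the query
    votes = Counter(y_train[nearest])
    return str(votes.most_common(1)[0][0])


# Hypothetical toy data: two well-separated groups labeled 'A' and 'B'
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [7.0, 7.5], [6.5, 7.0]])
y_train = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

print(knn_predict(X_train, y_train, np.array([1.8, 1.5]), k=3))  # A
print(knn_predict(X_train, y_train, np.array([6.2, 6.8]), k=3))  # B
```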

Python implementation of the KNN algorithm
Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
Importing the Dataset (Iris data)
The Iris dataset contains four measurements (sepal length, sepal width, petal length, and petal width) for 150 iris flowers, 50 from each of three species: Iris setosa, Iris versicolor, and Iris virginica. The task is to predict the species, stored in the "Class" column, from the four measurements. To download the dataset and load it into a pandas DataFrame, execute the following code:
IrisPath = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
Assign column names to the dataset:
headers = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
Read the dataset into a pandas DataFrame:
ds = pd.read_csv(IrisPath, names = headers)
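If the UCI URL is unreachable, scikit-learn ships a copy of the Iris data that can be loaded offline with sklearn.datasets.load_iris. The sketch below assumes you want the same column names and "Iris-..." class labels as the CSV version, so it rebuilds them by hand; this is an alternative data source, not part of the original tutorial.

```python
import pandas as pd
from sklearn.datasets import load_iris

headers = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Load the bundled copy of the Iris data (no network access needed)
iris = load_iris()
ds = pd.DataFrame(iris.data, columns=headers[:-1])

# Map the integer targets 0, 1, 2 to labels in the UCI style, e.g. 'Iris-setosa'
ds['Class'] = ['Iris-' + name for name in iris.target_names[iris.target]]

print(ds.shape)                    # (150, 5)
print(ds['Class'].value_counts())  # 50 rows per species
```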
Data Pre-Processing
X = ds.iloc[:, :-1].values
y = ds.iloc[:, 4].values
The X variable contains the first four columns of the dataset (the features), while y contains the labels from the Class column.
Train Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.40)
The above code randomly splits the dataset into 60% training data and 40% test data, that is, 90 and 60 of the 150 rows.
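A possible refinement, sketched here as an assumption rather than the tutorial's own method: passing stratify=y keeps the class proportions identical in both sets, and random_state makes the split reproducible. The seed 42 is arbitrary, and the data is loaded with load_iris so the example runs offline.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Integer labels 0, 1, 2 for the three species
X, y = load_iris(return_X_y=True)

# Stratified 60/40 split: every class keeps its 1/3 share in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)  # (90, 4) (60, 4)
print(np.bincount(y_test))          # [20 20 20]
```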
Scale the Features
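Before making any predictions, it is good practice to scale the features. KNN compares points by distance, so a feature with a larger numeric range would otherwise dominate the distance calculation. The scaler is fitted on the training data only and then applied to both sets, so no information from the test set leaks into training.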
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
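The effect of scaling on KNN can be seen on a tiny hypothetical dataset (not Iris): feature 0 determines the class, but feature 1 has a range hundreds of times larger. With K = 1, the unscaled model picks the neighbor that is closest in feature 1, while the standardized model picks the neighbor that is closest overall.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: small-range feature 0 decides the class,
# large-range feature 1 is mostly noise
X_train = np.array([[1.0, 1000.0], [9.0, 1100.0], [1.0, 3000.0], [9.0, 2900.0]])
y_train = np.array(['a', 'b', 'a', 'b'])
query = np.array([[2.0, 1090.0]])

# Unscaled: feature 1 dominates the Euclidean distance
raw = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
raw_pred = raw.predict(query)[0]

# Standardized: both features contribute on a comparable scale
scaler = StandardScaler().fit(X_train)
scaled = KNeighborsClassifier(n_neighbors=1).fit(scaler.transform(X_train), y_train)
scaled_pred = scaled.predict(scaler.transform(query))[0]

print("unscaled:", raw_pred)    # b
print("scaled:", scaled_pred)   # a
```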
Fitting K-NN classifier to the Training data
The next step is to fit the K-NN classifier to the training data.
classifier = KNeighborsClassifier(n_neighbors = 8)
classifier.fit(X_train, y_train)
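n_neighbors: the number of neighbors (K) that vote on each prediction. Here it is set to 8. The remaining parameters keep their defaults, including uniform voting weights and the Minkowski metric with p=2, which is Euclidean distance.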
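The value 8 is a choice, not a rule. One common way to pick K, shown here as a sketch rather than the tutorial's method, is to compare cross-validated accuracy for several values. The range 1 to 15 and cv=5 are arbitrary assumptions; wrapping the scaler in a pipeline refits it on each training fold, so the validation fold never influences the scaling.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 16):
    # Scaler + KNN are refit together on every training fold
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

# On ties, max() keeps the first (smallest) k
best_k = max(scores, key=scores.get)
print(f"best k = {best_k}, mean CV accuracy = {scores[best_k]:.3f}")
```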
Predicting the Test Result
Next, predict the classes of the test data and store them in the y_pred vector.
y_pred = classifier.predict(X_test)
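To see the neighbor vote behind a single prediction, a fitted KNeighborsClassifier exposes kneighbors (the nearest training rows) and predict_proba (the vote shares). The sketch below repeats the tutorial's pipeline on the bundled Iris copy with an assumed random_state=0 and checks that, with uniform weights, predict_proba equals the neighbor votes divided by K = 8.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40, random_state=0)

# Same preprocessing and model as the tutorial
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
classifier = KNeighborsClassifier(n_neighbors=8).fit(X_train, y_train)

query = X_test[:1]                                  # first test sample
distances, indices = classifier.kneighbors(query)   # its 8 nearest training rows
neighbor_labels = y_train[indices[0]]
votes = np.bincount(neighbor_labels, minlength=3)   # votes per class 0, 1, 2

print("neighbor labels:", neighbor_labels)
print("votes:", votes)
print("predict_proba:", classifier.predict_proba(query)[0])  # votes / 8
print("predict:", classifier.predict(query)[0])
```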
Confusion Matrix and Classification Report
To evaluate the K-NN model, create a confusion matrix, a classification report (precision, recall, F1-score, and support for each class), and the overall accuracy score.
cfMatrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cfMatrix)
cReport = classification_report(y_test, y_pred)
print("Classification Report:")
print(cReport)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Output:
Confusion Matrix:
[[22  0  0]
 [ 0 20  1]
 [ 0  1 16]]
Classification Report:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        22
Iris-versicolor       0.95      0.95      0.95        21
 Iris-virginica       0.94      0.94      0.94        17

       accuracy                           0.97        60
      macro avg       0.96      0.96      0.96        60
   weighted avg       0.97      0.97      0.97        60

Accuracy: 0.9666666666666667
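The results above show that the KNN classifier correctly classified 58 of the 60 records in the test set, an accuracy of about 96.7%. The two errors are one Iris-versicolor predicted as Iris-virginica and one Iris-virginica predicted as Iris-versicolor. Because the train/test split is random, your exact numbers will vary between runs. Although the algorithm performed very well on this dataset, don't expect the same results in every application.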
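The summary numbers in the report can be recomputed directly from the confusion matrix printed above. In scikit-learn's confusion_matrix, rows are true classes and columns are predicted classes, with labels in sorted order (Iris-setosa, Iris-versicolor, Iris-virginica). The sketch below hard-codes that matrix from this run.

```python
import numpy as np

# Confusion matrix from the run above: rows = true class, columns = predicted class
cm = np.array([[22, 0, 0],
               [0, 20, 1],
               [0, 1, 16]])

correct = np.trace(cm)                     # diagonal = correctly classified samples
accuracy = correct / cm.sum()
precision = np.diag(cm) / cm.sum(axis=0)   # correct / all predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)      # correct / all truly in that class

print(correct, cm.sum())       # 58 60
print(round(accuracy, 4))      # 0.9667
print(np.round(precision, 2))  # about 1.00, 0.95, 0.94 (matches the report)
print(np.round(recall, 2))     # about 1.00, 0.95, 0.94 (matches the report)
```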
Full Source | Python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Load the Iris dataset from the UCI repository
IrisPath = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
headers = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
ds = pd.read_csv(IrisPath, names=headers)

# Features: the four measurements; labels: the Class column
X = ds.iloc[:, :-1].values
y = ds.iloc[:, 4].values

# Random 60/40 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

# Standardize features using statistics from the training set only
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Fit a KNN classifier with K = 8 neighbors
classifier = KNeighborsClassifier(n_neighbors=8)
classifier.fit(X_train, y_train)

# Predict the test labels and evaluate
y_pred = classifier.predict(X_test)
cfMatrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(cfMatrix)
cReport = classification_report(y_test, y_pred)
print("Classification Report:")
print(cReport)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)