Support Vector Machine Classifier | Python

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems. The objective of the SVM algorithm is to find a hyperplane in an N-dimensional space (N being the number of features) that distinctly separates the data points of different classes.

Support Vectors

The data points closest to the hyperplane are called support vectors; the separating line is defined with the help of these points. The hyperplane is the decision plane that divides a set of objects belonging to different classes, and the margin is the gap between the two parallel lines through the closest data points of the different classes. Maximizing this margin from the separating line to the nearest points is the essence of the Support Vector Machine algorithm: only the points close to the decision boundary matter, the rest do not. Note that SVM algorithms aim to classify correctly before maximizing the margin. You can also project your data into a higher-dimensional space and split the classes there with a hyperplane.
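To make this concrete, a fitted scikit-learn SVC exposes the support vectors and the learned hyperplane directly. The following is a minimal sketch on a small, hypothetical two-class toy dataset (the names X_toy and y_toy and their values are illustrative):

from sklearn import svm
import numpy as np

# A tiny, linearly separable two-class toy dataset (illustrative values)
X_toy = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel='linear', C=1.0).fit(X_toy, y_toy)

print(clf.support_vectors_)        # the data points closest to the hyperplane
print(clf.support_)                # their indices in X_toy
print(clf.n_support_)              # number of support vectors per class
print(clf.coef_, clf.intercept_)   # w and b of the hyperplane w.x + b = 0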

Support Vector Machines (Kernels)

In practice, the Support Vector Machine algorithm is implemented using a kernel; all valid kernels must satisfy Mercer's condition. SVC, NuSVC, and LinearSVC are the scikit-learn classes capable of performing binary and multi-class classification on a dataset. More complex kernels are desirable because they allow the boundary separating the classes to be curved or even more intricate, which in turn can lead to more accurate classifiers. The following examples show how to plot the decision surface for four SVM classifiers with different kernels.
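For reference, the three classes can be instantiated as follows (a minimal sketch; the parameter values shown are illustrative):

from sklearn import svm

svc = svm.SVC(kernel='rbf', C=1.0)   # libsvm-based, supports all built-in kernels
nu_svc = svm.NuSVC(nu=0.5)           # like SVC, but parameterized by nu instead of C
lin_svc = svm.LinearSVC(C=1.0)       # liblinear-based, linear kernel only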

SVC with linear kernel

The implementation is based on libsvm. The fit time complexity is more than quadratic in the number of samples, which makes it hard to scale to datasets with more than a few tens of thousands of samples. Multiclass support is handled according to a one-vs-one scheme.
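For example, with the three Iris classes the one-vs-one scheme trains 3 * 2 / 2 = 3 binary classifiers, one per pair of classes. A minimal sketch, assuming scikit-learn's bundled copy of the Iris data:

from sklearn import svm, datasets

iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', decision_function_shape='ovo').fit(iris.data, iris.target)

# One column per class pair: n_classes * (n_classes - 1) / 2 = 3
print(clf.decision_function(iris.data).shape)   # (150, 3)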

Importing Libraries

import pandas as pd
import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Importing the Dataset (Iris data)

The Iris dataset records the sepal-length, sepal-width, petal-length, and petal-width of three different types of iris flowers: Iris setosa, Iris virginica, and Iris versicolor. The following is an example of creating an SVM classifier using kernels.

Importing dataset

The Python Pandas module can read CSV files (read_csv()) and return a DataFrame object. The file is meant for testing purposes only; you can download it here: iris-data.csv .
headers = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
df = pd.read_csv("iris-data.csv", names=headers)
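If the CSV file is not at hand, the same data ships with scikit-learn itself. Since sklearn.datasets is already imported above, an equivalent DataFrame can be built like this (a minimal sketch of the alternative):

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=headers[:-1])
df['Class'] = iris.target_names[iris.target]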

Extract the dependent variable (y) and the independent variables (X) from the dataset. Here, only the first two features of the dataset are used as independent variables:

  1. Sepal length
  2. Sepal width
X = df.iloc[:, [0, 1]].values
y = pd.factorize(df['Class'])[0]  # encode the string class labels as integers 0, 1, 2
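As a quick sanity check, the shapes and the encoded class labels can be inspected:

print(X.shape)        # (150, 2) for the full Iris data
print(np.unique(y))   # [0 1 2], one integer per species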
The next step is to plot the SVM boundaries against the original data. First set the mesh step size and the value of the regularization parameter C:

h = .02  # step size in the mesh
C = 1.0  # SVM regularization parameter

Then create the mesh over the feature space:

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

SVM classifier object - linear

Create a Support Vector Classifier object by passing kernel='linear' to the SVC() constructor.
svc = svm.SVC(kernel='linear', C=C).fit(X, y)

# Predict over the mesh and put the result into a color plot
Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.title("SVC with linear kernel")
plt.show()


Full Source: SVC with linear kernel | Python
import pandas as pd
import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

headers = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
df = pd.read_csv("iris-data.csv", names=headers)

X = df.iloc[:, [0, 1]].values
y = pd.factorize(df['Class'])[0]  # encode the string class labels as integers

h = .02  # step size in the mesh
C = 1.0  # SVM regularization parameter

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

svc = svm.SVC(kernel='linear', C=C).fit(X, y)
Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.title("SVC with linear kernel")
plt.show()
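Note that train_test_split and accuracy_score are imported above but not used by the plotting code. A minimal sketch of how they could be used to estimate out-of-sample accuracy for the same classifier (the split ratio and random_state are illustrative):

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
svc = svm.SVC(kernel='linear', C=C).fit(X_train, y_train)
y_pred = svc.predict(X_test)
print(accuracy_score(y_test, y_pred))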

LinearSVC (linear kernel)

Similar to SVC with parameter kernel='linear', but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.
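As a sketch of that extra flexibility (the parameter combination shown is illustrative), LinearSVC accepts an L1 penalty, which SVC does not:

lin_svc = svm.LinearSVC(penalty='l1', loss='squared_hinge', dual=False, C=1.0)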
Full Source: LinearSVC (linear kernel) | Python
import pandas as pd
import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

headers = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
df = pd.read_csv("iris-data.csv", names=headers)

X = df.iloc[:, [0, 1]].values
y = pd.factorize(df['Class'])[0]  # encode the string class labels as integers

h = .02  # step size in the mesh
C = 1.0  # SVM regularization parameter

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

lin_svc = svm.LinearSVC(C=C).fit(X, y)
Z = lin_svc.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.title("LinearSVC (linear kernel)")
plt.show()

SVC with RBF kernel

When the dataset is not linearly separable, in other words when the data are non-linear, it is recommended to use a kernel function such as the RBF (radial basis function) kernel.
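With the RBF kernel, the gamma parameter controls how far the influence of a single training example reaches: small values give smoother boundaries, large values more tightly curved ones. A minimal sketch, assuming X and y have been prepared as above (the gamma values are illustrative):

for gamma in (0.1, 0.7, 10.0):
    rbf_svc = svm.SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X, y)
    print(gamma, rbf_svc.score(X, y))   # training accuracy typically rises with gamma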
Full Source: SVC with RBF kernel | Python
import pandas as pd
import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

headers = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
df = pd.read_csv("iris-data.csv", names=headers)

X = df.iloc[:, [0, 1]].values
y = pd.factorize(df['Class'])[0]  # encode the string class labels as integers

h = .02  # step size in the mesh
C = 1.0  # SVM regularization parameter

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

rbf_svc = svm.SVC(kernel='rbf', gamma=0.7, C=C).fit(X, y)
Z = rbf_svc.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.title("SVC with RBF kernel")
plt.show()

SVC with polynomial (degree 3) kernel

The polynomial kernel maps the data into a space of polynomial combinations of the original features; the degree parameter controls the order of the polynomial (degree 3 here).
Full Source: SVC with polynomial (degree 3) kernel | Python
import pandas as pd
import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

headers = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
df = pd.read_csv("iris-data.csv", names=headers)

X = df.iloc[:, [0, 1]].values
y = pd.factorize(df['Class'])[0]  # encode the string class labels as integers

h = .02  # step size in the mesh
C = 1.0  # SVM regularization parameter

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

poly_svc = svm.SVC(kernel='poly', degree=3, C=C).fit(X, y)
Z = poly_svc.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(())
plt.title("SVC with polynomial (degree 3) kernel")
plt.show()
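Besides degree, the polynomial kernel also takes a coef0 term that trades off the influence of higher-degree versus lower-degree terms. A minimal sketch, assuming X and y have been prepared as above (the coef0 value is illustrative):

poly_svc = svm.SVC(kernel='poly', degree=3, coef0=1.0, C=1.0).fit(X, y)
print(poly_svc.score(X, y))   # training accuracy on the two-feature Iris data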