Support Vector Machines (SVMs) are supervised learning algorithms for classification and regression tasks. A standard (linear) SVM, however, can only draw a linear decision boundary, so it works well only when the classes can be separated by a straight line (in 2D) or a hyperplane (in higher dimensions). Non-Linear SVM extends SVM to complex, non-linearly separable data by using kernels, which let the model work in a higher-dimensional space where the data can become linearly separable.
Why Non-Linear SVM is Required
In many situations data cannot be separated with a straight line. For example, one group of points might surround another group in a circle. A linear SVM won't work well here because it can only draw straight boundaries. A non-linear SVM is needed because it can draw curved boundaries that separate such data properly, which helps the model make better predictions when the data has complex shapes or patterns. To do this, the SVM uses a technique called the kernel trick.
[Figure: Linear vs Non-Linear SVM]
Non-Linear SVM uses kernels to work in higher dimensions where data can become linearly separable.
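The difference can be checked with a small experiment. The sketch below is illustrative only: it assumes scikit-learn is installed, uses the synthetic `make_circles` dataset, and leaves every `SVC` setting other than the kernel at its default. It fits the same classifier once with a linear kernel and once with an RBF kernel and compares test accuracy.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: one class forms a ring around the other.
X, y = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the same classifier with a linear and a non-linear (RBF) kernel.
scores = {}
for kernel in ("linear", "rbf"):
    model = SVC(kernel=kernel).fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)
    print(f"{kernel:>6} kernel test accuracy: {scores[kernel]:.2f}")
```

On data like this, the linear kernel should score close to chance, while the RBF kernel should separate the two rings almost entirely.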
What is Kernel?
A kernel is a function that helps a model find patterns in complex data by implicitly transforming the data into a higher-dimensional space, where it becomes easier to separate different classes or detect relationships. Instead of explicitly computing the transformation, the kernel computes the dot product of data points in the higher-dimensional space directly.
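A small numeric check makes this concrete. The sketch below uses the homogeneous degree-2 polynomial kernel on 2D points. The helper names `phi` and `poly2_kernel` are made up for illustration. The explicit feature map sends a point (x1, x2) to (x1², √2·x1·x2, x2²), and the kernel returns the same dot product without ever building that 3D vector.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2D point (illustrative helper)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: (a . b)^2."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # dot product after mapping both points to 3D
implicit = poly2_kernel(x, z)      # same quantity, computed in the original 2D space

print(explicit, implicit)
```

The two values agree up to floating-point rounding, which is why an SVM can work with the kernel and never compute the mapped features.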
For example, suppose we have data points shaped like two concentric circles: one circle represents one class and the other circle represents another class. If we try to separate these classes with a straight line it can't be done because the data is not linearly separable in its current form.
When we use a kernel function, it implicitly maps the original 2D data, such as the concentric circles, into a higher-dimensional space where the data becomes linearly separable. In that higher-dimensional space the SVM finds a simple linear decision boundary (a hyperplane) that separates the classes.
When we bring this linear decision boundary back to the original 2D space, it no longer looks like a straight line. Instead, it appears as a circular boundary that separates the two classes. This happens because the kernel trick allows the SVM to "see" the data in a new way, enabling it to draw a boundary that fits the original shape of the data.
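The same idea can be shown with a hand-made transformation. This is a sketch under stated assumptions, not what an RBF kernel does internally: we add the squared distance from the origin, r² = x1² + x2², as a third feature and fit a purely linear SVM in the resulting 3D space. The names `r2` and `X_lifted` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=42)

# Hand-crafted lift: append the squared distance from the origin as a third feature.
r2 = (X ** 2).sum(axis=1)
X_lifted = np.column_stack([X, r2])

# A linear SVM in the lifted space learns a plane: w1*x1 + w2*x2 + w3*r2 + b = 0.
linear_3d = SVC(kernel="linear").fit(X_lifted, y)
lifted_accuracy = linear_3d.score(X_lifted, y)

w, b = linear_3d.coef_[0], linear_3d.intercept_[0]
print(f"Training accuracy in lifted space: {lifted_accuracy:.2f}")
print("Plane weights (x1, x2, r2):", w, "intercept:", b)
```

Substituting r² = x1² + x2² back into the plane equation gives w3·(x1² + x2²) + w1·x1 + w2·x2 + b = 0, which is the equation of a circle in the original 2D space whenever w3 is non-zero.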
Popular kernel functions in SVM
- Radial Basis Function (RBF): Measures similarity by the distance between points, which makes it well suited to circular or spherical class structure.
- Linear Kernel: Works for linearly separable data and applies no non-linear transformation.
- Polynomial Kernel: Models more complex relationships using polynomial combinations of the input features.
- Sigmoid Kernel: Applies a tanh (sigmoid-shaped) function to the dot product, resembling a neural network activation, and is suitable for specific non-linear problems.
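For reference, the four kernels can be written as plain NumPy functions. This sketch follows the parametrization scikit-learn's `SVC` uses (`gamma`, `coef0`, `degree`). The default values below are arbitrary choices for illustration, not library defaults.

```python
import numpy as np

def linear_kernel(x, z):
    # K(x, z) = x . z
    return np.dot(x, z)

def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    # K(x, z) = (gamma * x . z + coef0) ** degree
    return (gamma * np.dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid_kernel(x, z, gamma=0.5, coef0=0.0):
    # K(x, z) = tanh(gamma * x . z + coef0)
    return np.tanh(gamma * np.dot(x, z) + coef0)

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])

for name, k in [("linear", linear_kernel), ("polynomial", polynomial_kernel),
                ("rbf", rbf_kernel), ("sigmoid", sigmoid_kernel)]:
    print(f"{name:>10}: {k(x, z):.4f}")
```

Note that the RBF kernel depends only on the distance between the two points, so it equals 1 when they coincide and decays toward 0 as they move apart.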
Below are some examples of Non-Linear SVM Classification.
Example 1: Non-Linear SVM with a Circular Decision Boundary
Below is a Python implementation of a non-linear SVM that learns a circular decision boundary.
1. Importing Libraries
We begin by importing the necessary libraries for data generation, model training, evaluation, and visualization.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```
2. Creating and Splitting the Dataset
We generate a synthetic dataset of concentric circles and split it into training and testing sets.
```python
X, y = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```
3. Creating and Training the Non-Linear SVM Model
We create an SVM classifier using the RBF kernel to handle non-linear patterns and train it on the data.
```python
# RBF kernel allows learning circular boundaries
svm = SVC(kernel='rbf', C=1, gamma=0.5)
svm.fit(X_train, y_train)
```
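The values `C=1` and `gamma=0.5` are fixed by hand above. One common way to choose them is a cross-validated grid search. The sketch below is a hedged illustration: the candidate grid is an arbitrary assumption, and the block regenerates the same data so that it runs on its own.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Candidate values are illustrative; widen or refine the grid for real data.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.5, 5]}

# 5-fold cross-validation on the training set only; the test set stays untouched.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.2f}")
```

Roughly speaking, a larger `gamma` makes each training point's influence more local, giving a more flexible boundary. A larger `C` penalizes misclassified training points more heavily.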
4. Making Predictions and Evaluating the Model
We predict the labels for the test set and compute the accuracy of the model.
```python
y_pred = svm.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```
5. Visualizing the Decision Boundary
We define a function to visualize the decision boundary of the trained non-linear SVM on the dataset.
```python
def plot_decision_boundary(X, y, model):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.Paired)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap=plt.cm.Paired)
    plt.title("Non-linear SVM with RBF Kernel")
    plt.show()

# Plot the decision boundary
plot_decision_boundary(X, y, svm)
```
Output:
[Figure: Non-linear SVM with RBF kernel]
The plot shows the decision boundary learned by the non-linear SVM: with the help of the RBF kernel, a curved boundary successfully separates the two circular classes (inner and outer circles).
Example 2: Non-Linear SVM for a Curved (Two-Moons) Pattern
Now we will see how a different kernel works. We will use the polynomial kernel on a dataset with a curved pattern: two interleaving half-moons.
1. Importing Libraries
We import essential libraries for dataset creation, SVM modeling, evaluation, and visualization.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```
2. Creating and Splitting the Dataset
We generate a synthetic "two moons" dataset, which is non-linearly separable, and split it into training and test sets.
```python
X, y = make_moons(n_samples=500, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
```
3. Creating and Training the SVM with Polynomial Kernel
We build an SVM classifier with a polynomial kernel and train it on the training data.
```python
# degree and coef0 control the curve of the boundary
svm_poly = SVC(kernel='poly', degree=3, C=1, coef0=1)
svm_poly.fit(X_train, y_train)
```
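To see how `degree` affects the fit, it can help to compare a few values under cross-validation. The sketch below is illustrative: the list of degrees is an arbitrary assumption, `C` and `coef0` are kept as above, and the data is regenerated so the block stands alone.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.1, random_state=42)

# Mean 5-fold cross-validated accuracy for several polynomial degrees.
cv_scores = {}
for degree in (1, 2, 3, 5):
    model = SVC(kernel="poly", degree=degree, C=1, coef0=1)
    cv_scores[degree] = cross_val_score(model, X, y, cv=5).mean()
    print(f"degree={degree}: mean CV accuracy = {cv_scores[degree]:.3f}")
```

With `degree=1` the polynomial kernel reduces to a linear boundary, so that row serves as a baseline for the higher degrees.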
4. Making Predictions and Evaluating the Model
We use the trained model to predict test labels and evaluate its accuracy.
```python
y_pred = svm_poly.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```
5. Visualizing the Decision Boundary
We define a function to plot the decision boundary learned by the SVM with a polynomial kernel.
```python
def plot_decision_boundary(X, y, model):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.Paired)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap=plt.cm.Paired)
    plt.title("Non-linear SVM with Polynomial Kernel")
    plt.show()

plot_decision_boundary(X, y, svm_poly)
```
Output:
[Figure: Non-linear SVM with Polynomial Kernel]
The polynomial kernel creates a smooth, non-linear decision boundary that effectively separates the two curved regions.
Linear SVM vs Non-Linear SVM
Feature | Linear SVM | Non-Linear SVM |
---|---|---|
Decision Boundary | Straight line or hyperplane | Curved or complex boundaries |
Data Separation | Works well when data is linearly separable | Suitable for non-linearly separable data |
Kernel Usage | No kernel or uses a linear kernel | Uses non-linear kernels (e.g., RBF, polynomial) |
Computational Cost | Generally faster and less complex | More computationally intensive |
Example Use Case | Spam detection with simple features | Image classification or handwriting recognition |
Applications
- Image Classification: Non-linear SVMs are widely used for image recognition tasks such as handwritten digit recognition (for example, on the MNIST dataset), where the classes are not linearly separable.
- Bioinformatics: Used in gene expression analysis and protein classification where the relationships between variables are complex and non-linear.
- Natural Language Processing (NLP): Used for text classification tasks like spam filtering or sentiment analysis where non-linear relationships exist between words and sentiments.
- Medical Diagnosis: Effective for classifying diseases based on patient data such as tumor classification where data have non-linear patterns.
- Fraud Detection: Can identify fraudulent activity by detecting unusual, non-linear patterns in transactional data.
- Voice and Speech Recognition: Useful for separating different voice signals or identifying speech patterns where non-linear decision boundaries are needed.