Linear and Quadratic Discriminant Analysis using Sklearn
Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) are two well-known classification methods used in machine learning to find patterns in data and assign observations to groups. They are especially helpful when you have labeled data and want to classify new observations into pre-defined categories.
In this article, we will implement both of these techniques, Linear and Quadratic Discriminant Analysis, using Sklearn.
Understanding Linear and Quadratic Discriminant Analysis
Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis assumes that the data in each class is normally distributed and that all classes share the same covariance matrix. It finds a linear combination of features that best separates the classes, sometimes referred to as Fisher's linear discriminant. The idea is to maximize the separation between classes while projecting the data onto a lower-dimensional space.
Under these assumptions, LDA determines the best linear decision boundary by maximizing the ratio of between-class variance to within-class variance.
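For a single projection direction w, this objective is commonly written as the Fisher criterion (standard textbook notation, independent of any particular library):

J(w) = \frac{w^{T} S_B \, w}{w^{T} S_W \, w}

where S_B and S_W are the between-class and within-class scatter matrices. The directions maximizing J(w) are the leading eigenvectors of S_W^{-1} S_B, which is exactly what the step-by-step procedure below computes.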
Conceptually, LDA is computed in the following steps (a minimal from-scratch sketch follows the list):
- Compute the mean vectors for each class.
- Compute the within-class and between-class scatter matrices.
- Compute the eigenvalues and eigenvectors of the matrix S_W^{-1} S_B formed from the two scatter matrices.
- Select the top k eigenvectors corresponding to the k largest eigenvalues to form a new feature space.
- Project the data onto the new feature space.
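To make these steps concrete, here is a minimal NumPy sketch, a from-scratch illustration rather than scikit-learn's actual implementation; the function name lda_projection and the use of a pseudo-inverse for numerical stability are our own choices:

Python

import numpy as np

def lda_projection(X, y, k):
    # Illustrative from-scratch LDA projection
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((n_features, n_features))  # within-class scatter
    S_B = np.zeros((n_features, n_features))  # between-class scatter
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        S_W += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += len(X_c) * (diff @ diff.T)
    # Eigendecomposition of S_W^-1 S_B; the pseudo-inverse guards against a singular S_W
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:k]].real
    return X @ W  # data projected onto the k best discriminant directions

For a problem with c classes, at most c - 1 eigenvalues are non-zero, so k is usually chosen as c - 1 or smaller.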
Quadratic Discriminant Analysis (QDA)
QDA is similar to LDA but does not assume that the covariance matrices of the classes are equal. This allows QDA to build more flexible decision boundaries by describing each class with its own covariance matrix.
Conceptually, QDA is computed in the following steps:
- Compute the mean vector and covariance matrix for each class.
- Use the quadratic form of the discriminant function to classify new data (the formula is given below).
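In standard textbook notation, the quadratic discriminant function for class k is (this is the general Gaussian discriminant, not anything specific to scikit-learn's internals):

\delta_k(x) = -\frac{1}{2} \log |\Sigma_k| - \frac{1}{2} (x - \mu_k)^{T} \Sigma_k^{-1} (x - \mu_k) + \log \pi_k

where \mu_k, \Sigma_k and \pi_k are the mean vector, covariance matrix and prior probability of class k. A new point x is assigned to the class with the largest \delta_k(x); because \Sigma_k differs per class, the resulting decision boundaries are quadratic rather than linear.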
Implementing Linear and Quadratic Discriminant Analysis with Scikit-Learn
Scikit-Learn is a well-known Python machine learning package that offers efficient implementations of Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) via the LinearDiscriminantAnalysis and QuadraticDiscriminantAnalysis classes. To use LDA or QDA in Scikit-Learn, let's go through the steps below.
1. Import the Necessary Modules
Python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
2. Generate Data
Python

# Generate synthetic data
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
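Optionally, you can sanity-check the split sizes and class balance that the evaluation below relies on:

Python

# Verify split sizes and class balance (optional sanity check)
print(X_train.shape, X_test.shape)            # expected: (700, 2) (300, 2)
print(np.bincount(y_train), np.bincount(y_test))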
Applying Linear Discriminant Analysis (LDA)
Python

# Initialize and train the LDA model
lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

y_pred_lda = lda.predict(X_test)
print("LDA Accuracy:", accuracy_score(y_test, y_pred_lda))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_lda))
print("Classification Report:\n", classification_report(y_test, y_pred_lda))
Output:
LDA Accuracy: 0.8266666666666667
Confusion Matrix (LDA):
[[ 75 4 22]
[ 16 71 0]
[ 0 10 102]]
Classification Report (LDA):
precision recall f1-score support
0 0.82 0.74 0.78 101
1 0.84 0.82 0.83 87
2 0.82 0.91 0.86 112
accuracy 0.83 300
macro avg 0.83 0.82 0.82 300
weighted avg 0.83 0.83 0.83 300
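As a side note, LinearDiscriminantAnalysis can also act as a supervised dimensionality reducer through its transform method. The snippet below is purely illustrative, since our synthetic data is already two-dimensional:

Python

# LDA as a dimensionality reducer: at most n_classes - 1 = 2 components here
lda_reducer = LinearDiscriminantAnalysis(n_components=2)
X_train_proj = lda_reducer.fit_transform(X_train, y_train)
print("Projected shape:", X_train_proj.shape)  # expected: (700, 2)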
Applying Quadratic Discriminant Analysis (QDA)
Python

# Initialize and train the QDA model
qda = QuadraticDiscriminantAnalysis()
qda.fit(X_train, y_train)

# Make predictions
y_pred_qda = qda.predict(X_test)

# Evaluate the model
print("QDA Accuracy:", accuracy_score(y_test, y_pred_qda))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_qda))
print("Classification Report:\n", classification_report(y_test, y_pred_qda))
Output:
QDA Accuracy: 0.93
Confusion Matrix (QDA):
[[ 96 2 3]
[ 10 77 0]
[ 4 2 106]]
Classification Report (QDA):
precision recall f1-score support
0 0.87 0.95 0.91 101
1 0.95 0.89 0.92 87
2 0.97 0.95 0.96 112
accuracy 0.93 300
macro avg 0.93 0.93 0.93 300
weighted avg 0.93 0.93 0.93 300
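If you want to inspect the per-class covariance matrices that QDA estimates, scikit-learn stores them in the covariance_ attribute when the model is created with store_covariance=True; the refit below is only for illustration:

Python

# Refit QDA so that the per-class covariance estimates are stored
qda_cov = QuadraticDiscriminantAnalysis(store_covariance=True)
qda_cov.fit(X_train, y_train)
for k, cov in enumerate(qda_cov.covariance_):
    print(f"Class {k} covariance matrix:\n{cov}")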
Visualizing Linear and Quadratic Discriminant Analysis
For visualization, let's plot the decision boundaries. A decision boundary is the surface that divides the classes of data points: the goal of a classifier is to predict the class of a new data point based on its features, and the decision boundary shows the rule the classifier uses to split the classes.
Python

def plot_decision_boundaries(X, y, model, title, subplot_index):
    plt.subplot(subplot_index)
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.title(title)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')

plt.figure(figsize=(10, 4))

# Plot decision boundaries for LDA
plot_decision_boundaries(X_test, y_test, lda, "LDA Decision Boundary", 121)

# Plot decision boundaries for QDA
plot_decision_boundaries(X_test, y_test, qda, "QDA Decision Boundary", 122)

plt.tight_layout()
plt.show()
Output:
Decision Boundary Plots for LDA and QDA
LDA projects data from a higher-dimensional space onto a lower-dimensional space in a way that maximizes the separation between classes, so its decision boundaries are straight lines. QDA, by contrast, allows a more complex relationship between features and classes: its decision boundary appears more flexible than LDA's, which may help it fit the data better in some cases.
Conclusion
In conclusion, Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) are effective methods for supervised classification problems. LDA assumes that all classes share an equal covariance matrix, while QDA relaxes this condition by allowing each class its own covariance matrix. Both approaches are practical and have their merits, and Scikit-Learn offers handy implementations that make integrating them into machine learning pipelines simple.