Gram-Schmidt Process for ML
The Gram-Schmidt process converts a set of linearly independent vectors into an orthonormal basis: a set of mutually orthogonal vectors, each normalized to unit length.
This process is useful throughout machine learning because it improves numerical stability, simplifies matrix computations, and makes downstream calculations more efficient.
Orthogonality and Normalization
There are two basic concepts that must be understood before moving on to the Gram-Schmidt process: orthogonality and normalization.
- Orthogonality: Two vectors are orthogonal if their dot product equals zero, which means they are at 90 degrees to one another. In machine learning, orthogonal vectors are convenient to work with because they make matrix operations simpler and more numerically stable.
- Normalization: A vector is normalized if its magnitude (length) equals one. This is achieved by dividing every element of the vector by its magnitude. Normalization keeps results from being distorted by differences in scale and stabilizes learning algorithms.
The two ideas combine in the Gram-Schmidt process to produce a set of orthonormal vectors: vectors that are both orthogonal and normalized.
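As a quick illustration, here is a small NumPy sketch (the vectors are made up for this example) that checks orthogonality with a dot product and normalizes a vector to unit length:

import numpy as np

# Two example vectors, chosen so that their dot product is zero
a = np.array([1.0, 2.0])
b = np.array([-2.0, 1.0])
print(np.dot(a, b))            # 0.0 -> a and b are orthogonal

# Normalize a by dividing it by its Euclidean length
a_unit = a / np.linalg.norm(a)
print(np.linalg.norm(a_unit))  # 1.0 -> a_unit has unit length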
The Gram-Schmidt Process Step by Step
The Gram-Schmidt process accepts a set of linearly independent vectors and converts them into an orthonormal set.
Assume we have a set of vectors:
\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, ..., \mathbf{v}_n
We need to convert them into a new orthonormal set of vectors:
\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, ..., \mathbf{u}_n
The process proceeds as follows:
Step 1: Select the First Vector
The first orthogonal vector is just the first vector of the original set:
\mathbf{u}_1 = \mathbf{v}_1
To normalize it, divide it by its length:
\mathbf{e}_1 = \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|}
This gives us the first orthonormal vector.
Step 2: Orthogonalize the Second Vector
To find the second orthogonal vector, remove the component of \mathbf{v}_2 that lies in the direction of \mathbf{e}_1:
\mathbf{u}_2 = \mathbf{v}_2 - \text{proj}_{\mathbf{e}_1} (\mathbf{v}_2)
Here, the projection is given by:
\text{proj}_{\mathbf{e}_1} (\mathbf{v}_2) = \frac{\mathbf{v}_2 \cdot \mathbf{e}_1}{\mathbf{e}_1 \cdot \mathbf{e}_1} \mathbf{e}_1
Since \mathbf{e}_1 is a unit vector, \mathbf{e}_1 \cdot \mathbf{e}_1 = 1, so the projection simplifies to (\mathbf{v}_2 \cdot \mathbf{e}_1)\,\mathbf{e}_1.
After obtaining \mathbf{u}_2, normalize it:
\mathbf{e}_2 = \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|}
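To see Steps 1 and 2 concretely, the following NumPy sketch uses two illustrative vectors and the simplified form of the projection noted above:

import numpy as np

v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, 0.0, 1.0])

# Step 1: normalize the first vector
e1 = v1 / np.linalg.norm(v1)

# Step 2: subtract the component of v2 along e1, then normalize
u2 = v2 - np.dot(v2, e1) * e1
e2 = u2 / np.linalg.norm(u2)

print(np.dot(e1, e2))  # ~0.0 -> e1 and e2 are orthogonal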
Step 3: Make the Third Vector Orthogonal
For the third vector, remove the components in the directions of both \mathbf{e}_1 and \mathbf{e}_2:
\mathbf{u}_3 = \mathbf{v}_3 - \text{proj}_{\mathbf{e}_1} (\mathbf{v}_3) - \text{proj}_{\mathbf{e}_2} (\mathbf{v}_3)
After obtaining \mathbf{u}_3, normalize it:
\mathbf{e}_3 = \frac{\mathbf{u}_3}{\|\mathbf{u}_3\|}
Step 4: Repeat for All Vectors
This process is repeated for all vectors in the original set. The general formula for any vector \mathbf{u}_k is:
\mathbf{u}_k = \mathbf{v}_k - \sum_{i=1}^{k-1} \text{proj}_{\mathbf{e}_i} (\mathbf{v}_k)
After obtaining \mathbf{u}_k, normalize it:
\mathbf{e}_k = \frac{\mathbf{u}_k}{\|\mathbf{u}_k\|}
This ensures that all vectors are orthonormal.
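Putting all the steps together, below is a minimal sketch of the full process in NumPy. The function name gram_schmidt and the test vectors are illustrative choices, not part of any standard library:

import numpy as np

def gram_schmidt(vectors):
    # Returns an orthonormal basis (as rows) for linearly independent input vectors
    basis = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        # u_k = v_k minus the sum over i of (v_k . e_i) e_i
        u = v - sum(np.dot(v, e) * e for e in basis)
        norm = np.linalg.norm(u)
        if norm < 1e-12:
            raise ValueError("vectors are not linearly independent")
        basis.append(u / norm)
    return np.array(basis)

# Illustrative usage with three independent vectors
V = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
E = gram_schmidt(V)
print(np.round(E @ E.T, 10))  # 3x3 identity -> the rows of E are orthonormal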
Importance of Gram-Schmidt Process in Machine Learning
Machine learning algorithms often handle large datasets as matrices. The main reasons the Gram-Schmidt process is useful include:
- Numerical Stability: When handling big datasets, errors can accumulate through floating-point computations. The Gram-Schmidt process reduces these errors by converting the vectors into an orthonormal set.
- Dimensionality Reduction: Principal Component Analysis (PCA), one of the most widely used dimensionality reduction methods, is based on orthogonal transformations of the kind the Gram-Schmidt process produces.
- Effective Matrix Decomposition: Many machine learning algorithms involve decomposing matrices into simpler forms. QR decomposition, a key step in solving linear regression and least-squares problems, can be computed with the Gram-Schmidt process.
- Feature Selection: In certain situations, the Gram-Schmidt process can be used to choose the most relevant features by identifying redundant information.
Applications of the Gram-Schmidt Process in Machine Learning
- QR Decomposition in Regression Models: Linear regression problems frequently require solving systems of equations. QR decomposition, which can be computed with the Gram-Schmidt method, factors a matrix into an orthogonal matrix Q and an upper triangular matrix R, making least-squares problems efficient to solve (see the sketch after this list).
- Principal Component Analysis (PCA): PCA reduces the number of dimensions in a dataset while preserving important information. The Gram-Schmidt process helps in finding orthogonal principal components, which are used to transform data into a lower-dimensional space.
- Eigenvector Computations: Eigenvectors are used for feature extraction and clustering in many machine learning models. The Gram-Schmidt process is useful for making sets of eigenvectors orthonormal, improving model accuracy.
- Data Preprocessing: In datasets with correlated or redundant features, the Gram-Schmidt process can be used to convert the data into an orthonormal basis, stabilizing machine learning models and saving computation time.
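To make the QR connection concrete, the sketch below solves a small least-squares problem with NumPy's built-in QR decomposition; the design matrix X and targets y are made up for illustration:

import numpy as np

# Tiny made-up regression problem: fit y = intercept + slope * x
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])   # first column carries the intercept term
y = np.array([1.0, 2.0, 2.0])

# Factor X = QR, where Q has orthonormal columns and R is upper triangular
Q, R = np.linalg.qr(X)

# The least-squares solution satisfies R beta = Q^T y
beta = np.linalg.solve(R, Q.T @ y)
print(beta)  # regression coefficients: [intercept, slope]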
Limitations of the Gram-Schmidt Process
Despite its advantages, the Gram-Schmidt process has some limitations:
- Numerical Instability: With large datasets, small numerical errors can accumulate. A more stable variant known as the Modified Gram-Schmidt process is often used instead (see the sketch after this list).
- Computational Cost: For high-dimensional data, the procedure can be computationally expensive. More efficient methods, such as Singular Value Decomposition (SVD), may be preferred in some cases.
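As a minimal sketch of that variant, modified Gram-Schmidt subtracts each projection from the running residual rather than from the original vector. In exact arithmetic it returns the same basis as the classical version, but in floating point it keeps rounding errors from compounding:

import numpy as np

def modified_gram_schmidt(vectors):
    basis = []
    for v in vectors:
        u = np.asarray(v, dtype=float).copy()
        for e in basis:
            # Project the CURRENT residual u onto e, not the original v
            u -= np.dot(u, e) * e
        basis.append(u / np.linalg.norm(u))
    return np.array(basis)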