Hyperparameter Tuning with R
Last Updated : 12 Sep, 2024
In R, several techniques and packages can be used to optimize hyperparameters, leading to better, more reliable models. In this article, we will discuss the main techniques and packages for hyperparameter tuning with R.
What are Hyperparameters?
Hyperparameters are the settings that control how a machine-learning model learns from data. Examples include the learning rate in neural networks, the number of trees in a random forest, or the number of neighbors in a k-nearest neighbors (k-NN) algorithm. Choosing the correct hyperparameters can make the difference between a model that generalizes well to new data and one that overfits or underfits the training data. Unlike model parameters, which are adjusted during training to fit the data, hyperparameters must be set before training begins.
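As a quick illustration of the distinction, here is a minimal sketch (assuming the class package, which ships with standard R distributions, is available; the row split and the value k = 3 are arbitrary choices for demonstration):
R
# Minimal illustration: 'k' is a hyperparameter we choose before training,
# while a linear model's coefficients are parameters learned from the data.
library(class)

train_x <- mtcars[1:24, c("mpg", "hp", "wt")]
test_x  <- mtcars[25:32, c("mpg", "hp", "wt")]
train_y <- factor(mtcars$am[1:24])

# k = 3 is set by hand before the model sees any data (a hyperparameter)
knn_pred <- knn(train = train_x, test = test_x, cl = train_y, k = 3)

# The coefficients of this linear model are estimated during fitting (parameters)
lm_fit <- lm(mpg ~ hp + wt, data = mtcars)
coef(lm_fit)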
Why is Hyperparameter Tuning Important?
Correctly tuning hyperparameters can improve a model's performance. Poor settings may cause:
- Underfitting: The model is too simple to capture the underlying patterns.
- Overfitting: The model is too complex and captures noise rather than useful patterns, leading to poor generalization.
- Slow Convergence: Poorly tuned models, such as those with too small a learning rate, can take excessively long to converge or may not converge at all.
Techniques for Hyperparameter Tuning
Here are some of the main techniques for hyperparameter tuning.
- Grid Search: Grid search is an exhaustive search method that systematically evaluates all possible combinations of a predefined set of hyperparameters. For example, when tuning a random forest model, we may create a grid of values for parameters such as mtry (the number of variables randomly sampled at each split) and ntree (the number of trees in the forest).
- Random Search: Random search selects random combinations of hyperparameters from predefined distributions. Unlike grid search, it does not evaluate every possible combination, making it more efficient, especially when the search space is large (a minimal sketch appears after this list).
- Bayesian Optimization: Bayesian optimization is an advanced technique that models the relationship between hyperparameters and model performance. It uses this model to predict which hyperparameters will lead to better results, refining its predictions as it gathers more data.
- Cross-Validation: Cross-validation is commonly used alongside the above methods to evaluate model performance. By splitting the data into multiple subsets, training the model on some subsets and testing on others, we ensure that the selected hyperparameters lead to a model that generalizes well to unseen data.
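As a small illustration of random search, here is a hedged sketch using caret: instead of an explicit grid, caret draws a number of random candidate values controlled by tuneLength. The data frame name df, the seed, and tuneLength = 3 are illustrative choices; the example reuses the mtcars data that the walkthrough below is based on.
R
# Random search with caret: candidate 'mtry' values are drawn at random
library(caret)
library(randomForest)

# Work on a local copy so the original mtcars data stays untouched
df <- mtcars
df$am <- factor(df$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

set.seed(123)   # illustrative seed for reproducibility
rs_control <- trainControl(method = "cv", number = 5, search = "random")

rs_model <- train(am ~ mpg + cyl + hp + wt,
                  data = df,
                  method = "rf",
                  metric = "Accuracy",
                  trControl = rs_control,
                  tuneLength = 3)   # evaluate 3 randomly drawn 'mtry' values
print(rs_model)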
Now let us implement hyperparameter tuning step by step in the R programming language.
Step 1: Load the Required Libraries and Dataset
Load the required libraries and the mtcars dataset.
R
# Load the required libraries
library(randomForest)
library(caret)

# Load the mtcars dataset
data <- mtcars
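If the packages are not already installed, they can be installed once beforehand (assuming an internet connection):
R
# One-time installation (only needed if the packages are missing)
install.packages("randomForest")
install.packages("caret")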
Step 2: Data Preparation
Convert the am column (which represents the type of transmission) to a factor with two levels: "Automatic" and "Manual".
R
# Convert 'am' (transmission) to a factor for classification
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
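An optional sanity check confirms the conversion and shows how many cars fall into each class:
R
# Verify the conversion and inspect the class balance
str(mtcars$am)
table(mtcars$am)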
Step 3: Feature Selection
A subset of features (mpg, cyl, hp, wt) is selected for modeling. These are the same predictors used in the model formula in Step 6.
R
# Subset of features for modeling
features <- mtcars[, c("mpg", "cyl", "hp", "wt")]
Step 4: Define Hyperparameter Grid
Define a grid of hyperparameter values for mtry, which controls the number of features randomly selected at each split in the Random Forest algorithm. With only four predictors, mtry can range from 1 to 4; here we try the values 1 and 2.
R
# Define a grid for the 'mtry' parameter in Random Forest
tuneGrid <- expand.grid(mtry = c(1, 2))
Step 5: Cross-Validation Setup
The data is split into 5 folds; each fold serves once as the validation set while the remaining 4 folds are used for training, so the model is fitted and evaluated 5 times. SMOTE sampling is applied inside each resampling iteration to offset the mild class imbalance between automatic and manual cars (caret's "smote" option relies on the DMwR package being installed).
R
# Cross-validation setup with SMOTE sampling to handle class imbalance
control <- trainControl(method = "cv",
                        number = 5,
                        summaryFunction = defaultSummary,
                        savePredictions = TRUE,
                        classProbs = FALSE,
                        sampling = "smote",      # handle class imbalance in small datasets
                        allowParallel = TRUE)    # use a parallel backend if one is registered
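Since both the fold assignment and the SMOTE sampling involve randomness, you may want to fix a seed before training so the results are reproducible (the seed value below is arbitrary):
R
# Fix the random seed so fold assignment and SMOTE resampling are reproducible
set.seed(42)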
Step 6: Model Training
Now train the Random Forest model with caret's train() function. For each mtry value in the grid, the model is fitted and evaluated using the cross-validation scheme defined above, and the value with the highest accuracy is retained.
R
# Model training with hyperparameter tuning
# Train the Random Forest model using the 'caret' package and grid search
model <- train(am ~ mpg + cyl + hp + wt,
               data = mtcars,
               method = "rf",
               metric = "Accuracy",   # set the metric explicitly for classification
               trControl = control,
               tuneGrid = tuneGrid)
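Note that for method = "rf", caret tunes only mtry; other randomForest() arguments such as ntree are not part of the tuning grid, but they can be forwarded through train()'s ... argument. A hedged variant of the call above (500 trees is just an illustrative value):
R
# Same call as above, additionally fixing the number of trees;
# 'ntree' is forwarded to randomForest() and is not tuned by caret.
model_500 <- train(am ~ mpg + cyl + hp + wt,
                   data = mtcars,
                   method = "rf",
                   metric = "Accuracy",
                   trControl = control,
                   tuneGrid = tuneGrid,
                   ntree = 500)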
Step 7: Print the Result
Now print the best tuned model and the full tuning results.
R
# Print the best tuned model
print(model$bestTune)   # output the best 'mtry' value
print(model)
Output:
mtry
2 2
Random Forest
32 samples
4 predictor
2 classes: 'Automatic', 'Manual'
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 25, 26, 26, 26, 25
Addtional sampling using SMOTE
Resampling results across tuning parameters:
mtry Accuracy Kappa
1 0.7714286 0.5357971
2 0.9047619 0.8057971
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.
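If you prefer to work with the tuning results programmatically, the resampled performance for every candidate value is also stored as a data frame inside the fitted object:
R
# Resampling results as a data frame (one row per candidate 'mtry' value)
model$results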
Step 8: Visualize the Tuning Results
Now we will visualize the tuning results.
R
# Visualize the tuning results
# Plot the performance of different hyperparameter values
plot(model)
Output:
The plot shows accuracy across the different values of mtry. As mtry increases, accuracy improves until it peaks at mtry = 2. Visualizing performance across different hyperparameters helps identify the optimal settings.
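Finally, the tuned model can be used like any other caret model. The sketch below predicts on the training data purely for illustration; in practice you would predict on held-out data.
R
# Generate predictions with the tuned model and summarise them
preds <- predict(model, newdata = mtcars)
confusionMatrix(preds, mtcars$am)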
Conclusion
Hyperparameter tuning is a crucial step in refining machine learning models to achieve better performance. By carefully selecting and adjusting hyperparameters, such as those in neural networks or random forests, the model's ability to generalize to new data improves, reducing the risk of overfitting or underfitting. Techniques like grid search, random search, and Bayesian optimization, especially when combined with cross-validation, provide powerful ways to identify the optimal settings. Implementing these methods in R can significantly enhance model reliability and accuracy.