Data Prediction using Decision Tree of rpart
Last Updated : 23 Jul, 2024
Decision trees are a popular choice due to their simplicity, interpretability, and effectiveness at handling both numerical and categorical data. The rpart (Recursive Partitioning) package in R specializes in constructing these trees, offering a robust framework for building predictive models.
Overview of rpart
rpart stands for Recursive Partitioning and is a versatile tool in R for creating decision trees. It builds models from a set of binary rules, splitting the data recursively to maximize the homogeneity of the resulting subgroups. This process works for both regression and classification tasks.
Setting Up rpart
To set up a decision tree using rpart, you need:
- A properly formatted dataset: Ideally with no missing values (though rpart can handle them via surrogate splits) and no factor variables with unused levels.
- A formula specifying the model: This formula determines which variable is predicted and which variables are used as predictors.
Parameters such as method, minsplit, cp, and maxdepth control the complexity and performance of the tree. Now we will walk through data prediction step by step in the R programming language.
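As a sketch of how these parameters are passed to rpart, the snippet below fits a tree on the built-in iris data with explicit control settings; the specific values are illustrative choices, not recommendations:

```r
# Load the rpart package (install.packages("rpart") if needed).
library(rpart)

# Illustrative control settings -- the values are arbitrary choices:
#   minsplit: minimum observations a node must have before a split is tried
#   cp:       complexity parameter; a split must improve the fit by this much
#   maxdepth: maximum depth of any node in the final tree
ctrl <- rpart.control(minsplit = 10, cp = 0.01, maxdepth = 5)

# Fit a classification tree using these controls.
fit <- rpart(Species ~ ., data = iris, method = "class", control = ctrl)
print(fit)
```

Tightening cp or maxdepth yields a smaller, more interpretable tree at the cost of some fit; loosening them does the reverse.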
Step 1: Load the Necessary Library
Loads the rpart package, which is required to build decision tree models. If rpart is not installed, you would need to install it using install.packages("rpart").
R
library(rpart)
Step 2: Load the Dataset
Loads the built-in Iris dataset. This dataset includes four features (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) and a target variable (Species).
R
head(iris)
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
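Before modeling, it is worth confirming that the dataset meets the requirements from the setup section. A quick check might look like this:

```r
# Count missing values across the whole data frame (iris has none).
sum(is.na(iris))

# Check the class distribution of the target variable:
# each of the three species appears 50 times in the 150-row dataset.
table(iris$Species)
```

A balanced, complete dataset like iris needs no further preparation before fitting.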
Step 3: Build the Decision Tree Model
Now we will build the decision tree model.
R
model <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
               data = iris, method = "class")
- Formula: Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width. This formula specifies that the Species is the dependent variable, and the four measurements are the independent variables.
- Data: Specifies the dataset (iris) to use for the model.
- Method: "class" indicates that the task is a classification. It instructs rpart to treat the Species variable as a categorical outcome.
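Once the model is fitted, rpart provides several ways to inspect it; for example, printcp() shows the complexity-parameter table that underpins pruning decisions. The model is refitted here so the snippet is self-contained:

```r
library(rpart)
model <- rpart(Species ~ ., data = iris, method = "class")

# Text summary of the splits at each node.
print(model)

# Complexity-parameter table: one row per candidate pruning level,
# with cross-validated error estimates for each.
printcp(model)
```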
Step 4: Plot the Decision Tree
Now we will plot the decision tree.
R
plot(model)
text(model, use.n = TRUE)
Output:
(Figure: decision tree for the Iris dataset)
- plot(model): Draws the basic structure of the tree.
- text(model, use.n=TRUE): Adds labels to the tree; use.n=TRUE appends the number of observations in each leaf, letting you see how many data points end up there.
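The base plot can be cramped and hard to read. plot.rpart accepts layout arguments such as uniform and margin that often help; the values below are illustrative:

```r
library(rpart)
model <- rpart(Species ~ ., data = iris, method = "class")

# uniform = TRUE spaces the tree levels evenly; margin adds white space
# around the tree so the text labels are not clipped at the edges.
plot(model, uniform = TRUE, margin = 0.1)
text(model, use.n = TRUE, cex = 0.8)  # cex shrinks the label text
```

For publication-quality trees, the separate rpart.plot package is a popular alternative, but the base graphics shown here require no extra installation.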
Step 5: Create New Data for Prediction
Now we will create new data for prediction.
R
new_data <- data.frame(Sepal.Length = 5.5, Sepal.Width = 3.5,
                       Petal.Length = 1.4, Petal.Width = 0.2)
Values: The measurements provided are hypothetical and are used to demonstrate how the model performs predictions.
Step 6: Make Predictions
R
prediction <- predict(model, new_data, type = "class")
print(prediction)
Output:
1
setosa
Levels: setosa versicolor virginica
- model: The decision tree model built in step 3.
- new_data: The new data point defined in step 5.
- type="class": Specifies that the prediction should return the class (species) rather than probabilities.
Conclusion
The rpart package in R offers a user-friendly yet powerful approach to building decision trees, making it a valuable tool for both novice and experienced data scientists. By following the steps outlined in this article and adhering to best practices, one can effectively utilize decision trees to make reliable predictions and gain insights from various types of data. Whether you are performing a straightforward classification task or tackling more complex predictive modeling challenges, rpart provides the necessary tools to achieve accurate and interpretable results.